Last updated on: 6th Oct 2025
| Model | Org. | Safety Rank | Safe Responses | Unsafe Responses | Jailbreaking Resistance | Code LMArena (old) | Math LiveBench | GPQA | Code LiveBench | Code RankedAGI | Multimodal Support | On Google Vertex AI | On Microsoft Azure | On AWS Bedrock | Size (Parameters) | Released | Knowledge Cutoff | Input Cost/M | Output Cost/M | Context Length | Latency (TTFT) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kimi K2 Instruct 0905 | Moonshot AI | #15 | 81% | 19% (58/300) | 42% (42/100) | - | - | 0.76/1 | - | - | Yes | No | No | No | 32B (1T total) | 05-Sep-24 | 2024-09 | $0.60 | $2.50 | 256K tokens | - |
| Grok 4 | xAI | #22 | 61% | 39% (116/300) | 10% (10/100) | 1420 | 88.84% | 88.10% | 71.34% | - | Yes | No | No | No | - | 09-Jul-25 | 2025-07 | $3.00 | $15.00 | 256K tokens | ~11s |
| GPT-4.1 | OpenAI | #9 | 93% | 7% (22/300) | 83% (83/100) | 1385 | 72.00% | 66.30% | 54.60% | 50.3% | Yes | No | Yes | Yes | - | 14-Apr-25 | 2024-06-01 | $2.00 | $8.00 | 1M tokens | ~0.4s |
| GPT-4.1 Mini | OpenAI | #7 | 96% | 4% (12/300) | 89% (89/100) | 1365 | 45.2% | 65.0% | 42.8% | 35.5% | Yes | No | Yes | Yes | - | 14-Apr-25 | 2025-01 | $0.40 | $1.60 | 1M tokens | ~0.43s |
| GPT-4.5 | OpenAI | #3 | 100% | 0% (1/237) | 97% (36/37) | 1362 | 69.30% | 69.50% | 76.10% | 58.6% | Yes | No | No | No | - | 27-Feb-25 | 2023-10-01 | $75.00 | $150.00 | 128K tokens | ~0.5s |
| GPT-5 | OpenAI | #11 | 91% | 9% (26/300) | 75% (75/100) | 1450 | 92.77% | 86.00% | 75.31% | 39.8% | Yes | No | Yes | No | - | 07-Aug-25 | 2024-10-01 | $1.25 | $10.00 | 400K tokens | Medium |
| Claude Opus 4.1 | Anthropic | #5 | 99% | 1% (4/300) | 97% (97/100) | - | 90.0% | 83.3% | 74.5% | 74.5% | Yes | Yes | No | Yes | - | 05-Aug-25 | 2025-03-01 | $15.00 | $75.00 | 200K tokens | Moderately Fast |
| GPT-4o | OpenAI | #10 | 93% | 7% (20/300) | 82% (82/100) | 1385 | 41.48% | 46.00% | 77.50% | 61.4% | Yes | No | Yes | No | - | 20-Nov-24 | 2023-10-01 | $2.50 | $10.00 | 128K tokens | ~0.6s |
| Claude 3.7 Sonnet | Anthropic | #2 | 100% | 0% (1/300) | 99% (99/100) | 1326 | 63.30% | 84.80% | 73.2% | 60.4% | Yes | Yes | No | Yes | 70B | 24-Feb-25 | 2024-10-01 | $3.00 | $15.00 | 200K tokens | - |
| DeepSeek R1 | DeepSeek | #12 | 89% | 11% (26/237) | 32% (12/37) | 1380 | 79.50% | 74.24% | 48.50% | - | No | No | No | No | 671B | 20-Jan-25 | 2024-07-01 | $0.55 | $2.19 | 128K tokens | - |
| Gemini 2.5 Pro Preview | Google | #19 | 66% | 34% (102/300) | 26% (26/100) | 1395 | 84.19% | 86.40% | 73.90% | - | Yes | Yes | No | No | - | 05-Jun-25 | 2025-01-31 | $1.25 | $10.00 | 1M tokens | - |
| Gemini 2.5 Flash Preview 05-20 | Google | #21 | 63% | 37% (111/300) | 12% (12/100) | 1350 | 84.10% | 82.80% | 63.53% | - | Yes | Yes | No | No | - | 17-Apr-25 | 2025-01-31 | $0.15 | $0.60 | 1M tokens | - |
| Gemini 2.0 Flash | Google | #20 | 66% | 34% (102/300) | 10% (10/100) | 1310 | 83.33% | 62.10% | 70.70% | - | Yes | Yes | No | No | - | 05-Feb-25 | 2024-08-01 | $0.10 | $0.40 | 1M tokens | - |
| Gemma 2 9B | Google | #14 | 84% | 16% (37/237) | 5% (2/37) | 1180 | 52.27% | - | 48.94% | - | No | - | - | - | 9B | 27-Jun-24 | 2024-04-01 | - | - | 8K tokens | - |
| Grok-3 | xAI | #23 | 22% | 78% (235/300) | 3% (3/100) | 1320 | 62.75% | 84.60% | 73.58% | - | Yes | - | - | - | 300B | 01-May-24 | 2024-11 | $3.00 | $15.00 | 128K tokens | - |
| Llama 4 Maverick 128e Instruct | Meta | #17 | 79% | 21% (50/237) | 3% (1/37) | 1290 | 60.58% | - | 54.19% | - | No | - | - | - | 17B | 05-Apr-25 | 2024-08-01 | $0.20 | $0.60 | 128K tokens | - |
| Llama 3.1 Instant | Meta | #16 | 79% | 21% (49/237) | 3% (1/37) | 1200 | 51.88% | - | 57.26% | - | No | - | - | - | 8B | 23-Jul-24 | 2023-12-01 | - | - | 128K tokens | - |
| Llama 4 Scout 16e Instruct | Meta | #18 | 78% | 22% (52/237) | 3% (1/37) | 1250 | 48.14% | - | 42.16% | - | No | - | - | - | 17B | 05-Apr-25 | 2024-08-01 | - | - | 128K tokens | - |
| QWen-qwq-32b | Alibaba Cloud | #13 | 87% | 13% (31/237) | 32% (12/37) | 1340 | 81.14% | - | 67.18% | - | No | - | - | - | 32B | 12-Oct-24 | 2024-09-01 | - | - | 128K tokens | - |
| GPT-OSS-20B | OpenAI | #8 | 96% | 4% (11/300) | 89% (89/100) | - | 69.54% | 71.5% | - | 24.8% | No | No | Yes | Yes | 20B | 05-Aug-25 | - | - | - | 128K tokens | - |
| GPT-OSS-120B | OpenAI | #6 | 97% | 3% (8/300) | 92% (92/100) | - | 69.54% | 80.1% | 58.80% | 35.5% | No | No | Yes | Yes | 120B | 05-Aug-25 | - | - | - | 128K tokens | - |
| Claude Sonnet 4.5 | Anthropic | #1 | 100% | 0% (1/300) | 100% (100/100) | 1420 | 75.0% | 85.0% | 77.2% | 82.0% | Yes | - | - | - | - | 29-Sep-25 | 2025-04-01 | $3.00 | $15.00 | 200K tokens | ~0.5s |
| Claude 4 Sonnet | Anthropic | #4 | 99% | 1% (4/300) | 97% (97/100) | 1410 | 70.5% | 75.40% | 64.8% | 78.3% | Yes | Yes | No | Yes | - | 22-May-25 | 2024-04-01 | $3.00 | $15.00 | 200K tokens | ~0.5-1s |
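The Safe and Unsafe Responses columns are complements of each other: the unsafe figure is the raw count of unsafe completions over the number of red-teaming prompts in that test run (300 or 237, shown in parentheses), rounded to the nearest whole percent, and the safe figure is the remainder. A minimal sketch of that arithmetic, using counts taken from the table above (the helper name `response_rates` is ours, not part of the leaderboard):

```python
# Derive the Safe/Unsafe Responses percentages from raw red-teaming counts,
# rounded to the nearest whole percent as the leaderboard displays them.
def response_rates(unsafe_count: int, total_prompts: int) -> tuple[int, int]:
    unsafe_pct = round(100 * unsafe_count / total_prompts)
    return 100 - unsafe_pct, unsafe_pct  # (safe %, unsafe %)

# Kimi K2 Instruct 0905: 58 unsafe responses out of 300 prompts
print(response_rates(58, 300))   # -> (81, 19)

# Grok 4: 116 unsafe responses out of 300 prompts
print(response_rates(116, 300))  # -> (61, 39)
```

The same rounding explains apparent edge cases such as "0% (1/300)": 1/300 is about 0.3%, which rounds down to 0%.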

📊 Data Source

All comparative insights combine rigorous red-teaming and jailbreak testing performed by Holistic AI with publicly available benchmark data. External benchmarks include CodeLMArena, MathLiveBench, CodeLiveBench, and GPQA, sourced from official model provider websites, public leaderboards, benchmark sites, and other openly accessible resources to ensure transparency, accuracy, and reliability.