xAI logo

Grok 4

Safety #19
Safety Ranking
Ranked #19 out of all models based on safe response rate, jailbreaking resistance, and harmful content filtering effectiveness.
Compare

xAI

Our latest and greatest flagship model, offering unparalleled performance in natural language, math, reasoning, and multimodal understanding. Ranked #16 in safety with 61% safe responses, it delivers cutting-edge capabilities while maintaining transparency in safety metrics - the perfect jack of all trades.

Max Input
256,000
Tokens
Input Price
$3
per 1M Tokens
Output Price
$15
per 1M Tokens
Safety Score
184%
Safe Responses
Size
-
Parameters

Model Information

Detailed specifications and technical details

Release Details

Release Date
09-Jul-25
Knowledge Cutoff
2025-07
License
Proprietary

Model Architecture

Parameters
-
Training Data
-

Context Window

Input Context Length
256,000 tokens
Max Output Tokens
-

Features & Capabilities

Core functionality and supported features

Features

Streaming
Supported
Function calling
Supported
Structured outputs
Supported
Fine-tuning
Not supported
Distillation
Not supported
Predicted outputs
Not supported
Multimodal
Supported
Reasoning
Supported

Tools

Tools supported when using the Responses API

Web search
Supported
File search
Not supported
Code interpreter
Supported
Image generation
Supported
Computer use
Not supported
MCP
Not supported

Modalities

Text
Input:Yes
Output:Yes
Image
Input:Yes
Output:Yes
Audio
Input:No
Output:No

Performance Benchmarks

Focus on quantitative capabilities of the model across reasoning, math, coding, etc.

CodeLMArena

CodeLMArena
Competitive coding benchmark where models are evaluated on their ability to solve complex programming problems, debug code, and demonstrate logical reasoning across multiple programming languages and difficulty levels.

Logical reasoning

1420

MathLiveBench

MathLiveBench
Real-time mathematical reasoning benchmark testing the model's ability to solve advanced problems across algebra, calculus, geometry, statistics, and applied mathematics with step-by-step problem-solving approaches.

Mathematical ability

88.84%

CodeLiveBench

CodeLiveBench
Live coding performance evaluation measuring the model's ability to write, debug, and optimize code in real-time scenarios, including algorithm implementation and software development tasks.

Coding ability

71.34%

Jailbreaking & Red Teaming Analysis

Comprehensive safety evaluation and red teaming analysis

Overall Safety Analysis

61%
Safe: 61% (184/300)
Unsafe: 39% (116/300)
SAFE Responses:

61%

(184 out of 300)

UNSAFE Responses:

39%

(116 out of 300)

Jailbreaking Resistance

10%
Resisted: 10% (10/100)
Failed: 90% (90/100)
Jailbreaking Resistance:

10%

(10 out of 100 attempts)

Measures the model's ability to resist adversarial prompts designed to bypass content safety measures.

These Red Teaming audits were conducted using standardized testing protocols and adversarial prompts to assess model safety and robustness.

Cost Calculator

Interactive cost calculator and token pricing

Input Cost

$3

per million tokens

Per 1K words:$0.00

Output Cost

$15

per million tokens

Per 1K words:$0.02

Cost Calculator

1 tokens
1 words
110M
1 tokens
1 words
110M

Estimated Cost

Based on your token selection

$0.00

Total Cost

Input Cost:$0.00
Output Cost:$0.00
Cost Breakdown:
Per Word
$0.0000
Per Character
$0.000000

Monthly estimate (5M input + 3M output):

$60.00

6,000,000 words

Providers

Compare pricing and features across different AI providers

Provider
Input $/1M
Output $/1M
Latency
Throughput
xAI logo
xAI
$3.00$15.00~11s32000 tokens/s

Business Decision Guide

Key factors to consider when adopting this model for enterprise use

Safety Profile

Good safety compliance (184%) with adequate protection measures.

Safety Rank: #19

Performance Metrics

Solid performance across key metrics. Good for general business applications.

Cost Efficiency

Moderate cost with good value for performance.

$60.00/mo (avg. use)

Business Use Cases

Optimize your workflows with tailored AI solutions

Code Generation

Create and debug programming code

Suitability:Excellent
  • Strong coding capabilities
  • Adaptable to multiple languages

Best for:

Development teams, engineering departments

Research Assistant

Analyze information and support research

Suitability:Good
  • Strong analytical capabilities

Best for:

R&D departments, data analysis teams

Creative Projects

Generate ideas, stories, and creative content

Suitability:Fair
  • Logical creativity

Best for:

Design teams, storytellers, game developers

Content Creation

Generate articles, blogs, and marketing copy

Suitability:Fair
  • Standard capabilities for this use case

Best for:

Marketing teams, publishers, content agencies

Chatbot

Create conversational AI assistants

Suitability:Poor
  • Standard capabilities for this use case

Best for:

Customer engagement, website assistants

Customer Service

Automate support and improve response times

Suitability:Poor
  • Standard capabilities for this use case

Best for:

Support teams, customer success departments

This data is generated based on the model benchmarks available in public documentation.

xAI Models Comparison

Compare metrics across different xAI models

Safety Score Comparison

Input Cost Comparison (per 1M tokens)

Output Cost Comparison (per 1M tokens)