
Claude 3.7 Sonnet

Safety #2
Safety Ranking
Ranked #2 out of all models based on safe response rate, jailbreaking resistance, and harmful content filtering effectiveness.
Operational #16
Operational Ranking
Ranked #16 based on overall performance across benchmarks, cost efficiency, speed, and practical enterprise deployment metrics.

Anthropic

Claude 3.7 Sonnet ranks #2 for safety, achieving a 99.7% safe-response rate and strong jailbreaking resistance. With 70B parameters and solid mathematical reasoning (63.3% on MathLiveBench), it is a strong choice for enterprise applications that prioritize safety and reliability.

claude-3-7-sonnet-20250219 (Available)
Max Input
200K
Tokens
Input Price
$3
per 1M Tokens
Output Price
$15
per 1M Tokens
Safety Score
99.7%
Safe Responses
Size
70B
Parameters
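
The listing above maps directly onto an API call. The snippet below is a minimal sketch, assuming the `anthropic` Python SDK and an `ANTHROPIC_API_KEY` environment variable; the prompt and `max_tokens` value are placeholders.

```python
# Minimal sketch: calling the model ID listed above via the Anthropic Messages API.
# Assumes the `anthropic` Python SDK is installed and ANTHROPIC_API_KEY is set.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # model ID from this page
    max_tokens=1024,                     # placeholder output budget
    messages=[{"role": "user", "content": "Give me three test cases for a date parser."}],
)

print(response.content[0].text)
# Per-request token usage, which maps onto the $3 / $15 per-1M-token pricing above.
print(response.usage.input_tokens, response.usage.output_tokens)
```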

Model Information

Detailed specifications and technical details

Release Details

Release Date
24-Feb-25
Knowledge Cutoff
2024-10-01
License
Proprietary

Model Architecture

Parameters
70B Parameters
Training Data
93.1T tokens

Context Window

Input Context Length
200K tokens
Max Output Tokens
-

Performance Benchmarks

Focus on quantitative capabilities of the model across reasoning, math, coding, etc.


GPQA
Graduate-level multiple-choice questions across science domains; Google-proof and extremely challenging.

Science knowledge

84.80%

CodeLMArena (old)
Competitive coding benchmark where models are evaluated on their ability to solve complex programming problems, debug code, and demonstrate logical reasoning across multiple programming languages and difficulty levels.

Logical reasoning

1326

MathLiveBench
Real-time mathematical reasoning benchmark testing the model's ability to solve advanced problems across algebra, calculus, geometry, statistics, and applied mathematics with step-by-step problem-solving approaches.

Mathematical ability

63.30%

CodeLiveBench
Live coding performance evaluation measuring the model's ability to write, debug, and optimize code in real-time scenarios, including algorithm implementation and software development tasks.

Coding ability

73.2%
CodeRankedAGI
Advanced AGI coding benchmark evaluating sophisticated programming capabilities, including complex problem-solving, architectural design, and advanced software engineering tasks requiring deep reasoning.

AGI coding ability

60.4%

Jailbreaking & Red Teaming Analysis

Comprehensive safety evaluation and red teaming analysis

Overall Safety Analysis

99.7%
Safe: 99.7% (299/300)
Unsafe: 0.3% (1/300)
SAFE Responses:

99.7%

(299 out of 300)

UNSAFE Responses:

0.3%

(1 out of 300)

Jailbreaking Resistance

99%
Resisted: 99% (99/100)
Failed: 1% (1/100)
Jailbreaking Resistance:

99%

(99 out of 100 attempts)

Measures the model's ability to resist adversarial prompts designed to bypass content safety measures.

These Red Teaming audits were conducted using standardized testing protocols and adversarial prompts to assess model safety and robustness.
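
The percentages above are simple pass rates over the reported counts. As a sanity check, the arithmetic below is illustrative only, not the audit harness itself.

```python
# Illustrative arithmetic only: how the safety percentages follow from the raw
# counts reported above (not the red-teaming harness itself).
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage."""
    return 100.0 * passed / total

safe = pass_rate(299, 300)     # 99.67% -> shown as 99.7% safe responses
resisted = pass_rate(99, 100)  # 99% of adversarial prompts resisted

print(f"Safe responses: {safe:.1f}% (299/300)")
print(f"Jailbreak resistance: {resisted:.0f}% (99/100)")
```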

Cost Calculator

Interactive cost calculator and token pricing

Input Cost

$3

per million tokens

Per 1K words: ~$0.004

Output Cost

$15

per million tokens

Per 1K words: $0.02


Monthly estimate (5M input + 3M output tokens):

$60.00

6,000,000 words
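
The figures above follow from the per-token prices and a words-to-tokens conversion. The sketch below reproduces that arithmetic; the 0.75 words-per-token ratio is an assumption inferred from this page's 8M-token to 6,000,000-word conversion.

```python
# Sketch of the cost arithmetic behind the monthly estimate above.
# Prices are from this page; WORDS_PER_TOKEN = 0.75 is an assumption inferred
# from the 8M tokens ~= 6,000,000 words conversion shown here.
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens
WORDS_PER_TOKEN = 0.75

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M + \
           (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

cost = monthly_cost(5_000_000, 3_000_000)          # 5 * $3 + 3 * $15 = $60.00
words = (5_000_000 + 3_000_000) * WORDS_PER_TOKEN  # ~6,000,000 words

print(f"${cost:.2f} for roughly {words:,.0f} words per month")
```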

Providers

Compare pricing and features across different AI providers

Provider        Input $/1M   Output $/1M   Latency   Throughput
AWS Bedrock     $3.00        $15.00        1 ms      3,000 tokens/s
Google Vertex   $3.00        $15.00        1.1 ms    2,800 tokens/s
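
Since both providers list identical per-token prices, the practical difference is latency and throughput. The sketch below is illustrative only: it estimates end-to-end time for a hypothetical 1,000-token response from the table's figures, which will vary by region and load in practice.

```python
# Illustrative only: estimating end-to-end generation time from the latency and
# throughput figures in the table above. Real-world numbers vary in practice.
providers = {
    "AWS Bedrock":   {"latency_s": 0.001,  "throughput_tps": 3000},
    "Google Vertex": {"latency_s": 0.0011, "throughput_tps": 2800},
}

output_tokens = 1_000  # hypothetical response length

for name, p in providers.items():
    total_s = p["latency_s"] + output_tokens / p["throughput_tps"]
    print(f"{name}: ~{total_s:.2f} s for {output_tokens} output tokens")
```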

Business Decision Guide

Key factors to consider when adopting this model for enterprise use

Safety Profile

Outstanding safety compliance (99.7%) with strong resistance to jailbreaking (99%).

Safety Rank: #2

Performance Metrics

Strong performance in reasoning, mathematics, and coding. Suitable for most enterprise tasks.

Performance Rank: #16

Cost Efficiency

Moderate cost with good value for performance.

$60.00/mo (avg. use)

Business Use Cases

Optimize your workflows with tailored AI solutions

Code Generation

Create and debug programming code

Suitability: Excellent
  • Strong coding capabilities
  • Adaptable to multiple languages

Best for:

Development teams, engineering departments

Research Assistant

Analyze information and support research

Suitability: Good
  • Strong analytical capabilities

Best for:

R&D departments, data analysis teams

Chatbot

Create conversational AI assistants

Suitability: Good
  • High resilience against manipulation

Best for:

Customer engagement, website assistants

Customer Service

Automate support and improve response times

Suitability: Good
  • Competent customer support

Best for:

Support teams, customer success departments

Content Creation

Generate articles, blogs, and marketing copy

Suitability: Fair
  • Consistent brand voice alignment

Best for:

Marketing teams, publishers, content agencies

Creative Projects

Generate ideas, stories, and creative content

Suitability: Fair
  • Logical creativity

Best for:

Design teams, storytellers, game developers

This data is derived from model benchmarks available in public documentation.

Anthropic Models Comparison

Compare metrics across different Anthropic models

Safety Score Comparison

Input Cost Comparison (per 1M tokens)

Output Cost Comparison (per 1M tokens)