Anthropic

3 models available

Performance Benchmarks

Quantitative capabilities across reasoning, mathematics, and coding for Anthropic models

CodeLMArena
Competitive coding benchmark evaluating models on complex programming problems, debugging, and logical reasoning across multiple programming languages.
Claude 3.7 Sonnet: 1326
Claude 4 Sonnet: 1410

MathLiveBench
Real-time mathematical reasoning benchmark testing advanced problem-solving across algebra, calculus, geometry, statistics, and applied mathematics.
Claude Opus 4.1: 90.0%
Claude 3.7 Sonnet: 63.30%
Claude 4 Sonnet: 70.5%

CodeLiveBench
Live coding performance evaluation measuring the ability to write, debug, and optimize code in real-time scenarios including algorithm implementation and software development.
Claude Opus 4.1: 74.5%
Claude 3.7 Sonnet: 32.40%
Claude 4 Sonnet: 72.7%