OpenAI logo

GPT-OSS-120B

Safety #5
Safety Ranking
Ranked #5 out of all models based on safe response rate, jailbreaking resistance, and harmful content filtering effectiveness.
Compare

OpenAI

GPT-OSS-120B is OpenAI's larger open-weight language model with 117 billion parameters (5.1B active per token), optimized for deployment on a single 80GB GPU. Released under Apache 2.0 license, it achieves near-parity with o4-mini on reasoning benchmarks with exceptional MMLU (90%) and AIME performance.

Max Input
128,000
Tokens
Input Price
$-
per 1M Tokens
Output Price
$-
per 1M Tokens
Safety Score
292%
Safe Responses
Size
120B
Parameters

Model Information

Detailed specifications and technical details

Release Details

Release Date
05-Aug-25
Knowledge Cutoff
-
License
Apache 2.0

Model Architecture

Parameters
120B Parameters
Training Data
159.6T tokens

Context Window

Input Context Length
128,000 tokens
Max Output Tokens
-

Features & Capabilities

Core functionality and supported features

Features

Streaming
Supported
Function calling
Supported
Structured outputs
Not supported
Fine-tuning
Supported
Distillation
Not supported
Predicted outputs
Not supported
Multimodal
Not supported
Reasoning
Supported

Tools

Tools supported when using the Responses API

Web search
Supported
File search
Not supported
Code interpreter
Supported
Image generation
Not supported
Computer use
Not supported
MCP
Not supported

Modalities

Text
Input:Yes
Output:Yes
Image
Input:No
Output:No
Audio
Input:No
Output:No

Performance Benchmarks

Focus on quantitative capabilities of the model across reasoning, math, coding, etc.

MathLiveBench

MathLiveBench
Real-time mathematical reasoning benchmark testing the model's ability to solve advanced problems across algebra, calculus, geometry, statistics, and applied mathematics with step-by-step problem-solving approaches.

Mathematical ability

69.54%

CodeLiveBench

CodeLiveBench
Live coding performance evaluation measuring the model's ability to write, debug, and optimize code in real-time scenarios, including algorithm implementation and software development tasks.

Coding ability

58.80%

Jailbreaking & Red Teaming Analysis

Comprehensive safety evaluation and red teaming analysis

Overall Safety Analysis

97%
Safe: 97% (292/300)
Unsafe: 3% (8/300)
SAFE Responses:

97%

(292 out of 300)

UNSAFE Responses:

3%

(8 out of 300)

Jailbreaking Resistance

92%
Resisted: 92% (92/100)
Failed: 8% (8/100)
Jailbreaking Resistance:

92%

(92 out of 100 attempts)

Measures the model's ability to resist adversarial prompts designed to bypass content safety measures.

These Red Teaming audits were conducted using standardized testing protocols and adversarial prompts to assess model safety and robustness.

Cost Calculator

Interactive cost calculator and token pricing

No Pricing Information Available

Pricing data is not available for this model.

Providers

Compare pricing and features across different AI providers

Provider
Input $/1M
Output $/1M
Latency
Throughput
AWS Bedrock logo
AWS Bedrock
$0.15$0.601.2 ms3000 tokens/s
C
Chutes
$73.00$290.002.28 ms237 tokens/s
DeepInfra logo
DeepInfra
$90.00$450.000.56 ms117.5 tokens/s
nCompass logo
nCompass
$100.00$450.001.59 ms39.3 tokens/s
Baseten logo
Baseten
$100.00$500.000.27 ms305.1 tokens/s
NovitaAI logo
NovitaAI
$100.00$500.000.45 ms121.4 tokens/s
AtlasCloud logo
AtlasCloud
$100.00$500.000.44 ms139.8 tokens/s
Crusoe logo
Crusoe
$150.00$500.000.56 ms151 tokens/s
Fireworks logo
Fireworks
$150.00$600.000.82 ms221 tokens/s
Together logo
Together
$150.00$600.000.37 ms81.95 tokens/s
Parasail logo
Parasail
$150.00$600.000.73 ms68.02 tokens/s
Nebius AI Studio logo
Nebius AI Studio
$150.00$600.000.72 ms55.65 tokens/s
Groq logo
Groq
$150.00$750.000.23 ms1000 tokens/s
Cerebras logo
Cerebras
$250.00$690.000.44 ms4041 tokens/s

Business Decision Guide

Key factors to consider when adopting this model for enterprise use

Safety Profile

Strong safety measures with good compliance rates. Suitable for enterprise use.

Safety Rank: #5

Performance Metrics

Solid performance across key metrics. Good for general business applications.

Cost Efficiency

Highly cost-effective with excellent context handling.

$0.00/mo (avg. use)

Business Use Cases

Optimize your workflows with tailored AI solutions

Chatbot

Create conversational AI assistants

Suitability:Excellent
  • High resilience against manipulation
  • Cost-effective for high volume

Best for:

Customer engagement, website assistants

Customer Service

Automate support and improve response times

Suitability:Excellent
  • Competent customer support
  • Scalable solution

Best for:

Support teams, customer success departments

Code Generation

Create and debug programming code

Suitability:Excellent
  • Strong coding capabilities

Best for:

Development teams, engineering departments

Research Assistant

Analyze information and support research

Suitability:Good
  • Strong analytical capabilities

Best for:

R&D departments, data analysis teams

Content Creation

Generate articles, blogs, and marketing copy

Suitability:Fair
  • Consistent brand voice alignment

Best for:

Marketing teams, publishers, content agencies

Creative Projects

Generate ideas, stories, and creative content

Suitability:Fair
  • Logical creativity

Best for:

Design teams, storytellers, game developers

This data is generated based on the model benchmarks available in public documentation.

OpenAI Models Comparison

Compare metrics across different OpenAI models

Safety Score Comparison

Input Cost Comparison (per 1M tokens)

Output Cost Comparison (per 1M tokens)