GPT-OSS-20B

Safety #9

Safety Ranking

Ranked #9 out of all models based on safe response rate, jailbreaking resistance, and harmful content filtering effectiveness.

OpenAI

GPT-OSS-20B is OpenAI's open-weight language model with 21 billion parameters, designed for efficient deployment on consumer hardware with as little as 16GB of memory. Released under the Apache 2.0 license, it scores 85.3% on MMLU and reports strong AIME results.
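
As a rough sanity check on the 16GB figure, the sketch below computes the average number of bits per parameter that would fit if the entire memory budget held only weights. The function name and the decimal-gigabyte convention are assumptions made for illustration; a real deployment also needs memory for activations and the KV cache, so the practical per-weight budget is lower than this bound.

```python
def max_bits_per_parameter(memory_gb: float, n_params: float) -> float:
    """Upper bound on average bits per weight if all memory held only weights."""
    total_bits = memory_gb * 1e9 * 8  # decimal gigabytes -> bits (assumed convention)
    return total_bits / n_params


# 16GB of memory and 21 billion parameters, as stated in the description.
bits = max_bits_per_parameter(memory_gb=16, n_params=21e9)
print(f"{bits:.2f} bits per parameter at most")  # prints "6.10 bits per parameter at most"
```

Under these assumptions the bound is about 6.1 bits per parameter. Storing 21 billion weights at 16 bits each would need roughly 42GB, so fitting in 16GB implies the weights are held at reduced precision.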

gpt-oss-20b (Available)

Max Input

128K

Tokens

Input Price

per 1M Tokens

Output Price

per 1M Tokens

Safety Score

96%

Safe Responses

Size

20B

Parameters

Model Information

Detailed specifications and technical details

Release Details

Release Date

05-Aug-25

Knowledge Cutoff

License

Apache 2.0

Model Architecture

Parameters

20B Parameters

Training Data

26.6T tokens

Context Window

Input Context Length

128K tokens

Max Output Tokens

Features & Capabilities

Core functionality and supported features

Features

Streaming

Supported

Function calling

Not supported

Structured outputs

Not supported

Fine-tuning

Supported

Distillation

Not supported

Predicted outputs

Not supported

Multimodal

Not supported

Reasoning

Not supported

Tools

Tools supported when using the Responses API

Web search

Not supported

File search

Not supported

Code interpreter

Not supported

Image generation

Not supported

Computer use

Not supported

MCP

Not supported

Modalities

Text

Input: Yes

Output: Yes

Image

Input: No

Output: No

Audio

Input: No

Output: No

Performance Benchmarks

Quantitative results for the model across reasoning, math, coding, and general knowledge.

GPQA

Graduate-level multiple-choice questions across science domains; Google-proof and extremely challenging.

Science knowledge

71.5%

MathLiveBench

Real-time mathematical reasoning benchmark testing the model's ability to solve advanced problems across algebra, calculus, geometry, statistics, and applied mathematics with step-by-step problem-solving approaches.

Mathematical ability

69.54%

CodeRankedAGI

Advanced AGI coding benchmark evaluating sophisticated programming capabilities, including complex problem-solving, architectural design, and advanced software engineering tasks requiring deep reasoning.

AGI coding ability

24.8%

MMLU

Knowledge across 57 subjects spanning STEM, humanities, and professional domains.

General knowledge

85.3%

Jailbreaking & Red Teaming Analysis

Comprehensive safety evaluation and red teaming analysis

Overall Safety Analysis

Safe: 96% (289/300)

Unsafe: 4% (11/300)

Jailbreaking Resistance

Resisted: 89% (89/100 attempts)

Failed: 11% (11/100 attempts)

Measures the model's ability to resist adversarial prompts designed to bypass content safety measures.

These red-teaming audits used standardized testing protocols and adversarial prompts to assess model safety and robustness.
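
The headline percentages are rounded. A minimal sketch of the arithmetic behind them, using only the raw counts reported in this section (the helper name `rate` is hypothetical):

```python
def rate(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    return 100 * count / total


safe_pct = rate(289, 300)     # overall safe responses
unsafe_pct = rate(11, 300)    # overall unsafe responses
resisted_pct = rate(89, 100)  # jailbreaking attempts resisted

print(f"safe: {safe_pct:.2f}%  unsafe: {unsafe_pct:.2f}%  resisted: {resisted_pct:.0f}%")
# prints "safe: 96.33%  unsafe: 3.67%  resisted: 89%"
```

Both the 300-response set and the 100-attempt jailbreaking set report 11 failures. Whether these are the same 11 responses is not stated in the source.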

Cost Calculator

Interactive cost calculator and token pricing

No Pricing Information Available

Pricing data is not available for this model.

Providers

Compare pricing and features across different AI providers

Provider	Input $/1M	Output $/1M	Latency	Throughput
AWS Bedrock	$0.07	$0.30	0.8 ms	5000 tokens/s
Azure	$0.07	$0.30	0.9 ms	4500 tokens/s
DeepInfra	$40.00	$160.00	0.24 ms	127.9 tokens/s
nCompass	$50.00	$150.00	1.23 ms	132.3 tokens/s
Together	$50.00	$200.00	0.23 ms	155.7 tokens/s
Fireworks	$50.00	$200.00	0.42 ms	290.6 tokens/s
NovitaAI	$50.00	$200.00	0.47 ms	172.5 tokens/s
Nebius AI Studio	$50.00	$200.00	0.5 ms	72.5 tokens/s
Groq	$100.00	$500.00	0.3 ms	4767 tokens/s
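
To turn per-1M-token prices into a cost for a given workload, multiply each token count by its price and divide by one million. The sketch below does this for two rows copied from the table as listed; the function name and the example workload (2M input tokens, 500K output tokens) are hypothetical, and other rows can be added to the dictionary in the same way.

```python
# (input price, output price) in USD per 1M tokens, copied from the table above.
PRICES = {
    "AWS Bedrock": (0.07, 0.30),
    "Azure": (0.07, 0.30),
}


def workload_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload, given per-1M-token prices."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
cost = workload_cost("AWS Bedrock", input_tokens=2_000_000, output_tokens=500_000)
print(f"${cost:.2f}")  # prints "$0.29"
```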

Business Decision Guide

Key factors to consider when adopting this model for enterprise use

Safety Profile

Strong safety results: 96% safe responses (289/300) and 89% jailbreaking resistance (89/100). Suitable for enterprise use.

Safety Rank: #9

Performance Metrics

Moderate performance. Suitable for basic tasks and cost-sensitive applications.

Cost Efficiency

Highly cost-effective with excellent context handling.

Business Use Cases

Optimize your workflows with tailored AI solutions

Chatbot

Create conversational AI assistants

Suitability: Excellent

High resilience against manipulation
Cost-effective for high volume

Best for:

Customer engagement, website assistants

Customer Service

Automate support and improve response times

Suitability: Excellent

Competent customer support
Scalable solution

Best for:

Support teams, customer success departments

Research Assistant

Analyze information and support research

Suitability: Good

Strong analytical capabilities

Best for:

R&D departments, data analysis teams

Content Creation

Generate articles, blogs, and marketing copy

Suitability: Fair

Consistent brand voice alignment

Best for:

Marketing teams, publishers, content agencies

Creative Projects

Generate ideas, stories, and creative content

Suitability: Fair

Logical creativity

Best for:

Design teams, storytellers, game developers

Code Generation

Create and debug programming code

Suitability: Fair

Standard capabilities for this use case

Best for:

Development teams, engineering departments

This data is generated based on the model benchmarks available in public documentation.

OpenAI Models Comparison

[Charts: safety score, input cost per 1M tokens, and output cost per 1M tokens compared across OpenAI models]