openai/gpt-oss-20b

gpt-oss · 20B · MXFP4

THINKING MODEL

ASUS (AMD Ryzen 7 7800X3D 8-Core Processor)

31 GB · cachyos rolling

Tested on July 5, 2026

Top 18%

Global Score: 84/100 (Not Recommended)
Hardware Fit: 65/100
Quality: 92/100

Hardware

Machine: ASUS
CPU: AMD Ryzen 7 7800X3D 8-Core Processor
Cores: 16 threads (8 cores)
Frequency: 3.65 GHz
RAM: 31 GB
GPU: NVIDIA GeForce RTX 4090; AMD Raphael (integrated graphics)
OS: cachyos rolling
Arch: x64
Power Mode: performance

Performance

Tokens/sec: 221.5 (tokens generated per second; higher is better)
Standard deviation: ±1.0 (variation in speed across benchmark runs; lower means more consistent)
First chunk latency: 5 ms (delay before the first streamed chunk arrives from the runtime)
Time to first token: 30.0 s (how quickly the model starts responding; lower is better)
Load time: N/A (time to load the model into memory before inference)
Memory usage: 21.1 GB, 69% of total system memory (RAM consumed during inference)
Total tokens: 1713 (total tokens generated across all benchmark prompts)
Thinking tokens (est.): ~938 (estimated internal reasoning tokens; "thinking" models generate more tokens)
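
As a rough illustration of what the token counts imply, the sketch below derives the share of output spent on reasoning. It assumes the thinking-token estimate is included in the total token count, which the report does not state explicitly; the constant and function names are illustrative only.

```python
# Figures copied from the Performance section above.
TOTAL_TOKENS = 1713          # total tokens across all benchmark prompts
THINKING_TOKENS_EST = 938    # estimated reasoning tokens (approximate)


def thinking_share(total: int, thinking: int) -> float:
    """Fraction of generated tokens attributed to reasoning.

    Assumption: `thinking` is a subset of `total`.
    """
    return thinking / total


visible = TOTAL_TOKENS - THINKING_TOKENS_EST
share = thinking_share(TOTAL_TOKENS, THINKING_TOKENS_EST)

print(f"visible tokens: ~{visible}")      # visible tokens: ~775
print(f"thinking share: ~{share:.1%}")    # thinking share: ~54.8%
```

Under that assumption, a little over half of the generated tokens went to reasoning rather than the visible answer.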

Score breakdown

Hardware fit

Speed: 50/50
Time to first token: 0/20
Memory: 15/30

Quality

Reasoning: 19/20
Coding: 15/20
Instruction following: 18/20
Structured output: 15/15
Math: 15/15
Multilingual: 10/10
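
The sub-scores in the breakdown above add up to the two headline figures (Hardware Fit 65/100 and Quality 92/100). A minimal sketch of that arithmetic follows; the dictionary layout and the `total` helper are illustrative and not part of the benchmark tool. How the Global Score of 84 is derived from these two figures is not stated in the report, so it is not computed here.

```python
# Sub-scores copied from the breakdown as (earned, maximum) pairs.
hardware_fit = {
    "Speed": (50, 50),
    "Time to first token": (0, 20),
    "Memory": (15, 30),
}
quality = {
    "Reasoning": (19, 20),
    "Coding": (15, 20),
    "Instruction following": (18, 20),
    "Structured output": (15, 15),
    "Math": (15, 15),
    "Multilingual": (10, 10),
}


def total(parts: dict) -> tuple:
    """Sum earned points and maximum points across all sub-scores."""
    earned = sum(e for e, _ in parts.values())
    maximum = sum(m for _, m in parts.values())
    return earned, maximum


print(total(hardware_fit))  # (65, 100)
print(total(quality))       # (92, 100)
```

The zero for time to first token accounts for most of the gap in the hardware-fit score.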

Category levels

Reasoning: Strong
Coding: Strong
Instruction Following: Strong
Structured Output: Strong
Math: Strong
Multilingual: Strong

Metadata

Spec version: 0.2.1
Runtime: LM Studio
Model format: GGUF
Hardware profile: BALANCED
Result hash: 9035e1393e2c2637d543203838a795308ea9827e74edf2c1784f621ed7d0de49

Interpretation

Hardware fit: 65/100. Overall suitability: NOT RECOMMENDED (Global 84/100). Category profile: Reasoning: Strong, Coding: Strong, Instruction Following: Strong, Structured Output: Strong, Math: Strong, Multilingual: Strong.

Warnings

Model memory footprint is estimated via LM Studio CLI rather than measured from a fresh load.

Disqualifiers

Time to first token too high: 30,000 ms (maximum: 18,582 ms for the BALANCED profile)
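
The disqualifier explains why a run with a Global Score of 84/100 is still marked Not Recommended: the measured time to first token exceeds the profile's limit. The sketch below shows that comparison using the two numbers from the report. The function name and the simple greater-than rule are assumptions for illustration; the tool's actual logic is not shown in the report.

```python
# Values taken from the Disqualifiers section.
TTFT_MS = 30_000               # measured time to first token
BALANCED_MAX_TTFT_MS = 18_582  # reported maximum for the BALANCED profile


def is_disqualified(ttft_ms: int, max_ttft_ms: int) -> bool:
    """Hypothetical check: a run is disqualified when TTFT exceeds the limit."""
    return ttft_ms > max_ttft_ms


ratio = TTFT_MS / BALANCED_MAX_TTFT_MS

print(is_disqualified(TTFT_MS, BALANCED_MAX_TTFT_MS))  # True
print(f"{ratio:.2f}x the allowed maximum")             # 1.61x the allowed maximum
```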

Bench Environment

Thermal: nominal
CPU load: avg 7% (peak 8%)

Run yours now

$ npm install -g metrillm@latest

$ metrillm

Requires Node 20+ and Ollama or LM Studio running

Or run without installing: npx metrillm@latest