All Benchmarked Models

179 models tested on real hardware. Click any model to see detailed benchmark results. Architecture, parameter count, and quantization cells are left blank where the benchmark run did not report them.

| Model | Architecture | Params | Quant | Benchmarks | Avg score (/100) | Speed (tok/s) |
|---|---|---|---|---|---|---|
| qwen3.5:9b | qwen35 | 9.7B | Q4_K_M | 10 | 76 | 35.6 |
| qwen3.5:4b | qwen35 | 4.7B | Q4_K_M | 7 | 72 | 32.9 |
| qwen3.5:27b | qwen35 | 27.8B | Q4_K_M | 6 | 52 | 6.2 |
| gpt-oss:20b | gptoss | 20.9B | MXFP4 | 5 | 89 | 71 |
| openai/gpt-oss-20b | gpt_oss | 20B | MXFP4 | 4 | 87 | 33.1 |
| qwen3:14b | qwen3 | 14.8B | Q4_K_M | 4 | 78 | 17.5 |
| qwen3-vl-4b-instruct | qwen3vl | 4B | Q4_K_M | 4 | 87 | 71.5 |
| qwen3:8b | qwen3 | 8.2B | Q4_K_M | 4 | 73 | 111 |
| qwen3.5-9b | qwen35 | 9B | Q4_K_S | 4 | 77 | 23.5 |
| gemma4:26b | gemma4 | 25.8B | Q4_K_M | 3 | 69 | 30.9 |
| glm-4.7-flash:latest | glm4moelite | 29.9B | Q4_K_M | 3 | 84 | 51.1 |
| qwen3.5-9b-mlx | qwen3_5 | 9B | 4bit | 3 | 65 | 17.1 |
| qwen3.5:0.8b | qwen35 | 873.44M | Q8_0 | 3 | 63 | 60.5 |
| qwen3.5-0.8b-mlx | qwen3_5 | 0.8B | 8bit | 3 | 56 | 76.4 |
| qwen3-coder:30b | qwen3moe | 30.5B | Q4_K_M | 2 | 85 | 111.3 |
| unsloth/gemma-4-26b-a4b-it | gemma4 | 26B | Q3_K_M | 2 | 87 | 20.7 |
| nemotron-3-nano:latest | nemotron_h_moe | 31.6B | Q4_K_M | 2 | 88 | 62.5 |
| lfm2:24b | lfm2moe | 23.8B | Q4_K_M | 2 | 88 | 90.4 |
| qwen3.5:35b-a3b | qwen35moe | 36.0B | Q4_K_M | 2 | 74 | 36.2 |
| qwen2.5-7b-instruct | qwen2 | 7B | 4bit | 2 | 83 | 23.2 |
| gemma4:e2b | gemma4 | 5.1B | Q4_K_M | 2 | 83 | 88 |
| qwen3-vl:4b-instruct | qwen3vl | 4.4B | Q4_K_M | 2 | 83 | 90.9 |
| qwen3.6-35b-a3b | qwen35moe | 35B | Q4_K_S | 2 | 82 | 55 |
| qwen2.5-coder:7b | qwen2 | 7.6B | Q4_K_M | 2 | 74 | 28.3 |
| lfm2-24b-a2b | | | | 2 | 48 | 56.2 |
| gemma4:31b | gemma4 | 31.3B | Q4_K_M | 2 | 77 | 16.1 |
| deepseek-r1:7b | qwen2 | 7.6B | Q4_K_M | 2 | 72 | 28.9 |
| qwen3.5-27b | qwen3_5 | 27B | 4bit | 2 | 51 | 8 |
| liquid/lfm2-24b-a2b | lfm2_moe | 24B | 4bit | 2 | 74 | 55.3 |
| qwen3:0.6b | qwen3 | 751.63M | Q4_K_M | 2 | 67 | 144.8 |
| mistral:latest | llama | 7.2B | Q4_K_M | 2 | 73 | 38.4 |
| mistral:7b | llama | 7.2B | Q4_K_M | 2 | 71 | 22.2 |
| qwen3.5-35b-a3b-uncensored-hauhaucs-aggressive | qwen35moe | 35B | Q4_K_M | 2 | 71 | 59.8 |
| qwen3.5-4b-mlx | qwen3_5 | 4B | 4bit | 2 | 67 | 11.6 |
| qwen2.5:1.5b | qwen2 | 1.5B | Q4_K_M | 2 | 68 | 51.9 |
| qwen3.5-2b-mlx | qwen3_5 | 2B | 8bit | 2 | 68 | 40.9 |
| qwen2.5-1.5b-instruct | qwen2 | 1.5B | 8bit | 2 | 65 | 32.4 |
| gemma3:1b | gemma3 | 999.89M | Q4_K_M | 2 | 68 | 39.4 |
| smollm2-1.7b-instruct | llama | 1.7B | bf16 | 2 | 57 | 17.5 |
| starling-lm:7b | llama | 7B | Q4_0 | 2 | 64 | 23.7 |
| microsoft/phi-4-mini-reasoning | phi3 | 3.8B | 4bit | 2 | 63 | 65.3 |
| smollm2-360m-instruct | llama | 360M | bf16 | 2 | 50 | 70.5 |
| qwen3-30b-a3b-thinking-2507-claude-4.5-sonnet-high-reasoning-distill-mlx | qwen3_moe | 30B | MXFP4 | 1 | 97 | 79.6 |
| gpt-oss-20b | | | | 1 | 95 | 118.2 |
| qwen/qwen3-30b-a3b-2507 | qwen3_moe | 30B | 4bit | 1 | 94 | 44.8 |
| nemotron-3-nano | | | | 1 | 93 | 93.3 |
| qwen/qwen3-8b | qwen3 | 8B | 4bit | 1 | 89 | 33.4 |
| qwen3.6:35b-a3b | qwen35moe | 36.0B | Q4_K_M | 1 | 88 | 43.3 |
| gpt-oss-orchestrator:latest | gptoss | 20.9B | MXFP4 | 1 | 88 | 24.6 |
| gpt-oss-safeguard-20b-mlx | gpt_oss | 20B | MXFP4 | 1 | 87 | 41.4 |
| qwen3:30b | qwen3moe | 30.5B | Q4_K_M | 1 | 86 | 204.8 |
| qwen2.5:7b | qwen2 | 7.6B | Q4_K_M | 1 | 84 | 22.2 |
| gemma4:e4b | gemma4 | 8.0B | Q4_K_M | 1 | 84 | 53 |
| nvidia/nemotron-3-nano | nemotron_h | 30B | 4bit | 1 | 84 | 47.9 |
| qwen2.5:14b | qwen2 | 14.8B | Q4_K_M | 1 | 83 | 11.2 |
| qwen3-coder-next:latest | qwen3next | 79.7B | Q4_K_M | 1 | 82 | 34.5 |
| unsloth-phi-4 | llama | | 4bit | 1 | 82 | 27.4 |
| gemma3n:latest | gemma3n | 6.9B | Q4_K_M | 1 | 81 | 108.7 |
| glm-4.7-flash:q4_K_M | glm4moelite | 29.9B | Q4_K_M | 1 | 81 | 35.1 |
| qwen3-vl:30b | qwen3vlmoe | 31.1B | Q4_K_M | 1 | 80 | 208.9 |
| gemma3:12b | gemma3 | 12.2B | Q4_K_M | 1 | 80 | 12.7 |
| qwen3.5:122b-a10b | qwen35moe | 125.1B | Q4_K_M | 1 | 80 | 19.1 |
| mlx-community/gemma-3-4b-it-qat-4bit | | | | 1 | 80 | 43.2 |
| gemma3:4b | gemma3 | 4.3B | Q4_K_M | 1 | 80 | 34.6 |
| qwen2.5-coder-7b-instruct-mlx | qwen2 | 7B | 4bit | 1 | 79 | 22.5 |
| lmstudio-community/meta-llama-3.1-8b-instruct | llama | 8B | Q4_K_M | 1 | 78 | 46 |
| yi:6b | llama | 6B | Q4_0 | 1 | 77 | 27.6 |
| qwen2.5:3b | qwen2 | 3.1B | Q4_K_M | 1 | 77 | 47.2 |
| gemma2:2b | gemma2 | 2.6B | Q4_0 | 1 | 77 | 55.1 |
| internlm2:7b | internlm2 | 7.7B | Q4_0 | 1 | 77 | 22.7 |
| mlx-community/meta-llama-3.1-8b-instruct | llama | 8B | 4bit | 1 | 77 | 54.3 |
| qwen3.5:35b | qwen35moe | 36.0B | Q4_K_M | 1 | 77 | 16 |
| cogito:8b | llama | 8.0B | Q4_K_M | 1 | 77 | 21 |
| gemma2:9b | gemma2 | 9.2B | Q4_0 | 1 | 76 | 17.5 |
| ministral-3:8b | mistral3 | 8.9B | Q4_K_M | 1 | 76 | 19.5 |
| glm4:9b | chatglm | 9.4B | Q4_0 | 1 | 76 | 19 |
| google/gemma-3-4b | gemma3 | 4B | 4bit | 1 | 76 | 39.4 |
| gemma-3-27b-it-qat | gemma3 | 27B | 4bit | 1 | 76 | 8.9 |
| exaone-3.5-2.4b-instruct-mlx | exaone | 2.4B | 8bit | 1 | 75 | 37 |
| phi4:14b | phi3 | 14.7B | Q4_K_M | 1 | 75 | 11 |
| llama3.1:8b | llama | 8.0B | Q4_K_M | 1 | 75 | 20.3 |
| granite3.1-dense:8b | granite | 8.2B | Q4_K_M | 1 | 74 | 18.8 |
| minimax-m2.7:cloud | minimax | | | 1 | 74 | 0 |
| mlx-community/Llama-3.2-3B-Instruct-4bit | | | | 1 | 74 | 50.6 |
| llama3.2:3b | llama | 3.2B | Q4_K_M | 1 | 74 | 44.1 |
| hermes3:8b | llama | 8.0B | Q4_0 | 1 | 74 | 22.3 |
| dolphin3:8b | llama | 8.0B | Q4_K_M | 1 | 74 | 21 |
| mistralai/magistral-small-2509 | mistral3 | 24B | 4bit | 1 | 74 | 7.9 |
| qwen3.5:2b | qwen35 | 2.3B | Q8_0 | 1 | 73 | 30.4 |
| qwen2.5-coder-3b-instruct-mlx | qwen2 | 3B | 4bit | 1 | 73 | 52.3 |
| mlx-community/Yi-1.5-6B-Chat-4bit | | | | 1 | 73 | 29.1 |
| llama3.2:latest | llama | 3.2B | Q4_K_M | 1 | 73 | 98.9 |
| qwen3:1.7b | qwen3 | 2.0B | Q4_K_M | 1 | 72 | 120.7 |
| mlx-community/gemma-3-1b-it-8bit | | | | 1 | 72 | 86.3 |
| aya-expanse:8b | command-r | 8.0B | Q4_K_M | 1 | 72 | 19.7 |
| codegemma:7b | gemma | 9B | Q4_0 | 1 | 71 | 18.7 |
| mistral-nemo:12b | llama | 12.2B | Q4_0 | 1 | 71 | 14.6 |
| ministral-3:3b | mistral3 | 3.8B | Q4_K_M | 1 | 71 | 42.2 |
| llama3:latest | llama | 8.0B | Q4_0 | 1 | 71 | 22.3 |
| google/gemma-3-27b | gemma3 | 27B | 4bit | 1 | 71 | 5.9 |
| phi4-mini:latest | phi3 | 3.8B | Q4_K_M | 1 | 71 | 36.2 |
| qwen2.5-coder-1.5b-instruct-mlx | qwen2 | 1.5B | 8bit | 1 | 70 | 58.4 |
| yi-coder:9b | llama | 8.8B | Q4_0 | 1 | 70 | 19.4 |
| deepseek-v2:16b | deepseek2 | 15.7B | Q4_0 | 1 | 69 | 56.6 |
| mistralai/devstral-small-2-2512 | mistral3 | 24B | 4bit | 1 | 69 | 5.7 |
| lfm2.5-1.2b-instruct-mlx | lfm2 | 1.2B | 8bit | 1 | 68 | 77.3 |
| granite3.1-dense:2b | granite | 2.5B | Q4_K_M | 1 | 68 | 54.6 |
| nous-hermes2:latest | llama | 11B | Q4_0 | 1 | 68 | 15.9 |
| cogito:3b | llama | 3.6B | Q4_K_M | 1 | 67 | 45.6 |
| neural-chat:7b | llama | 7B | Q4_0 | 1 | 67 | 22.5 |
| aya:8b | command-r | 8.0B | F16 | 1 | 66 | 20 |
| granite-3.3-2b-instruct | granite | 2B | bf16 | 1 | 66 | 18.3 |
| deepseek-r1:1.5b | qwen2 | 1.8B | Q4_K_M | 1 | 66 | 85.4 |
| smollm2:1.7b | llama | 1.7B | Q8_0 | 1 | 64 | 51.8 |
| phi3:3.8b | phi3 | 3.8B | Q4_0 | 1 | 64 | 42 |
| falcon-h1-1.5b-instruct | falcon-h1 | 1.5B | Q4_K_M | 1 | 63 | 6.6 |
| solar:10.7b | llama | 11B | Q4_0 | 1 | 63 | 15.8 |
| phi3:14b | phi3 | 14.0B | Q4_0 | 1 | 63 | 12.7 |
| codellama:7b | llama | 7B | Q4_0 | 1 | 63 | 23.6 |
| llama3.2:1b | llama | 1.2B | Q8_0 | 1 | 62 | 170.2 |
| qwen3-vl:2b | qwen3vl | 2.1B | Q4_K_M | 1 | 62 | 127.4 |
| dolphin-phi:2.7b | phi2 | 3B | Q4_0 | 1 | 61 | 56.1 |
| qwen2.5-coder-1.5b-instruct | qwen2 | 1.5B | Q4_K_M | 1 | 61 | 7.3 |
| text-embedding-nomic-embed-text-v1.5 | nomic-bert | | Q4_K_M | 1 | 59 | 45.3 |
| vicuna:7b | llama | 7B | Q4_0 | 1 | 59 | 23.4 |
| vicuna:13b | llama | 13B | Q4_0 | 1 | 59 | 13.4 |
| wizardlm2:7b | llama | 7B | Q4_0 | 1 | 59 | 23.2 |
| gemma-3-1b-it | gemma3 | 1B | Q4_K_M | 1 | 59 | 8.7 |
| phi:2.7b | phi2 | 3B | Q4_0 | 1 | 59 | 56.6 |
| gemma-2-2b-it | gemma2 | 2B | Q4_K_M | 1 | 58 | 3.6 |
| falcon-h1-0.5b-instruct | falcon-h1 | 0.5B | Q4_K_M | 1 | 57 | 16.3 |
| qwen2.5:0.5b | qwen2 | 494.03M | Q4_K_M | 1 | 57 | 165 |
| mlx-community/Llama-3.2-1B-Instruct-4bit | | | | 1 | 57 | 126.4 |
| deepseek-coder:6.7b | llama | 7B | Q4_0 | 1 | 57 | 24.1 |
| llama2:13b | llama | 13B | Q4_0 | 1 | 56 | 13.5 |
| orca-mini:7b | llama | 7B | Q4_0 | 1 | 55 | 24.4 |
| phi-3.5-mini-instruct | phi3 | | 4bit | 1 | 55 | 43.4 |
| qwen2.5-coder-0.5b-instruct | qwen2 | 0.5B | Q4_K_M | 1 | 55 | 13.5 |
| llama2:7b | llama | 7B | Q4_0 | 1 | 55 | 25.1 |
| internlm2_5-1_8b-chat | internlm2 | | Q4_K_M | 1 | 55 | 6.6 |
| falcon3-1b-instruct | llama | 1B | 3bit | 1 | 54 | 121.8 |
| mlx-community/Nanbeige4.1-3B-8bit | | | | 1 | 53 | 25.4 |
| qwen2.5-0.5b-instruct | qwen2 | 0.5B | Q4_K_M | 1 | 53 | 13.2 |
| qwen2.5-0.5b-instruct-mlx | qwen2 | 0.5B | 4bit | 1 | 52 | 257.5 |
| smollm2:360m | llama | 361.82M | F16 | 1 | 52 | 117.9 |
| gemma3:270m | gemma3 | 268.10M | Q8_0 | 1 | 50 | 239.2 |
| granite-3.1-1b-a400m-instruct | granitemoe | 1B | Q4_K_M | 1 | 50 | 19.6 |
| llama-3.2-1b-instruct | llama | 1B | Q4_K_M | 1 | 50 | 9.9 |
| mlx-community/stablelm-2-zephyr-1_6b-4bit | | | | 1 | 50 | 93.6 |
| qwen3.5-0.8b | qwen35 | 0.8B | Q4_K_M | 1 | 48 | 6.2 |
| orca-mini:3b | llama | 3B | Q4_0 | 1 | 47 | 30.6 |
| qwen3:4b | qwen3 | 4.0B | Q4_K_M | 1 | 47 | 11.3 |
| stablelm2:1.6b | stablelm | 2B | Q4_0 | 1 | 45 | 93.8 |
| qwen2.5-math-1.5b-instruct | qwen2 | 1.5B | 4bit | 1 | 45 | 91.7 |
| mlx-community/quantized-gemma-2b-it | | | | 1 | 45 | 44.4 |
| yi-coder-1.5b-chat | llama | 1.5B | Q4_K_M | 1 | 45 | 7.7 |
| opencoder-1.5b-instruct | llama | 1.5B | Q4_K_M | 1 | 44 | 4.8 |
| amd-olmo-1b-sft | olmo | 1B | Q4_K_M | 1 | 44 | 9.2 |
| tinyllama:1.1b | llama | 1B | Q4_0 | 1 | 43 | 121 |
| falcon-h1-tiny-90m-instruct | falcon-h1 | 90M | Q4_K_M | 1 | 43 | 67.8 |
| tinyllama | | | | 1 | 43 | 133.4 |
| gemma-3-270m-it-qat-mlx | gemma3_text | 270M | 4bit | 1 | 43 | 344.2 |
| smollm2:135m | llama | 134.52M | F16 | 1 | 41 | 241 |
| qwen3-0.6b | qwen3 | 0.6B | Q4_K_M | 1 | 40 | 14.4 |
| starcoder2:3b | starcoder2 | 3B | Q4_0 | 1 | 39 | 49.4 |
| phi-3-mini-128k-instruct | phi3 | | 4bit | 1 | 38 | 44 |
| smollm2-135m-instruct | llama | 135M | Q4_K_M | 1 | 38 | 39.3 |
| deepseek-r1-distill-qwen-14b-mlx | qwen2 | 14B | 5bit | 1 | 36 | 10.3 |
| stablelm-2-zephyr-1.6b | stablelm | 1.6B | Q4_K_M | 1 | 35 | 6.5 |
| starcoder2:7b | starcoder2 | 7B | Q4_0 | 1 | 35 | 23.8 |
| qwen3.5-27b-claude-4.6-opus-distilled-mlx | qwen3_5 | 27B | 4bit | 1 | 34 | 6.1 |
| qwen3.5:latest | qwen35 | 9.7B | Q4_K_M | 1 | 34 | 13.3 |
| tinyllama-1.1b-chat-v1.0 | llama | 1.1B | Q3_K_M | 1 | 33 | 9.9 |
| lmstudio-community/Phi-4-reasoning-plus-MLX-4bit | | | | 1 | 33 | 11.5 |
| bloomz-560m | bloom | 560M | Q4_K_M | 1 | 30 | 12.2 |
| deepseek-coder-1.3b-instruct | llama | 1.3B | Q8_0 | 1 | 30 | 7.1 |
| phi-1_5 | phi2 | | Q8_0 | 1 | 28 | 6.9 |
| deepseek-r1:14b | qwen2 | 14.8B | Q4_K_M | 1 | 22 | 11.4 |
| qwen3-1.7b | qwen3 | 1.7B | Q4_K_M | 1 | 19 | 5.9 |