Compare chat, image, and video models by composite score and category-specific metrics. Rankings are editorially maintained for reference.
Columns: Score is the editorial composite score (higher ranks higher); Context (K) is the context window size in thousands of tokens; Input $ and Output $ are token prices in USD per 1M tokens; MMLU is Massive Multitask Language Understanding accuracy (%); HumanEval is the code generation benchmark pass rate (%); Elo is the Arena Elo from human preference battles (higher is stronger).

| Rank | Model | Provider | Score | Context (K) | Input $ | Output $ | MMLU (%) | HumanEval (%) | Elo |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | — | 98.0 | 1050 | 5 | 30 | 91.50 | 91.40 | 1420 |
| 2 | Gemini 3.5 Pro | Google DeepMind | 96.0 | 1000 | 1.50 | 9 | 91 | 89.50 | 1400 |
| 3 | Claude Opus 4.7 | — | 93.0 | 1000 | 5 | 25 | 91 | 92.50 | 1400 |
| 4 | Gemini 3.5 Flash | — | 90.0 | 1049 | 1.50 | 9 | 92.30 | 86.80 | 1370 |
| 5 | GPT-5.5 Instant | OpenAI | 88.0 | 922 | 0.75 | 3 | 89.50 | 88.20 | 1350 |
| 6 | Qwen3.7 Max | — | 85.0 | 1000 | 2.50 | 7.50 | 87 | 87 | 1300 |
| 7 | Gemini 3.1 Pro | — | 85.0 | 1049 | 2 | 12 | 87.50 | 85 | 1300 |
| 8 | Cursor Composer 2.5 | Cursor | 82.0 | 256 | 0 | 0 | 85 | 86 | 1260 |
| 9 | Kimi K2.6 | — | 82.0 | 262 | 0.73 | 3.49 | 85.50 | 84.50 | 1280 |
| 10 | GPT-5.4 | — | 82.0 | 1050 | 2.50 | 15 | 88.20 | 87.50 | 1320 |
| 11 | Claude Sonnet 4.6 | Anthropic | 80.0 | 1000 | 3 | 15 | 86.50 | 88 | 1280 |
| 12 | Windsurf SWE-1.6 | Windsurf (Codeium) | 80.0 | 200 | 0 | 0 | 0 | 0 | 0 |
| 13 | Grok 4.3 | — | 80.0 | 1000 | 1.25 | 2.50 | 86 | 85 | 1270 |
| 14 | MiMo-V2.5 Pro | Xiaomi | 78.0 | 1000 | 0.35 | 1 | 85 | 84 | 1260 |
| 15 | DeepSeek V4 Pro | — | 78.0 | 1049 | 0.43 | 0.87 | 85 | 86.50 | 1260 |
| 16 | Qwen3.6 Plus | — | 76.0 | 1000 | 0.33 | 1.95 | 84 | 84 | 1250 |
| 17 | GPT-4o | OpenAI | 75.0 | 128 | 2.50 | 10 | 88.70 | 90.20 | 1287 |
| 18 | GLM-5.1 | 智谱AI (Zhipu) | 75.0 | 200 | 0.40 | 1.20 | 83 | 82 | 1240 |
| 19 | Cursor Composer 2 | Cursor | 72.0 | 256 | 0 | 0 | 82 | 82 | 1220 |
| 20 | MiniMax-M2.7 | — | 72.0 | 205 | 0.28 | 1.20 | 82 | 81 | 1220 |
| 21 | Kimi K2.5 | — | 72.0 | 262 | 0.40 | 1.90 | 82 | 82 | 1220 |
| 22 | Gemini 3 Flash | Google DeepMind | 70.0 | 1000 | 0.15 | 0.60 | 82 | 80.50 | 1220 |
| 23 | GLM-5 | 智谱AI (Zhipu) | 70.0 | 200 | 0.30 | 0.90 | 81 | 79 | 1210 |
| 24 | DeepSeek V4 Flash | DeepSeek | 68.0 | 1000 | 0.14 | 0.28 | 80 | 83 | 1200 |
| 25 | Qwen3.5 397B | Alibaba (Qwen) | 68.0 | 262 | 0.45 | 1.35 | 80.50 | 80.50 | 1200 |
| 26 | Gemini 2.5 Pro | Google DeepMind | 65.0 | 1000 | 0.35 | 1.40 | 80.50 | 78 | 1180 |
| 27 | Grok 3 | xAI | 65.0 | 1000 | 0.15 | 0.60 | 80 | 80 | 1180 |
| 28 | Hunyuan Hy3 Preview | Tencent Hunyuan | 65.0 | 256 | 0.06 | 0.18 | 79 | 78 | 1180 |
| 29 | Claude 4.5 Haiku | Anthropic | 60.0 | 200 | 0.80 | 4 | 78 | 75 | 1150 |
| 30 | Cursor Composer 1.5 | Cursor | 58.0 | 200 | 0 | 0 | 76 | 74 | 1150 |
| 31 | DeepSeek R1 | DeepSeek | 55.0 | 128 | 0.55 | 2.19 | 78.50 | 78.50 | 1100 |
| 32 | Nova 2.0 Pro | Amazon | 55.0 | 256 | 0.80 | 3.20 | 76 | 72 | 1120 |
| 33 | Nemotron 3 Super | NVIDIA | 55.0 | 1000 | 0.14 | 0.42 | 76 | 74 | 1120 |
| 34 | Step 3.5 Flash | StepFun (阶跃星辰) | 55.0 | 256 | 0.03 | 0.09 | 75 | 72 | 1100 |
| 35 | Doubao Seed Code | 字节跳动 (ByteDance) | 55.0 | 256 | 0.10 | 0.30 | 76 | 74 | 1120 |
| 36 | Mistral Large 3 | Mistral AI | 50.0 | 256 | 0.30 | 0.90 | 75 | 70 | 1100 |
| 37 | Command A+ | Cohere | 48.0 | 128 | 0 | 0 | 0 | — | 0 |
| 38 | Llama 4 Maverick | Meta | 45.0 | 1000 | 0.17 | 0.50 | 72 | 72 | 1080 |
| 39 | ERNIE 5.0 Thinking | 百度 (Baidu) | 45.0 | 128 | 0.25 | 0.75 | 70 | 68 | 1050 |
| 40 | Llama 4 Scout | Meta | 35.0 | 10000 | 0.11 | 0.33 | 65 | 65 | 1000 |
| 41 | Mistral Small 4 | Mistral AI | 35.0 | 256 | 0.10 | 0.30 | 65 | 62 | 980 |
| 42 | Command A | Cohere | 35.0 | 256 | 1.50 | 4.50 | 62 | 60 | 970 |
| 43 | Phi-4 | Microsoft | 30.0 | 16 | 0.08 | 0.24 | 60 | 65 | 950 |
| 44 | Jamba 1.7 Large | AI21 Labs | 30.0 | 256 | 1.30 | 3.90 | 58 | 60 | 930 |
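
Because the input and output prices are quoted in USD per 1M tokens, the cost of a single request can be estimated by scaling each price by the token count. The following is a minimal sketch, not part of the leaderboard itself: the helper name `request_cost` and the example token counts are assumptions for illustration, and the prices (5 and 30) are taken from the rank 1 row.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the USD cost of one request from per-1M-token prices.

    Hypothetical helper for illustration; prices are USD per 1M tokens,
    matching the units of the table's input and output price columns.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Example: 10,000 input tokens and 2,000 output tokens (assumed workload)
# at the rank 1 row's prices of 5 (input) and 30 (output) USD per 1M tokens.
cost = request_cost(10_000, 2_000, 5, 30)
print(f"${cost:.2f}")  # prints "$0.11"
```

Rows that list a price of 0 yield a cost of 0 under this calculation, which may reflect missing or subscription-based pricing rather than a free model.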