AI model leaderboard · LinkWord

Compare chat, image, and video models by composite score and category-specific metrics. Rankings are editorially maintained for reference.

Score is an editorial composite; higher scores rank higher. Context (K) is the context window size in thousands of tokens. Input $ and Output $ are token prices in USD per 1M tokens. MMLU is Massive Multitask Language Understanding accuracy (%), HumanEval is the code-generation benchmark pass rate (%), and Elo is the arena rating from human-preference battles (higher is stronger).

Rank	Model	Provider	Score	Context (K)	Input $	Output $	MMLU (%)	HumanEval (%)	Elo
1	GPT-5.5	OpenAI	98.0	1050	5	30	91.50	91.40	1420
2	Claude Fable 5: Anthropic's most capable model, positioned above the Opus tier. A publicly available Mythos-class model with a 1M context window, scoring more than 10% higher than Claude Opus 4.8 on key benchmarks. Supports adaptive thinking only.	Anthropic	97.0	1000	10	50	—	—	—
3	Gemini 3.5 Pro	Google DeepMind	96.0	1000	1.50	9	91	89.50	1400
4	GPT-5.6 Sol: OpenAI's new flagship and the strongest of the three GPT-5.6 variants. Scores 60.5% on HealthBench Professional; rated High in both cybersecurity and biological-risk evaluations.	OpenAI	96.0	128	5	30	—	—	—
5	Claude Opus 4.8: Anthropic's latest flagship reasoning model.	Anthropic	95.0	1000	5	25	—	—	—
6	GPT-5.6 Terra: The value-oriented GPT-5.6 variant, with capability close to the flagship at half the price.	OpenAI	93.5	128	2.50	15	—	—	—
7	Claude Opus 4.7	Anthropic	93.0	1000	5	25	91	92.50	1400
8	Claude Sonnet 5	Anthropic	92.0	1000	2	10	—	—	1312
9	GLM-5.2: Sets a new bar for open-source reasoning models; MIT licensed. Tops open models on the AA Intelligence Index with a score of 51. MoE architecture with 753B total / 40B active parameters and 1M context. Scores 1524 on GDPval-AA v2, on par with GPT-5.5 (1514). Strong scientific reasoning: GPQA Diamond 89%, HLE 40%.	Z.ai (Zhipu AI)	91.0	1000	1.40	4.40	—	—	—
10	Sakana Fugu Ultra: Sakana AI's flagship multi-agent orchestration model. Rather than a single monolithic model, it dynamically orchestrates a pool of expert models to tackle complex multi-step tasks. Benchmark results are competitive with Claude Fable 5 and Mythos Preview. 1M context window; text and image input.	Sakana AI	91.0	1000	5	30	—	—	—
11	GPT-5.6 Luna: The fastest and lowest-priced GPT-5.6 variant.	OpenAI	91.0	128	1	6	—	—	—
12	Claude Opus 4.6: Anthropic reasoning model with 1M context; standard version $5/$25, fast version $30/$150 (input/output per 1M tokens).	Anthropic	90.0	1000	5	25	—	—	—
13	GPT-5.4 Pro: OpenAI's strongest reasoning model, with 1M+ context; has solved frontier math problems (Ramsey hypergraph and Erdős problems).	OpenAI	90.0	1050	30	180	—	—	—
14	Gemini 3.5 Flash: Google's lightweight flagship model with built-in Computer Use, function calling, and Search/Maps grounding; well suited to agent scenarios.	Google DeepMind	90.0	1049	1.50	9	92.30	86.80	1370
15	GPT-5.5 Instant	OpenAI	88.0	922	0.75	3	89.50	88.20	1350
16	VibeThinker-3B: 3B dense reasoning model based on Qwen2.5 and post-trained with Spectrum-to-Signal. Scores 94.3 on AIME26. Focused on math and code reasoning; no tool-calling support.	Weibo AI	88.0	32	—	—	—	—	—
17	DeepSeek V4 Pro: Deep reasoning model, open-sourced under the MIT license, with a 1M context window and an MoE architecture of 1.6T total / 49B active parameters. Scores 44 on the AA Intelligence Index, second among open models behind GLM-5.2. Very low cache-hit price ($0.004/M tokens).	DeepSeek	87.0	1000	0.43	0.87	—	—	—
18	Kimi K2.7 Code: 1T-parameter MoE model dedicated to coding, with 256K context, open-sourced under a Modified MIT license; reasoning-token consumption is 30% lower.	Moonshot AI	85.0	256	0.74	3.50	—	—	—
19	Qwen3.7 Max	Alibaba (Qwen)	85.0	1000	1.25	3.75	87	87	1300
20	Gemini 3.1 Pro	Google DeepMind	85.0	1049	2	12	87.50	85	1300
21	DeepSeek V4 Flash: Cost-effective reasoning model, open-sourced under the MIT license, with a 1M context window and an MoE architecture of 284B total / 13B active parameters. Scores 40 on the AA Intelligence Index; output costs only $0.28/M tokens and cache hits $0.003/M tokens, for exceptional value.	DeepSeek	83.0	1000	0.14	0.28	—	—	—
22	Grok 4.20 Multi-Agent: xAI's multi-agent reasoning model built on Grok 4.20. Supports collaborative multi-agent orchestration over a 2M context; suited to complex task decomposition and parallel execution.	xAI	82.0	2000	1.25	2.50	86	85	1275
23	Cursor Composer 2.5	Cursor	82.0	256	0	0	85	86	1260
24	Kimi K2.6	Moonshot AI	82.0	262	0.68	3.42	85.50	84.50	1280
25	GPT-5.4	OpenAI	82.0	1050	2.50	15	88.20	87.50	1320
26	Grok 4.20: xAI reasoning model with 2M context, the lowest hallucination rate, and support for agentic tool calling.	xAI	81.0	2000	1.25	2.50	—	—	—
27	Claude Sonnet 4.6	Anthropic	80.0	1000	3	15	86.50	88	1280
28	MiniMax M3: The first open-weights model to combine frontier coding, 1M context, and native multimodality. Uses the MSA sparse attention architecture. SWE-Bench Pro 59.0%, TerminalBench 66.0%. Aggressively priced.	MiniMax	80.0	1000	0.30	1.20	—	—	—
29	Windsurf SWE-1.6	Windsurf (Codeium)	80.0	200	0	0	—	—	—
30	Grok 4.3	xAI	80.0	1000	1.25	2.50	86	85	1270
31	MiMo-V2.5 Pro	Xiaomi	78.0	1000	0.44	0.88	85	84	1260
32	LongCat-2.0: 1.6-trillion-parameter MoE model from Meituan's LongCat team, with ~48B parameters activated per token. Trained on China-made AI ASIC superpods; 1M context window; MIT license.	Meituan	78.0	1000	0	0	—	—	—
33	Qwen3.6 Plus	Alibaba (Qwen)	76.0	1000	0.33	1.95	84	84	1250
34	GPT-4o	OpenAI	75.0	128	2.50	10	88.70	90.20	1287
35	Qwen3.7 Plus: Value model in Alibaba's Tongyi Qwen3.7 series, with 1M context and multimodal agent support.	Alibaba (Qwen)	75.0	1000	0.32	1.28	—	—	—
36	GLM-5.1	Zhipu AI	75.0	200	0.40	1.20	83	82	1240
37	Cursor Composer 2	Cursor	72.0	256	0	0	82	82	1220
38	MiMo-V2.5	Xiaomi	72.0	1049	0.15	0.29	—	—	—
39	MiniMax-M2.7	MiniMax	72.0	205	0.28	1.20	82	81	1220
40	Kimi K2.5	Moonshot AI	72.0	262	0.40	1.90	82	82	1220
41	Gemini 3 Flash	Google DeepMind	70.0	1000	0.15	0.60	82	80.50	1220
42	GLM-5	Zhipu AI	70.0	200	0.30	0.90	81	79	1210
43	Qwen3.5 397B	Alibaba (Qwen)	68.0	262	0.45	1.35	80.50	80.50	1200
44	GPT-5.4 Mini: Efficient GPT-5.4 variant with 400K context, optimized for high-throughput workloads.	OpenAI	67.0	400	0.75	4.50	—	—	—
45	Qwen3 Coder 480B A35B: Qwen's most powerful open-source coding model. 480B MoE with 35B active parameters and native 256K context (scalable to 1M with YaRN). Strong SWE-Bench performance. Apache 2.0 licensed; ships with the Qwen Code CLI.	Alibaba (Qwen)	66.0	256	0.22	1.80	—	—	—
46	Gemini 2.5 Pro	Google DeepMind	65.0	1000	0.35	1.40	80.50	78	1180
47	Grok 3	xAI	65.0	1000	0.15	0.60	80	80	1180
48	Hunyuan Hy3 Preview	Tencent Hunyuan	65.0	256	0.06	0.18	79	78	1180
49	DeepSeek V3.2: Latest release in the DeepSeek V3 series; 131K context, cost-effective.	DeepSeek	63.0	131	0.23	0.34	—	—	—
50	Nemotron 3 Ultra: NVIDIA's open frontier-reasoning and orchestration model with 550B total parameters (55B active, MoE). Built on a hybrid Mamba-Transformer architecture, it reaches 95% on Long Context RULER @1M and 65-70% on SWE-bench Verified. Weights, data, and training recipes are fully open.	NVIDIA	62.0	1000	0.50	2.20	—	—	—
51	GPT-Live-1: OpenAI's real-time voice conversation model; supports interruption and continuation, and can reason while holding a voice conversation.	OpenAI	62.0	32	0	0	—	—	—
52	Claude 4.5 Haiku	Anthropic	60.0	200	0.80	4	78	75	1150
53	Gemini 3.1 Flash Lite: Google's ultra-low-cost 1M-context model, suited to high-volume batch processing.	Google DeepMind	60.0	1049	0.25	1.50	—	—	—
54	Codestral 2508: Mistral's dedicated coding model, with 256K context and low pricing.	Mistral AI	60.0	256	0.30	0.90	—	—	—
55	Step 3.7 Flash: StepFun's latest multimodal MoE model, pairing a 196B-parameter language backbone with a vision encoder for native image and video understanding.	StepFun	60.0	256	0.20	1.15	78	75	—
56	GPT-5.4 Nano: The lightest GPT-5.4 variant; 400K context, very fast and low cost.	OpenAI	58.0	400	0.20	1.25	—	—	—
57	Laguna XS 2.1: Poolside's 33B-A3B MoE coding-agent model optimized for local deployment. Builds on XS.2 with improved SWE-bench Multilingual (63.1%) and stronger terminal-style task performance. Supports 256K context and runs locally in vLLM/SGLang.	Poolside	58.0	256	0.06	0.12	—	—	—
58	Nex AGI Nex-N2-Pro: Agentic mixture-of-experts model from Nex AGI with 17B active / 397B total parameters. Built on the Qwen3.5 architecture; supports text and image input and a 262K context window. Very cost-effective at $0.25/M input tokens.	Nex AGI	58.0	262	0.25	1	—	—	—
59	Cursor Composer 1.5	Cursor	58.0	200	0	0	76	74	1150
60	DeepSeek R1	DeepSeek	55.0	128	0.55	2.19	78.50	78.50	1100
61	Nova 2.0 Pro	Amazon	55.0	256	0.80	3.20	76	72	1120
62	Nemotron 3 Super	NVIDIA	55.0	1000	0.14	0.42	76	74	1120
63	Mistral Large 2512: December 2025 update of Mistral Large, with 262K context and substantially lower pricing.	Mistral AI	55.0	262	0.50	1.50	—	—	—
64	Mistral Medium 3.5: Mistral's mid-tier model with 262K context.	Mistral AI	55.0	262	1.50	7.50	—	—	—
65	Cohere North Mini Code: Cohere's first agentic coding model and the debut of its North family. A sparse MoE with 30B total / 3B active parameters, optimized for end-to-end coding tasks. 256K context window; text input only.	Cohere	55.0	256	0	0	—	—	—
66	Step 3.5 Flash	StepFun	55.0	256	0.03	0.09	75	72	1100
67	Doubao Seed Code	ByteDance	55.0	256	0.10	0.30	76	74	1120
68	Mistral Large 3	Mistral AI	50.0	256	0.30	0.90	75	70	1100
69	Grok Build 0.1: xAI's coding-focused model trained for agentic software-engineering workflows; supports text and image input.	xAI	50.0	256	1	2	—	—	—
70	Command A+	Cohere	48.0	128	0	0	—	—	—
71	Llama 4 Maverick	Meta	45.0	1000	0.17	0.50	72	72	1080
72	ERNIE 5.0 Thinking	Baidu	45.0	128	0.25	0.75	70	68	1050
73	Granite 4.1 8B: IBM's open-source 8B dense model for enterprise tasks, matching the performance of a 32B MoE model, with 131K context.	IBM	40.0	131	0.05	0.10	—	—	—
74	Llama 4 Scout	Meta	35.0	10000	0.11	0.33	65	65	1000
75	Mistral Small 4	Mistral AI	35.0	256	0.10	0.30	65	62	980
76	Command A	Cohere	35.0	256	1.50	4.50	62	60	970
77	Phi-4	Microsoft	30.0	16	0.08	0.24	60	65	950
78	Jamba 1.7 Large	AI21 Labs	30.0	256	1.30	3.90	58	60	930
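Because the Input $ and Output $ columns are quoted in USD per 1M tokens, the cost of a single request is each token count times its rate, divided by one million. The minimal Python sketch below applies this to two rows. The request sizes and the `request_cost` helper are hypothetical. For DeepSeek V4 Flash it uses the $0.003/M cache-hit rate from that model's description, *assuming* that cached input tokens are billed at that rate instead of the normal input rate. Actual billing rules vary by provider.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 cached_tokens: int = 0, cache_hit_price: float = 0.0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Prices are USD per 1M tokens, as in the Input $ / Output $ columns.
    Assumption: cached input tokens are billed at cache_hit_price
    instead of input_price; real billing rules vary by provider.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * input_price
             + cached_tokens * cache_hit_price
             + output_tokens * output_price)
    return total / 1_000_000  # rates are per 1M tokens, so scale down to USD


# Hypothetical request: 50K input tokens, 2K output tokens.
gpt55 = request_cost(50_000, 2_000, input_price=5, output_price=30)

# DeepSeek V4 Flash at the listed rates, assuming 40K of the
# 50K input tokens hit the cache at $0.003/M.
ds_flash = request_cost(50_000, 2_000, input_price=0.14, output_price=0.28,
                        cached_tokens=40_000, cache_hit_price=0.003)

print(f"GPT-5.5:           ${gpt55:.4f}")
print(f"DeepSeek V4 Flash: ${ds_flash:.6f}")
```

The same helper works for any row: only the two price columns change, and the cache arguments can be left at their defaults when no cache-hit rate applies.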
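The Elo column comes from pairwise human-preference battles, so a rating gap can be read as an expected win rate. The page does not say how its Elo figures are computed. The Python sketch below *assumes* the standard Elo logistic model with a 400-point scale, and compares ratings copied from the table. The function name is illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumes the usual 400-point logistic scale; the leaderboard does not
    state how its Elo figures are computed.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings copied from the Elo column.
ratings = {"GPT-5.5": 1420, "Gemini 3.5 Pro": 1400, "GPT-4o": 1287, "Phi-4": 950}

for rival in ("Gemini 3.5 Pro", "GPT-4o", "Phi-4"):
    p = expected_win_rate(ratings["GPT-5.5"], ratings[rival])
    print(f"GPT-5.5 vs {rival}: expected win rate {p:.1%}")
```

Under this assumption, a 20-point gap (GPT-5.5 vs Gemini 3.5 Pro) is close to a coin flip. A 400-point gap corresponds to roughly 10:1 expected odds.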