2026 LLM API Pricing Comparison: From ¥0.03 to ¥34/M Tokens, How to Choose

2026 LLM API Pricing Overview

The LLM API price war has been raging since 2024, and prices have dropped by two orders of magnitude. Here's a comprehensive comparison of current pricing from major providers to help developers make informed choices.

Prices are in ¥/million tokens (USD converted at $1=¥6.80).
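The conversion behind the tables can be sketched as below. The rate is the one stated above; the USD figures in the usage lines are hypothetical inputs chosen for illustration, not quoted list prices.

```python
USD_TO_CNY = 6.80  # exchange rate stated in this article


def usd_to_cny(usd_per_million: float, rate: float = USD_TO_CNY) -> float:
    """Convert a USD price per million tokens to ¥ per million tokens."""
    return round(usd_per_million * rate, 2)


# Hypothetical USD inputs, used only to show the arithmetic.
print(usd_to_cny(5.00))
print(usd_to_cny(30.00))
```

If the exchange rate moves, passing a different `rate` rescales every figure the same way.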

Flagship Models

These are each provider's most powerful models, suited for high-quality reasoning and complex code generation.

Model	Provider	Input (¥/M)	Output (¥/M)	Context
GPT-5.5	OpenAI	¥34.00	¥204.00	1050K
Claude Opus 4.7	Anthropic	¥34.00	¥170.00	1000K
Gemini 3.5 Pro	Google	¥10.20	¥61.20	1000K
Qwen3.7 Max	Alibaba	¥17.00	¥51.00	1000K
Claude Sonnet 4.6	Anthropic	¥20.40	¥102.00	1000K
GPT-5.4	OpenAI	¥17.00	¥102.00	1050K

Price gaps at the flagship tier are significant. GPT-5.5 output costs ¥204/M versus ¥61.20/M for Gemini 3.5 Pro, roughly a 3.3x difference. If your task doesn't require the absolute highest reasoning quality, Gemini 3.5 Pro offers the best value at this tier: it has the lowest input price (¥10.20/M), and only Qwen3.7 Max has a lower output price (¥51/M).
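Per-million prices are easier to compare once they are applied to a concrete request. The following is a minimal sketch using the flagship prices from the table above; the request size (2,000 input tokens, 500 output tokens) is an assumed workload, not a figure from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # ¥ per million input tokens
    output_per_m: float  # ¥ per million output tokens


# Prices copied from the flagship table.
FLAGSHIP = {
    "GPT-5.5": Price(34.00, 204.00),
    "Claude Opus 4.7": Price(34.00, 170.00),
    "Gemini 3.5 Pro": Price(10.20, 61.20),
    "Qwen3.7 Max": Price(17.00, 51.00),
    "Claude Sonnet 4.6": Price(20.40, 102.00),
    "GPT-5.4": Price(17.00, 102.00),
}


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in ¥ of one request with the given token counts."""
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


# Assumed workload: 2,000 input tokens and 500 output tokens per request.
costs = {name: request_cost(p, 2000, 500) for name, p in FLAGSHIP.items()}
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ¥{cost:.4f} per request")
```

Because output tokens are several times more expensive than input tokens for every model here, the ranking can shift for output-heavy workloads.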

Lightweight/Fast Models

Ideal for conversation, simple Q&A, code completion, and other speed-sensitive, lower-complexity tasks.

Model	Provider	Input (¥/M)	Output (¥/M)	Context
GPT-5.5 Instant	OpenAI	¥5.10	¥20.40	922K
Kimi K2.6	Moonshot	¥4.96	¥23.73	262K
Gemini 3.5 Flash	Google	¥10.20	¥61.20	1049K
Qwen3.6 Plus	Alibaba	¥2.21	¥13.26	1000K
MiniMax-M2.7	MiniMax	¥1.90	¥8.16	205K

GPT-5.5 Instant is OpenAI's speed-oriented offering, at roughly 1/7 the input price and 1/10 the output price of GPT-5.5. Kimi K2.6 performs well for coding and agent tasks at a similar price point.

Ultra-Low-Cost Models

This is where the price war is fiercest. Perfect for batch processing, agent loops, data annotation, and other high-frequency scenarios.

Model	Provider	Input (¥/M)	Output (¥/M)	Context
MiMo-V2.5 Pro	Xiaomi	¥3.00	¥6.00	1000K
DeepSeek V4 Pro	DeepSeek	¥2.96	¥5.92	1049K
GLM-5.1	Zhipu	¥2.72	¥8.16	200K
MiMo-V2.5	Xiaomi	¥1.02	¥1.97	1000K
DeepSeek V4 Flash	DeepSeek	¥0.95	¥1.90	1000K
Gemini 3 Flash	Google	¥1.02	¥4.08	1000K
Hunyuan (混元) Hy3 Preview	Tencent	¥0.41	¥1.22	256K
Step 3.5 Flash	StepFun	¥0.20	¥0.61	256K

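DeepSeek V4 Pro and MiMo-V2.5 Pro are priced nearly identically at around ¥3/M input and ¥6/M output, both supporting 1M context. These are the cheapest Pro-tier million-context models in this comparison; the smaller DeepSeek V4 Flash, MiMo-V2.5, and Gemini 3 Flash also offer 1M context at lower prices still.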
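One way to check which million-context models are cheapest is to filter the table above by context length and sort by a blended price. This is a sketch only: the 50/50 input/output weighting is an assumption, and the helper names are made up for illustration.

```python
# (name, input ¥/M, output ¥/M, context in K tokens), copied from the table above
MODELS = [
    ("MiMo-V2.5 Pro", 3.00, 6.00, 1000),
    ("DeepSeek V4 Pro", 2.96, 5.92, 1049),
    ("GLM-5.1", 2.72, 8.16, 200),
    ("MiMo-V2.5", 1.02, 1.97, 1000),
    ("DeepSeek V4 Flash", 0.95, 1.90, 1000),
    ("Gemini 3 Flash", 1.02, 4.08, 1000),
    ("混元 Hy3 Preview", 0.41, 1.22, 256),
    ("Step 3.5 Flash", 0.20, 0.61, 256),
]


def blended_price(input_price, output_price, output_share=0.5):
    """Weighted ¥/M price; output_share is the assumed fraction of output tokens."""
    return input_price * (1.0 - output_share) + output_price * output_share


def cheapest_with_context(models, min_context_k, output_share=0.5):
    """Return (name, blended price) pairs meeting the context floor, cheapest first."""
    eligible = [
        (name, blended_price(inp, out, output_share))
        for name, inp, out, context_k in models
        if context_k >= min_context_k
    ]
    return sorted(eligible, key=lambda pair: pair[1])


for name, price in cheapest_with_context(MODELS, min_context_k=1000):
    print(f"{name}: ¥{price:.3f}/M blended")
```

Changing `output_share` to match your own traffic mix is the main knob worth adjusting.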

Open-Source/Free Models

Calling open-source models through platforms like OpenRouter or Together AI is typically much cheaper than closed-source alternatives.

Model	Params	Input (¥/M)	Output (¥/M)	Context
Llama 4 Scout	—	¥0.75	¥2.24	10000K
Llama 4 Maverick	—	¥1.16	¥3.40	1000K
Mistral Large 3	—	¥2.04	¥6.12	256K
Phi-4	14B	¥0.54	¥1.63	16K

Llama 4 Scout supports 10M context, the longest of any open-source model. Running it locally incurs no per-token fees; the only cost is the hardware.
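A rough way to weigh local deployment against a hosted API is a break-even calculation. The sketch below uses the hosted Llama 4 Scout prices from the table above. The ¥20,000 hardware budget and the 25% output share are hypothetical values that do not come from this article, and electricity and maintenance are ignored.

```python
def break_even_million_tokens(hardware_cost, input_price, output_price, output_share):
    """Millions of tokens at which hosted API spend equals the hardware cost.

    hardware_cost: one-time cost in ¥ (hypothetical input)
    input_price, output_price: hosted prices in ¥ per million tokens
    output_share: assumed fraction of all tokens that are output tokens
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    blended = input_price * (1.0 - output_share) + output_price * output_share
    return hardware_cost / blended


# Hosted Llama 4 Scout prices from the table; the other two values are assumptions.
volume = break_even_million_tokens(
    hardware_cost=20_000, input_price=0.75, output_price=2.24, output_share=0.25
)
print(f"Break-even at about {volume:,.0f} million tokens")
```

Below that volume the hosted API is cheaper under these assumptions; above it, the hardware pays for itself.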

Scenario-Based Recommendations

Daily conversation, simple Q&A → GPT-5.5 Instant (¥5.10/¥20.40) or DeepSeek V4 Flash (¥0.95/¥1.90). Great value, fast response.

Complex reasoning, academic analysis → Claude Opus 4.7 (¥34/¥170) or GPT-5.5 (¥34/¥204). Expensive but the best quality.

Coding, code generation → DeepSeek V4 Pro (¥2.96/¥5.92) or MiMo-V2.5 Pro (¥3/¥6). Near-flagship coding ability at roughly 1/10 the input price of GPT-5.5 or Claude Opus 4.7, and an even smaller fraction of their output price.

Agents, high-frequency calls → DeepSeek V4 Flash (¥0.95/¥1.90) or Step 3.5 Flash (¥0.20/¥0.61). Prices are low enough that per-call cost is rarely the deciding factor.

Long document processing (100K+ tokens) → DeepSeek V4 Pro / MiMo-V2.5 Pro (both support 1M context at ¥3/¥6).

Local deployment, data privacy → Ollama + Llama 4 Scout or the open-source Qwen3.5. One-time hardware cost, with no per-token fees afterward.
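The API-based recommendations above can be captured as a small lookup table that also estimates monthly spend. This is a sketch: the scenario keys and the function name are made up for illustration, the token volumes in the usage line are assumed, and local deployment is left out because it has no per-token price.

```python
# scenario -> list of (model, input ¥/M, output ¥/M), taken from the recommendations above
RECOMMENDATIONS = {
    "chat": [("GPT-5.5 Instant", 5.10, 20.40), ("DeepSeek V4 Flash", 0.95, 1.90)],
    "reasoning": [("Claude Opus 4.7", 34.00, 170.00), ("GPT-5.5", 34.00, 204.00)],
    "coding": [("DeepSeek V4 Pro", 2.96, 5.92), ("MiMo-V2.5 Pro", 3.00, 6.00)],
    "agents": [("DeepSeek V4 Flash", 0.95, 1.90), ("Step 3.5 Flash", 0.20, 0.61)],
    "long_documents": [("DeepSeek V4 Pro", 2.96, 5.92), ("MiMo-V2.5 Pro", 3.00, 6.00)],
}


def monthly_cost(scenario, input_millions, output_millions):
    """Return {model: ¥ per month} for each model recommended for the scenario."""
    return {
        name: round(inp * input_millions + out * output_millions, 2)
        for name, inp, out in RECOMMENDATIONS[scenario]
    }


# Assumed agent workload: 1,000M input tokens and 200M output tokens per month.
for model, cost in monthly_cost("agents", 1000, 200).items():
    print(f"{model}: ¥{cost:,.2f} per month")
```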

Notable Trends

Chinese models have a significant price advantage. For comparable output quality, Chinese providers typically charge 1/5 to 1/10 of what OpenAI and Anthropic charge. This is driven by lower inference infrastructure costs and fiercer price competition.

Cache-hit pricing is worth paying attention to. MiMo-V2.5 Pro cache hits cost just ¥0.025/M, compared with ¥3.00/M for regular input, and DeepSeek offers similar cache discounts. If your application sends many repeated queries, smart caching can reduce input costs by an order of magnitude or more.
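How much caching saves depends on the share of input tokens served from cache. Here is a minimal sketch of the effective input price, using the MiMo-V2.5 Pro figures from this article; the 90% hit rate is an assumed value, and the function name is made up for illustration.

```python
def effective_input_price(miss_price, hit_price, hit_rate):
    """Average ¥/M input price when hit_rate of input tokens are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# MiMo-V2.5 Pro: ¥3.00/M regular input, ¥0.025/M on cache hits (from this article).
# The 90% hit rate is an assumption for illustration.
price = effective_input_price(miss_price=3.00, hit_price=0.025, hit_rate=0.9)
print(f"Effective input price: ¥{price:.4f}/M")
print(f"Reduction factor: {3.00 / price:.1f}x")
```

Output tokens are not affected by input caching, so the total bill shrinks less than the input price alone suggests.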

Million-token context has become standard. In 2024, only a few models supported 128K+. By 2026, most mainstream models support 1M or more. Context length is no longer a bottleneck in model selection.

2026 LLM API Pricing Comparison: From ¥0.03 to ¥34/M Tokens, How to Choose | 2026-05-27
