2026 LLM API Pricing Overview
The LLM API price war has been raging since 2024, and prices have dropped by two orders of magnitude. Here's a comprehensive comparison of current pricing from major providers to help developers make informed choices.
Prices are in ¥/million tokens (USD converted at $1=¥6.80).
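As a quick sanity check on the units, the conversion can be sketched as below. The function names are illustrative, and the rate is the fixed ¥6.80 used in this article, not a live exchange rate.

```python
USD_TO_CNY = 6.80  # fixed conversion rate stated in this article


def usd_to_cny(usd_per_million: float) -> float:
    """Convert a USD price per million tokens to ¥ per million tokens."""
    return round(usd_per_million * USD_TO_CNY, 2)


def cny_to_usd(cny_per_million: float) -> float:
    """Convert a ¥ price per million tokens back to USD per million tokens."""
    return round(cny_per_million / USD_TO_CNY, 2)


# GPT-5.5 prices as listed in the flagship table (¥34.00 input, ¥204.00 output)
print(cny_to_usd(34.00), cny_to_usd(204.00))
```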
Flagship Models
These are each provider's most powerful models, suited for high-quality reasoning and complex code generation.
| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
|---|---|---|---|---|
| GPT-5.5 | OpenAI | ¥34.00 | ¥204.00 | 1050K |
| Claude Opus 4.7 | Anthropic | ¥34.00 | ¥170.00 | 1000K |
| Gemini 3.5 Pro | Google | ¥10.20 | ¥61.20 | 1000K |
| Qwen3.7 Max | Alibaba | ¥17.00 | ¥51.00 | 1000K |
| Claude Sonnet 4.6 | Anthropic | ¥20.40 | ¥102.00 | 1000K |
| GPT-5.4 | OpenAI | ¥17.00 | ¥102.00 | 1050K |
Price gaps at the flagship tier are significant. GPT-5.5 output costs ¥204/M versus ¥61.20/M for Gemini 3.5 Pro, a roughly 3.3x difference. If your task doesn't require the absolute highest reasoning quality, Gemini 3.5 Pro has the lowest input price at this tier and Qwen3.7 Max the lowest output price (¥51.00/M), so the better value depends on your input/output mix.
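To see how the gap plays out for a concrete workload, a per-request cost estimate can be sketched as below. The prices are copied from the flagship table; the token counts and the function names are hypothetical and only there for illustration.

```python
# Prices in ¥ per million tokens as (input, output), copied from the flagship table.
FLAGSHIP_PRICES = {
    "GPT-5.5": (34.00, 204.00),
    "Claude Opus 4.7": (34.00, 170.00),
    "Gemini 3.5 Pro": (10.20, 61.20),
    "Qwen3.7 Max": (17.00, 51.00),
    "Claude Sonnet 4.6": (20.40, 102.00),
    "GPT-5.4": (17.00, 102.00),
}


def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in ¥ of one request, given prices in ¥ per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost per request) pairs, cheapest first."""
    costs = {
        name: request_cost(input_tokens, output_tokens, in_price, out_price)
        for name, (in_price, out_price) in FLAGSHIP_PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical request shapes: (input tokens, output tokens)
workloads = {"input-heavy": (2000, 500), "output-heavy": (500, 2000)}
for label, (n_in, n_out) in workloads.items():
    cheapest, cost = rank_by_cost(n_in, n_out)[0]
    print(f"{label}: {cheapest} at ¥{cost:.4f} per request")
```

With these assumed request shapes, the input-heavy mix comes out cheapest on Gemini 3.5 Pro and the output-heavy mix on Qwen3.7 Max, so it is worth plugging in your own token counts before picking a model.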
Lightweight/Fast Models
Ideal for conversation, simple Q&A, code completion, and other speed-sensitive, lower-complexity tasks.
| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
|---|---|---|---|---|
| GPT-5.5 Instant | OpenAI | ¥5.10 | ¥20.40 | 922K |
| Kimi K2.6 | Moonshot | ¥4.96 | ¥23.73 | 262K |
| Gemini 3.5 Flash | Google | ¥10.20 | ¥61.20 | 1049K |
| Qwen3.6 Plus | Alibaba | ¥2.21 | ¥13.26 | 1000K |
| MiniMax-M2.7 | MiniMax | ¥1.90 | ¥8.16 | 205K |
GPT-5.5 Instant is OpenAI's speed-oriented offering, at 15% of GPT-5.5's input price and 10% of its output price. Kimi K2.6 performs well for coding and agent tasks at a similar price point.
Ultra-Low-Cost Models
This is where the price war is fiercest. Perfect for batch processing, agent loops, data annotation, and other high-frequency scenarios.
| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
|---|---|---|---|---|
| MiMo-V2.5 Pro | Xiaomi | ¥3.00 | ¥6.00 | 1000K |
| DeepSeek V4 Pro | DeepSeek | ¥2.96 | ¥5.92 | 1049K |
| GLM-5.1 | Zhipu | ¥2.72 | ¥8.16 | 200K |
| MiMo-V2.5 | Xiaomi | ¥1.02 | ¥1.97 | 1000K |
| DeepSeek V4 Flash | DeepSeek | ¥0.95 | ¥1.90 | 1000K |
| Gemini 3 Flash | Google | ¥1.02 | ¥4.08 | 1000K |
| Hunyuan Hy3 Preview | Tencent | ¥0.41 | ¥1.22 | 256K |
| Step 3.5 Flash | StepFun | ¥0.20 | ¥0.61 | 256K |
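DeepSeek V4 Pro and MiMo-V2.5 Pro are priced nearly identically at around ¥3/M input and ¥6/M output, both supporting 1M context. Their non-Pro siblings, DeepSeek V4 Flash and MiMo-V2.5, also offer 1M context at roughly ¥1/M input and ¥2/M output, making them the cheapest million-context models in this table.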
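One way to check which models in the table offer a million-token context at the lowest price is to filter on context size and sort by price. The sketch below hard-codes the table rows; the `Model` type and `cheapest_with_context` helper are illustrative names, not part of any provider SDK.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float   # ¥ per million input tokens
    output_price: float  # ¥ per million output tokens
    context_k: int       # context window in thousands of tokens


# Rows copied from the ultra-low-cost table.
ULTRA_LOW_COST = [
    Model("MiMo-V2.5 Pro", 3.00, 6.00, 1000),
    Model("DeepSeek V4 Pro", 2.96, 5.92, 1049),
    Model("GLM-5.1", 2.72, 8.16, 200),
    Model("MiMo-V2.5", 1.02, 1.97, 1000),
    Model("DeepSeek V4 Flash", 0.95, 1.90, 1000),
    Model("Gemini 3 Flash", 1.02, 4.08, 1000),
    Model("Hunyuan Hy3 Preview", 0.41, 1.22, 256),  # 混元
    Model("Step 3.5 Flash", 0.20, 0.61, 256),
]


def cheapest_with_context(models: list[Model], min_context_k: int) -> list[Model]:
    """Keep models with at least min_context_k context, cheapest first.

    Sorting is by input price, with output price as the tie-breaker.
    """
    eligible = [m for m in models if m.context_k >= min_context_k]
    return sorted(eligible, key=lambda m: (m.input_price, m.output_price))


for m in cheapest_with_context(ULTRA_LOW_COST, 1000):
    print(f"{m.name}: ¥{m.input_price:.2f} in / ¥{m.output_price:.2f} out, {m.context_k}K")
```

Sorting by input price first is a design choice that suits long-document workloads, where input tokens dominate; for generation-heavy workloads, swapping the key order would be more appropriate.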
Open-Source/Free Models
Calling open-source models through platforms like OpenRouter or Together AI is typically much cheaper than closed-source alternatives.
| Model | Params | Input (¥/M) | Output (¥/M) | Context |
|---|---|---|---|---|
| Llama 4 Scout | — | ¥0.75 | ¥2.24 | 10000K |
| Llama 4 Maverick | — | ¥1.16 | ¥3.40 | 1000K |
| Mistral Large 3 | — | ¥2.04 | ¥6.12 | 256K |
| Phi-4 | 14B | ¥0.54 | ¥1.63 | 16K |
Llama 4 Scout supports 10M context, the longest of any open-source model. Running it locally incurs no per-token fees; you pay only for hardware and electricity.
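Whether local hardware pays for itself depends on volume. The sketch below estimates a break-even point against the hosted Llama 4 Scout prices from the table. The hardware cost and output share are hypothetical placeholders, and electricity and maintenance are ignored for simplicity.

```python
def blended_price(in_price: float, out_price: float, output_fraction: float) -> float:
    """Average ¥ per million tokens when output_fraction of all tokens are output."""
    if not 0.0 <= output_fraction <= 1.0:
        raise ValueError("output_fraction must be between 0 and 1")
    return (1 - output_fraction) * in_price + output_fraction * out_price


def break_even_million_tokens(hardware_cost: float, in_price: float,
                              out_price: float, output_fraction: float) -> float:
    """Millions of tokens at which API spend equals the one-time hardware cost."""
    return hardware_cost / blended_price(in_price, out_price, output_fraction)


HARDWARE_COST_CNY = 20_000  # hypothetical one-time hardware budget
OUTPUT_FRACTION = 0.25      # hypothetical: one output token per three input tokens

# Hosted Llama 4 Scout prices from the table: ¥0.75 input, ¥2.24 output
tokens_m = break_even_million_tokens(HARDWARE_COST_CNY, 0.75, 2.24, OUTPUT_FRACTION)
print(f"Break-even after about {tokens_m:,.0f} million tokens")
```

Under these assumptions the break-even sits at roughly 17.8 billion tokens, which suggests local deployment is easier to justify on privacy grounds than on price alone unless volume is very high.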
Scenario-Based Recommendations
Daily conversation, simple Q&A → GPT-5.5 Instant (¥5.10/¥20.40) or DeepSeek V4 Flash (¥0.95/¥1.90). Great value, fast response.
Complex reasoning, academic analysis → Claude Opus 4.7 (¥34/¥170) or GPT-5.5 (¥34/¥204). Expensive but the best quality.
Coding, code generation → DeepSeek V4 Pro (¥2.96/¥5.92) or MiMo-V2.5 Pro (¥3/¥6). Near-flagship coding ability at less than 1/10 the price of GPT-5.5 or Claude Opus 4.7.
Agents, high-frequency calls → DeepSeek V4 Flash (¥0.95/¥1.90) or Step 3.5 Flash (¥0.20/¥0.61). Prices so low they're negligible.
Long document processing (100K+ tokens) → DeepSeek V4 Pro / MiMo-V2.5 Pro (both support 1M context at ¥3/¥6).
Local deployment, data privacy → Ollama + Llama 4 Scout or Qwen3.5 open-source. One-time hardware cost, then no per-token fees.
Notable Trends
Chinese models have a significant price advantage. For comparable output quality, Chinese providers typically charge 1/5 to 1/10 of what OpenAI and Anthropic charge. This is driven by lower inference infrastructure costs and fiercer price competition.
Cache-hit pricing is worth paying attention to. MiMo-V2.5 Pro cache hits cost just ¥0.025/M, compared with ¥3.00/M for regular input, and DeepSeek offers similar cache discounts. If your application has many repeated queries, smart caching can reduce input costs by another order of magnitude.
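The effect of caching on input cost can be estimated as a weighted average of the cache-hit and regular prices. The sketch below uses the MiMo-V2.5 Pro figures from this article; the hit rates are hypothetical, and the function name is illustrative.

```python
def effective_input_price(base_price: float, cache_hit_price: float,
                          hit_rate: float) -> float:
    """Average ¥ per million input tokens for a given cache hit rate (0 to 1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_hit_price + (1 - hit_rate) * base_price


BASE_PRICE = 3.00        # MiMo-V2.5 Pro regular input price, ¥/M
CACHE_HIT_PRICE = 0.025  # MiMo-V2.5 Pro cache-hit price, ¥/M

# Hypothetical hit rates; real values depend on how repetitive your prompts are.
for hit_rate in (0.0, 0.5, 0.9):
    price = effective_input_price(BASE_PRICE, CACHE_HIT_PRICE, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ¥{price:.4f}/M input")
```

At an assumed 90% hit rate the effective input price drops to about ¥0.32/M, close to a tenfold reduction; output tokens are billed at the normal rate either way.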
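Million-token context has become standard. In 2024, only a few models supported 128K+. By 2026, most mainstream models support 1M or more, although several in the tables above (Kimi K2.6, GLM-5.1, MiniMax-M2.7) still top out at 200K to 262K. For most workloads, context length is no longer a bottleneck in model selection.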




