2026 LLM API Pricing Overview

The LLM API price war has been raging since 2024, and prices have dropped by two orders of magnitude. Here's a comprehensive comparison of current pricing from major providers to help developers make informed choices.

Prices are in ¥/million tokens (USD converted at $1=¥6.80).
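
To make the unit concrete, here is a minimal sketch that converts a listed ¥/M price back to USD at the article's rate and estimates the cost of a single request. The function names are illustrative assumptions, not part of any provider's SDK.

```python
CNY_PER_USD = 6.80  # exchange rate stated in this article


def cny_to_usd(price_cny_per_m: float) -> float:
    """Convert a price in ¥ per million tokens to $ per million tokens."""
    return round(price_cny_per_m / CNY_PER_USD, 2)


def request_cost_cny(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Cost in ¥ of one request, given prices in ¥ per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # GPT-5.5 list prices from the flagship table: ¥34.00 in, ¥204.00 out
    print(cny_to_usd(34.00), cny_to_usd(204.00))
    print(request_cost_cny(2_000, 500, 34.00, 204.00))
```

A request with 2,000 input tokens and 500 output tokens is an arbitrary example size; substitute your own token counts.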

Flagship Models

These are each provider's most powerful models, suited for high-quality reasoning and complex code generation.

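| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
| --- | --- | --- | --- | --- |
| GPT-5.5 | OpenAI | ¥34.00 | ¥204.00 | 1050K |
| Claude Opus 4.7 | Anthropic | ¥34.00 | ¥170.00 | 1000K |
| Gemini 3.5 Pro | Google | ¥10.20 | ¥61.20 | 1000K |
| Qwen3.7 Max | Alibaba | ¥17.00 | ¥51.00 | 1000K |
| Claude Sonnet 4.6 | Anthropic | ¥20.40 | ¥102.00 | 1000K |
| GPT-5.4 | OpenAI | ¥17.00 | ¥102.00 | 1050K |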

Price gaps at the flagship tier are significant. GPT-5.5 output costs ¥204/M versus ¥61.20/M for Gemini 3.5 Pro, roughly a 3.3x difference. If your task doesn't require the absolute highest reasoning quality, Gemini 3.5 Pro offers the best value at this tier.
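
Because input and output are priced separately, the ranking depends on your traffic mix. The sketch below ranks the flagship models for a given monthly volume using the list prices from the table above; the workload of 10M input and 2M output tokens is a made-up example, and the helper names are illustrative.

```python
# (input ¥/M, output ¥/M) copied from the flagship table
FLAGSHIP_PRICES = {
    "GPT-5.5": (34.00, 204.00),
    "Claude Opus 4.7": (34.00, 170.00),
    "Gemini 3.5 Pro": (10.20, 61.20),
    "Qwen3.7 Max": (17.00, 51.00),
    "Claude Sonnet 4.6": (20.40, 102.00),
    "GPT-5.4": (17.00, 102.00),
}


def workload_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost in ¥ for input_m / output_m million tokens on the given model."""
    input_price, output_price = FLAGSHIP_PRICES[model]
    return input_price * input_m + output_price * output_m


def rank_by_cost(input_m: float, output_m: float) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [(name, workload_cost(name, input_m, output_m))
             for name in FLAGSHIP_PRICES]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical monthly volume: 10M input tokens, 2M output tokens
    for name, cost in rank_by_cost(10, 2):
        print(f"{name:<20} ¥{cost:,.2f}")
```

An output-heavy workload shifts the ranking toward models with cheap output, so it is worth rerunning with your own ratio.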

Lightweight/Fast Models

Ideal for conversation, simple Q&A, code completion, and other speed-sensitive, lower-complexity tasks.

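| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
| --- | --- | --- | --- | --- |
| GPT-5.5 Instant | OpenAI | ¥5.10 | ¥20.40 | 922K |
| Kimi K2.6 | Moonshot | ¥4.96 | ¥23.73 | 262K |
| Gemini 3.5 Flash | Google | ¥10.20 | ¥61.20 | 1049K |
| Qwen3.6 Plus | Alibaba | ¥2.21 | ¥13.26 | 1000K |
| MiniMax-M2.7 | MiniMax | ¥1.90 | ¥8.16 | 205K |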

GPT-5.5 Instant is OpenAI's speed-oriented offering, at roughly 15% of GPT-5.5's input price (¥5.10 vs. ¥34.00) and 10% of its output price (¥20.40 vs. ¥204.00). Kimi K2.6 performs well for coding and agent tasks at a similar price point.

Ultra-Low-Cost Models

This is where the price war is fiercest. Perfect for batch processing, agent loops, data annotation, and other high-frequency scenarios.

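| Model | Provider | Input (¥/M) | Output (¥/M) | Context |
| --- | --- | --- | --- | --- |
| MiMo-V2.5 Pro | Xiaomi | ¥3.00 | ¥6.00 | 1000K |
| DeepSeek V4 Pro | DeepSeek | ¥2.96 | ¥5.92 | 1049K |
| GLM-5.1 | Zhipu | ¥2.72 | ¥8.16 | 200K |
| MiMo-V2.5 | Xiaomi | ¥1.02 | ¥1.97 | 1000K |
| DeepSeek V4 Flash | DeepSeek | ¥0.95 | ¥1.90 | 1000K |
| Gemini 3 Flash | Google | ¥1.02 | ¥4.08 | 1000K |
| Hunyuan Hy3 Preview | Tencent | ¥0.41 | ¥1.22 | 256K |
| Step 3.5 Flash | StepFun | ¥0.20 | ¥0.61 | 256K |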

DeepSeek V4 Pro and MiMo-V2.5 Pro are priced nearly identically at around ¥3/M input and ¥6/M output, and both support 1M context. Their lighter siblings, DeepSeek V4 Flash and MiMo-V2.5, are the cheapest million-context models in this table at roughly ¥1/M input and ¥2/M output.
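
When context length is a hard requirement, it helps to filter on it first and only then compare prices. The following is a minimal sketch over the rows of the table above; the `Model` type and `cheapest_with_context` helper are illustrative names, and sorting by input price first is an assumption that suits input-heavy workloads.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    input_price: float   # ¥ per million input tokens
    output_price: float  # ¥ per million output tokens
    context_k: int       # context window in thousands of tokens


# Rows copied from the ultra-low-cost table
ULTRA_LOW_COST = [
    Model("MiMo-V2.5 Pro", 3.00, 6.00, 1000),
    Model("DeepSeek V4 Pro", 2.96, 5.92, 1049),
    Model("GLM-5.1", 2.72, 8.16, 200),
    Model("MiMo-V2.5", 1.02, 1.97, 1000),
    Model("DeepSeek V4 Flash", 0.95, 1.90, 1000),
    Model("Gemini 3 Flash", 1.02, 4.08, 1000),
    Model("Hunyuan Hy3 Preview", 0.41, 1.22, 256),
    Model("Step 3.5 Flash", 0.20, 0.61, 256),
]


def cheapest_with_context(models: list[Model], min_context_k: int) -> list[Model]:
    """Keep models whose window is at least min_context_k, cheapest first."""
    eligible = [m for m in models if m.context_k >= min_context_k]
    return sorted(eligible, key=lambda m: (m.input_price, m.output_price))


if __name__ == "__main__":
    for m in cheapest_with_context(ULTRA_LOW_COST, 1000):
        print(f"{m.name:<20} ¥{m.input_price:.2f} / ¥{m.output_price:.2f}")
```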

Open-Source/Free Models

Calling open-source models through platforms like OpenRouter or Together AI is typically much cheaper than closed-source alternatives.

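| Model | Params | Input (¥/M) | Output (¥/M) | Context |
| --- | --- | --- | --- | --- |
| Llama 4 Scout |  | ¥0.75 | ¥2.24 | 10000K |
| Llama 4 Maverick |  | ¥1.16 | ¥3.40 | 1000K |
| Mistral Large 3 |  | ¥2.04 | ¥6.12 | 256K |
| Phi-4 | 14B | ¥0.54 | ¥1.63 | 16K |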

Llama 4 Scout supports 10M context, the longest of any open-source model. Running locally carries no per-token fees; the only cost is the hardware.
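
Whether local hardware pays off depends on volume. The sketch below estimates the break-even point against the hosted Llama 4 Scout price from the table; the ¥30,000 hardware budget and the 75/25 input/output split are purely hypothetical placeholders, and electricity and maintenance are ignored.

```python
def blended_price(input_price: float, output_price: float,
                  input_share: float = 0.75) -> float:
    """Average ¥ per million tokens for a given share of input tokens."""
    return input_share * input_price + (1 - input_share) * output_price


def break_even_tokens_m(hardware_cost_cny: float, input_price: float,
                        output_price: float, input_share: float = 0.75) -> float:
    """Million tokens at which API spend would equal the hardware cost."""
    return hardware_cost_cny / blended_price(input_price, output_price, input_share)


if __name__ == "__main__":
    # Hosted Llama 4 Scout: ¥0.75 in, ¥2.24 out (from the table above)
    # Hypothetical hardware budget: ¥30,000
    tokens_m = break_even_tokens_m(30_000, 0.75, 2.24)
    print(f"Break-even after about {tokens_m:,.0f} million tokens")
```

At hosted prices this low, the break-even volume is large, so data privacy is often a stronger reason for local deployment than cost.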

Scenario-Based Recommendations

Daily conversation, simple Q&A → GPT-5.5 Instant (¥5.10/¥20.40) or DeepSeek V4 Flash (¥0.95/¥1.90). Great value, fast response.

Complex reasoning, academic analysis → Claude Opus 4.7 (¥34/¥170) or GPT-5.5 (¥34/¥204). Expensive but the best quality.

Coding, code generation → DeepSeek V4 Pro (¥2.96/¥5.92) or MiMo-V2.5 Pro (¥3/¥6). Near-flagship coding ability at 1/10 the price.

Agents, high-frequency calls → DeepSeek V4 Flash (¥0.95/¥1.90) or Step 3.5 Flash (¥0.20/¥0.61). Prices so low they're negligible.

Long document processing (100K+ tokens) → DeepSeek V4 Pro / MiMo-V2.5 Pro (both support 1M context at ¥3/¥6).

Local deployment, data privacy → Ollama + Llama 4 Scout or Qwen3.5 open-source. One-time hardware cost, then no per-token fees.
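
For the agent scenario above, per-call prices understate the total, because each step typically resends the growing conversation as input. A rough sketch of that arithmetic follows; the step count and token sizes are made-up assumptions, and no cache discount is applied.

```python
def agent_loop_cost_cny(steps: int, base_context: int, tokens_added_per_step: int,
                        output_per_step: int, input_price: float,
                        output_price: float) -> float:
    """Estimate the ¥ cost of an agent loop whose context grows every step.

    Step i (starting at 0) sends base_context + i * tokens_added_per_step
    input tokens and produces output_per_step output tokens.
    Prices are in ¥ per million tokens.
    """
    total_input = sum(base_context + i * tokens_added_per_step
                      for i in range(steps))
    total_output = steps * output_per_step
    return (total_input * input_price + total_output * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical run: 50 steps, 4,000-token starting prompt, 1,500 tokens
    # appended per step, 500 output tokens per step, DeepSeek V4 Flash prices.
    cost = agent_loop_cost_cny(50, 4_000, 1_500, 500, 0.95, 1.90)
    print(f"¥{cost:.2f} per run")
```

Input tokens grow quadratically with the number of steps, so long-running agents are where the low input prices matter most.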

Chinese models have a significant price advantage. For comparable output quality, Chinese providers typically charge 1/5 to 1/10 of what OpenAI and Anthropic charge. This is driven by lower inference infrastructure costs and fiercer price competition.

Cache-hit pricing is worth paying attention to. MiMo-V2.5 Pro cache hits cost just ¥0.025/M, compared with ¥3.00/M for regular input, and DeepSeek offers similar cache discounts. If your application sends many repeated queries, smart caching can reduce input costs by another order of magnitude.
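
The effect of caching can be estimated as a weighted average of the hit and miss prices. This sketch uses MiMo-V2.5 Pro's listed ¥3.00/M input price and ¥0.025/M cache-hit price; the 90% hit rate is an assumed figure, and real hit rates depend on how each provider matches cached content.

```python
def effective_input_price(miss_price: float, hit_price: float,
                          hit_rate: float) -> float:
    """Average ¥ per million input tokens given a cache hit rate in [0, 1]."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


if __name__ == "__main__":
    miss, hit = 3.00, 0.025  # MiMo-V2.5 Pro: regular input vs. cache hit
    price = effective_input_price(miss, hit, hit_rate=0.9)  # assumed hit rate
    print(f"Effective input price: ¥{price:.4f}/M ({miss / price:.1f}x cheaper)")
```

Note that the discount applies to input tokens only; output tokens are billed at the full rate regardless of cache hits.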

Million-token context has become standard. In 2024, only a few models supported 128K+. By 2026, most mainstream models support 1M or more. Context length is no longer a bottleneck in model selection.