Cursor Composer 2.5: Challenging Frontier Models at 1/10 the Cost
On May 18, Cursor released Composer 2.5, the next-generation model powering its AI coding assistant. The upgrade arrives soon after Composer 2 and brings changes to training, task generation, and model behavior.
Cursor partnered with SpaceXAI (xAI's new brand) to train on Colossus 2, a cluster with a million H100-equivalents. Composer 2.5 itself, however, is built on Moonshot's open-source Kimi K2.5 checkpoint rather than trained from scratch.
Model performance comparison
Cursor claims Composer 2.5 delivers near-frontier performance at 1/10 the cost. Here's how it stacks up:
Benchmark comparison
| Model | Relative cost (more ★ = pricier) | Coding | Long tasks |
|---|---|---|---|
| Composer 2.5 | ★ | ★★★★ | ★★★★ |
| Claude Opus 4.7 | ★★★★★ | ★★★★★ | ★★★★★ |
| GPT-4o | ★★★ | ★★★★ | ★★★ |
| Kimi K2.5 (base) | ★★ | ★★★★ | ★★★ |
Community feedback tells a different story. Hacker News (HN) users pointed out that Composer 2 came with similar benchmark claims, and that its real-world performance fell short.
Pricing
- Composer 2.5: included in Cursor Pro ($20/month)
- Claude Opus 4.7: ~$15/$75 per 1M input/output tokens (API)
- Gemini 3.5 Flash: $0.75/$4.50 per 1M input/output tokens
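To make the per-token figures easier to compare, here is a minimal sketch that turns them into a dollar cost for a given workload. It assumes the two numbers in each pair are input and output prices per 1M tokens; the `TokenPrice` class and `api_cost` function are hypothetical names for illustration. The flat $20/month Cursor Pro fee is not per-token, so it is left out of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """USD per 1M tokens. Assumption: 'a/b' in the list means input/output."""
    input_per_million: float
    output_per_million: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Scale each token count by its per-million price.
        total = (input_tokens * self.input_per_million
                 + output_tokens * self.output_per_million)
        return total / 1_000_000


# Figures copied from the pricing list above (Opus is listed as approximate).
PRICES = {
    "Claude Opus 4.7": TokenPrice(15.00, 75.00),
    "Gemini 3.5 Flash": TokenPrice(0.75, 4.50),
}


def api_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for one workload."""
    return PRICES[model].cost(input_tokens, output_tokens)
```

For a hypothetical workload of 2M input tokens and 500K output tokens, this gives about $67.50 for Claude Opus 4.7 and $3.75 for Gemini 3.5 Flash.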
Technical highlights
Targeted RL with textual feedback. Rather than scoring entire rollouts, Composer 2.5's training inserts hints at the exact point where the model went wrong. This gives a localized training signal for specific behaviors.
25x more synthetic tasks. As the model improves, existing training problems become too easy, so Cursor auto-generates harder tasks through feature deletion, code refactoring, and other techniques.
Behavioral optimization. Beyond intelligence, Cursor improved communication style and effort calibration, dimensions that benchmarks do not capture but that matter for daily use.
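The idea of localized feedback can be illustrated with a toy sketch. This is not Cursor's training code: the `Step` type, `insert_hint`, and `signal_mask` are hypothetical, and the sketch only contrasts "one score for the whole rollout" with "a hint and a signal attached to one step".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    role: str  # "model" for generated steps, "hint" for inserted feedback
    text: str


def insert_hint(rollout: List[Step], error_index: int, hint: str) -> List[Step]:
    """Keep the steps before the faulty one and append a textual hint.

    Assumption: the model would then continue generating from this context.
    """
    if not 0 <= error_index < len(rollout):
        raise IndexError("error_index out of range")
    return rollout[:error_index] + [Step("hint", hint)]


def signal_mask(rollout: List[Step], error_index: int) -> List[float]:
    """Weight 1.0 on the faulty step and 0.0 elsewhere (a localized signal)."""
    if not 0 <= error_index < len(rollout):
        raise IndexError("error_index out of range")
    return [1.0 if i == error_index else 0.0 for i in range(len(rollout))]
```

In this toy version, a three-step rollout with a mistake at step 1 yields a two-item context (the first step plus the hint) and a mask of `[0.0, 1.0, 0.0]`, whereas whole-rollout scoring would apply one number to all three steps.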
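As a rough illustration of feature deletion, the sketch below removes the body of one function from a Python module so that re-implementing it becomes the task. How Cursor actually generates tasks is not described in detail; `delete_feature` is a hypothetical helper, and the sketch assumes Python 3.9+ for `ast.unparse`.

```python
import ast


def delete_feature(source: str, function_name: str) -> str:
    """Return `source` with the named function's body replaced by a stub.

    The stub raises NotImplementedError, so existing tests for that
    function fail until it is implemented again.
    """
    tree = ast.parse(source)
    found = False
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            # Swap the whole body for a single `raise NotImplementedError`.
            node.body = ast.parse("raise NotImplementedError").body
            found = True
    if not found:
        raise ValueError(f"function {function_name!r} not found")
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Paired with the original test suite, the stubbed module acts as a self-checking task: the tests for the deleted function fail until the model restores its behavior, while the rest of the module keeps working.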
Next step: training from scratch
Cursor revealed that it is training a significantly larger model from scratch on Colossus 2 with 10x more compute. This signals that Cursor is not content with fine-tuning open-source checkpoints and wants its own flagship model.
Reception
The HN thread reached 277 points and 39 comments. Feedback was split:
- Supporters see it as the best value option for daily high-frequency use
- Skeptics were burned by Composer 2's "great benchmarks, mediocre reality" and remain cautious
- Some questioned the choice of Opus 4.7 as the comparison point and asked why Sonnet was not used instead
Overall, Composer 2.5 strikes a solid balance between cost and capability. For Cursor users, the upgrade is worth trying. But don't expect it to fully replace Claude Opus or GPT-4o for core development work; that may have to wait for Cursor's from-scratch model.




