Technical Architecture and Core Breakthroughs

GPT-5.5 is OpenAI's first full-scale retraining of the base model since GPT-4.5. It is not a minor version iteration but a rebuild of the underlying architecture. The model was co-designed with NVIDIA GB200/GB300 NVL72 systems, so hardware and software were optimized together from training through deployment.

The most striking breakthrough is the balance between efficiency and intelligence. Despite being larger and more capable, GPT-5.5 matches GPT-5.4's per-token latency in production serving. More importantly, it consumes significantly fewer tokens to complete the same Codex tasks. On the NVIDIA GB200 NVL72 system, the inference cost per million tokens dropped to 1/35 of the previous generation.

Performance Gains Across the Board

GPT-5.5 outperforms GPT-5.4 on multiple key benchmarks, with the largest gains in long-context tasks and agent capabilities:

| Evaluation | GPT-5.5 | GPT-5.4 | Improvement (percentage points) | Test Content |
| --- | --- | --- | --- | --- |
| Terminal-Bench 2.0 | 82.7% | 75.1% | +7.6 | Complex command-line workflows |
| Expert-SWE | 73.1% | 68.5% | +4.6 | Long-cycle engineering tasks |
| SWE-Bench Pro | 58.6% | 57.7% | +0.9 | Real GitHub issue fixes |
| GDPval | 84.9% | 83.0% | +1.9 | 44 professional knowledge jobs |
| OSWorld-Verified | 78.7% | 75.0% | +3.7 | Real computer operations |
| Tau2-bench Telecom | 98.0% | 92.8% | +5.2 | Complex customer service workflows |
| MRCR v2 512K-1M | 74.0% | 36.6% | +37.4 | Long-text multi-point retrieval |
| Graphwalks BFS 1M | 45.4% | 9.4% | +36.0 | Long-context structure tracking |
| FrontierMath Tier 4 | 35.4% | 27.1% | +8.3 | High-difficulty math tasks |
| BixBench | 80.5% | 74.0% | +6.5 | Bioinformatics analysis |
| GeneBench | 25.0% | 19.0% | +6.0 | Gene data analysis |

Data Source: Official OpenAI release and third-party evaluations
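
The improvement figures in the table are differences in percentage points, not relative gains, and the two can tell quite different stories. The short Python sketch below recomputes both for a few rows, using scores copied from the table. The function name `gains` is only an illustrative helper.

```python
# Scores in percent, copied from the table above: (GPT-5.5, GPT-5.4).
scores = {
    "Terminal-Bench 2.0": (82.7, 75.1),
    "SWE-Bench Pro": (58.6, 57.7),
    "MRCR v2 512K-1M": (74.0, 36.6),
    "Graphwalks BFS 1M": (45.4, 9.4),
}

def gains(new, old):
    """Return (gain in percentage points, relative gain in percent)."""
    absolute = round(new - old, 1)
    relative = round((new - old) / old * 100, 1)
    return absolute, relative

for name, (new, old) in scores.items():
    pp, rel = gains(new, old)
    print(f"{name}: +{pp} pp, +{rel}% relative")
```

Read this way, the MRCR v2 result is roughly a doubling of the earlier score, while the SWE-Bench Pro gain is small on either measure.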

Qualitative Leap in Agent Capabilities

GPT-5.5's core design philosophy has shifted from a "capability set" to a "work system." Users can hand messy, multi-step tasks directly to the model, which autonomously plans a path, calls tools, validates results, resolves ambiguities, and keeps going until the task is complete.
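
The plan, act, validate, and continue cycle described above can be pictured as a simple control loop. The Python sketch below is a generic illustration, not OpenAI's implementation: `run_agent`, `plan`, `act`, and `check` are hypothetical names, and the toy functions stand in for a model and its tools so the example runs on its own.

```python
from typing import Callable, List

def run_agent(goal: str,
              plan: Callable[[str], List[str]],
              act: Callable[[str], str],
              check: Callable[[str, str], bool],
              max_retries: int = 2) -> List[str]:
    """Plan steps for a goal, run each step, and retry steps that fail validation."""
    results = []
    for step in plan(goal):
        for _ in range(max_retries + 1):
            output = act(step)          # e.g. a tool call (hypothetical)
            if check(step, output):     # validate before moving on
                results.append(output)
                break
        else:
            # No attempt passed validation, so stop instead of continuing blindly.
            raise RuntimeError(f"step failed after retries: {step}")
    return results

# Toy stand-ins so the sketch runs without any model or tool access.
def toy_plan(goal):
    return [f"{goal}: step {i}" for i in (1, 2)]

def toy_act(step):
    return step.upper()

def toy_check(step, output):
    return output == step.upper()

print(run_agent("fix bug", toy_plan, toy_act, toy_check))
```

The validation step is what separates this loop from a plain sequence of tool calls: a step only counts as done once its result passes a check.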

In programming, this change is particularly evident. Early testers reported that GPT-5.5 is significantly better at understanding the overall structure of large codebases, and that it anticipates potential issues and accounts for testing and review requirements without extra prompting. An NVIDIA engineer stated after early testing: "Losing access to GPT-5.5 feels like having a limb amputated."

Substantial Breakthrough in Long Context Capabilities

Although GPT-5.4 also claimed support for 1 million token contexts, its performance in ultra-long text retrieval was poor (only 36.6% in the 512K-1M range). GPT-5.5 raised this figure to 74.0%, an increase of 37.4 percentage points, making the 1M context window truly practical.

This matters most for workloads that involve large codebases or long-document analysis. The Codex environment supports a 400K-token context window, and the API version supports a 1M-token context (which requires explicit configuration). Maximum output length is 131,072 tokens.
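
As a rough illustration of how these limits interact, the Python sketch below checks whether a prompt fits once room is reserved for the response. It assumes that output tokens count against the same context window, which the text does not state. The names `CONTEXT_LIMITS` and `fits` are illustrative and are not API parameters.

```python
# Limits quoted in the text. The dictionary keys are illustrative labels only.
CONTEXT_LIMITS = {"codex": 400_000, "api": 1_000_000}
MAX_OUTPUT_TOKENS = 131_072

def fits(prompt_tokens: int, surface: str,
         reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window.

    Assumption: output tokens share the context window with the prompt.
    """
    if reserved_output > MAX_OUTPUT_TOKENS:
        raise ValueError("reserved output exceeds the maximum output length")
    return prompt_tokens + reserved_output <= CONTEXT_LIMITS[surface]

print(fits(300_000, "codex"))          # False: 300,000 + 131,072 > 400,000
print(fits(300_000, "codex", 50_000))  # True
print(fits(300_000, "api"))            # True
```

Under that assumption, a 300K-token prompt only fits in Codex if the response budget is kept well below the maximum.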

New Heights in Research and Knowledge Work

In scientific research, GPT-5.5 demonstrates impressive capabilities. An internal version of the model proved a long-standing conjecture about Ramsey numbers and completed a formal verification of the proof in the Lean proof assistant. Ramsey numbers are a core object of study in combinatorics, and new results about them are technically difficult and rare.

In the bioinformatics evaluation BixBench, GPT-5.5 ranked first among all published models with a score of 80.5%. Derya Unutmaz, an immunology professor at Jackson Laboratory for Genomic Medicine, used GPT-5.5 Pro to analyze a gene expression dataset containing 62 samples and nearly 28,000 genes and to generate a detailed research report. He stated that this work would otherwise have taken his team months.
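
For readers unfamiliar with the topic, the Ramsey number R(s, t) is the smallest n such that every two-coloring of the edges of the complete graph on n vertices contains a complete subgraph on s vertices in the first color or on t vertices in the second. The Python sketch below brute-forces the smallest textbook case, R(3, 3) = 6. It is unrelated to the conjecture mentioned above and is included only to show what such a statement means. The function names are illustrative.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to color 0 or 1."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_coloring_has_mono_triangle(n):
    """Check all 2-colorings of the edges of the complete graph on n vertices."""
    edges = list(combinations(range(n), 2))
    for colors in product((0, 1), repeat=len(edges)):
        if not has_mono_triangle(n, dict(zip(edges, colors))):
            return False  # found a coloring with no single-color triangle
    return True

print(every_coloring_has_mono_triangle(5))  # False: 5 vertices are not enough
print(every_coloring_has_mono_triangle(6))  # True: hence R(3, 3) = 6
```

The search space grows as 2 to the power of the number of edges, so exhaustive checking stops being feasible almost immediately. That is part of why results in this area are rare.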

Pricing Strategy and Market Positioning

GPT-5.5's API pricing is $5 per million input tokens and $30 per million output tokens, double that of GPT-5.4 (input $2.50, output $15). GPT-5.5 Pro's API pricing is even higher, at $30 per million input tokens and $180 per million output tokens.

However, OpenAI emphasizes that because GPT-5.5 needs significantly fewer tokens to complete the same tasks, the overall cost of use may not rise significantly. Batch and Flex processing are billed at half the standard price, while Priority processing costs 2.5 times the standard price.
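
The trade-off between a doubled price and lower token usage is easy to work out. The Python sketch below uses the per-million-token prices quoted above. The model labels are illustrative keys rather than confirmed API identifiers, and the token counts are hypothetical, since the text gives no specific reduction figure.

```python
# USD per million tokens, as quoted in the text: (input, output).
# The keys are illustrative labels, not confirmed API model identifiers.
PRICES = {
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.5-pro": (30.00, 180.00),
}
# Multipliers relative to standard pricing, as quoted in the text.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "priority": 2.5}

def cost_usd(model, input_tokens, output_tokens, tier="standard"):
    """Cost of one request in USD at the quoted list prices."""
    in_price, out_price = PRICES[model]
    base = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return base * TIER_MULTIPLIER[tier]

# Hypothetical task: 200,000 input and 40,000 output tokens on GPT-5.4.
old = cost_usd("gpt-5.4", 200_000, 40_000)
# Hypothetical: GPT-5.5 finishes the same task with half the tokens.
new = cost_usd("gpt-5.5", 100_000, 20_000)
print(f"GPT-5.4: ${old:.2f}, GPT-5.5 at half the tokens: ${new:.2f}")
```

Because both prices exactly doubled, the break-even point is a 50% reduction in tokens. Any smaller reduction makes the same task cost more on GPT-5.5 at standard rates.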

Availability and Deployment

GPT-5.5 is currently available to ChatGPT Plus, Pro, Business, and Enterprise users, where it appears as "GPT-5.5 Thinking". Codex supports a context window of up to 400K tokens. API access is coming soon at the standard price of $5 per million input tokens and $30 per million output tokens.

Safety and Governance

GPT-5.5 has undergone OpenAI's strictest safety assessment process to date, including evaluations under its Preparedness Framework, domain-specific tests, new targeted assessments of advanced biology and cybersecurity capabilities, and extensive testing with external experts. OpenAI classifies GPT-5.5's biological/chemical and cybersecurity capabilities as "High". Although they do not reach the "Critical" level, its cybersecurity capabilities are significantly stronger than those of GPT-5.4.

Industry Impact and Competitive Landscape

The release of GPT-5.5 coincides with Anthropic passing a $1 trillion valuation in the private secondary market, while OpenAI's valuation in its latest financing round, at the end of March this year, stood at $85.2 billion. The release is seen as OpenAI's direct response to competitive pressure.

On the composite intelligence index published by the third-party evaluator Artificial Analysis, the GPT-5.5 series took first and second place, and OpenAI models hold four of the top six spots. However, on SWE-Bench Pro (which evaluates the ability to resolve real GitHub issues), Claude Opus 4.7 still leads with a score of 64.3%, ahead of GPT-5.5's 58.6%.

Future Outlook

GPT-5.5 represents a shift in AI from auxiliary tool to collaborative partner. It is no longer just an engine for answering questions but an agent that can understand complex goals, plan its own execution path, and keep working until the task is complete. As the model is applied more deeply to coding, scientific research, and knowledge work, it is expected to reshape how people and machines work together.

OpenAI President Greg Brockman emphasized that the core breakthrough of GPT-5.5 is its ability to accomplish more with less guidance, and that its biggest highlight is stronger autonomy when handling ambiguous problems. In this view, GPT-5.5 is not just a more powerful model but a new way of working.

With the full deployment of GPT-5.5, the AI industry is entering what is being called the "Agent Era." The change is expected to have a profound impact on software development, scientific research, enterprise operations, and other fields.