OpenAI Launches GPT-5.5, Now Powering Codex With Stronger Agentic Coding
OpenAI released GPT-5.5 on April 23, its most capable model yet, with benchmark scores showing meaningful gains in agentic coding, SWE tasks, and command-line workflows. The model now powers Codex for over 4 million weekly developers.
OpenAI released GPT-5.5 on April 23, rolling it out to paid subscribers in ChatGPT and Codex. The company calls it its best model yet at agentic coding tasks, and the benchmarks back that claim up more convincingly than most model releases do.
On Terminal-Bench 2.0, which measures how well a model can use command-line tools and complete real engineering workflows, GPT-5.5 scored 82.7%. That’s a 7.6 percentage point improvement over GPT-5.4’s 75.1% on the same benchmark. On SWE-Bench Pro, the standard test for GitHub issue resolution, GPT-5.5 hit 58.6%. OpenAI’s internal Expert-SWE eval put the model at 73.1%, up from 68.5% for the previous version.
The gains in agentic workflows are what OpenAI is emphasizing the most. Early testing shows the model holds context better across large codebases, reasons through ambiguous failures more reliably, and carries changes consistently through the surrounding code rather than fixing the immediate issue while breaking something nearby.
What’s Different in Practice
The efficiency story is notable. GPT-5.5 completes the same Codex tasks using fewer tokens than GPT-5.4 while matching its per-token latency. That means faster end-to-end responses and less context burned per task, which matters in long agentic sessions where context costs compound quickly.
OpenAI is also releasing a higher-tier version called GPT-5.5 Pro, positioned for especially complex tasks. The standard model is priced at $5 per million input tokens and $30 per million output tokens, roughly double GPT-5.4’s pricing. Whether the performance gains justify the cost increase depends on workload: for agentic tasks where you’re burning large context windows, the efficiency improvements could offset some of the per-token price jump.
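A back-of-envelope calculation makes that trade-off concrete. The only prices taken from the announcement are GPT-5.5's ($5 input / $30 output per million tokens); the GPT-5.4 prices are inferred from "roughly double" (about $2.50 / $15), and the per-task token counts are entirely hypothetical.

```python
def task_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one task, given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical long agentic session: GPT-5.4 burns 400k input / 40k output
# tokens; assume GPT-5.5's efficiency gains cut that by 25% to 300k / 30k.
old = task_cost(400_000, 40_000, 2.50, 15)   # inferred GPT-5.4 pricing
new = task_cost(300_000, 30_000, 5, 30)      # announced GPT-5.5 pricing
print(f"GPT-5.4: ${old:.2f}  GPT-5.5: ${new:.2f}")
```

Under these assumed numbers the per-task cost still rises, so the efficiency gains offset only part of the price jump; the break-even point depends on how much of the token reduction your workload actually sees.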
Codex Is Scaling Fast
Codex now has more than 4 million weekly active developers, up from 3 million in early April. OpenAI says over 85% of its own employees across engineering, finance, communications, marketing, data science, and product use Codex weekly. The infrastructure runs on NVIDIA GB200 NVL72 rack-scale systems.
Recent Codex additions (browser-based UI interaction, an automatic review agent that flags risk levels before running approval prompts, and multi-environment support for parallel workspaces) are now backed by the stronger model. Those features were announced earlier this month; GPT-5.5 is the upgrade running underneath them.
Availability
GPT-5.5 is rolling out to ChatGPT Plus, Pro, Business, and Enterprise subscribers, as well as Codex. API access is available through the standard endpoint at the new pricing. The rollout is gradual, so GPT-5.5 may not appear in your model picker yet; until it does, GPT-5.2 remains the recommended default for most Codex tasks.
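For API users, switching should amount to changing the model ID in an ordinary Chat Completions request. A minimal sketch of the payload, assuming the model identifier is simply "gpt-5.5" (the announcement does not spell out the exact API string, so check the models list before relying on it):

```python
import json

# Hypothetical request payload for the standard Chat Completions endpoint.
# "gpt-5.5" is an assumed model ID based on the release name.
payload = {
    "model": "gpt-5.5",
    "messages": [
        {"role": "user", "content": "Summarize the failing test output."}
    ],
}

# POST this JSON to https://api.openai.com/v1/chat/completions with an
# Authorization: Bearer header; the payload shape is unchanged from
# earlier GPT-5.x models, only the model ID and pricing differ.
print(json.dumps(payload, indent=2))
```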
The full announcement with benchmark comparisons is at openai.com.
Sources: OpenAI Blog, TechCrunch, Interesting Engineering, NVIDIA Blog, developer-tech.com