MiniMax M2.5 Just Dropped: 80% on SWE-Bench at 1/10th the Price of Claude Opus
MiniMax launches M2.5, calling it the first production-level model built natively for agent scenarios. Here's what the benchmarks show, what it costs, and how to actually use it.
MiniMax just released M2.5, the latest in a series of models that have been quietly climbing coding benchmarks while most of the industry’s attention stays fixed on the Anthropic-OpenAI arms race. The Shanghai-based company — now publicly traded on the Hong Kong Stock Exchange — is calling it “the world’s first production-level model designed natively for agent scenarios.”
That’s marketing language, obviously. But the numbers backing it up are worth paying attention to.
The Benchmarks
M2.5 hits 80.2% on SWE-Bench Verified, up from M2.1’s 74.0%. For context, Claude Opus 4.5 sits at 80.9%. That’s a 0.7-percentage-point gap between a model priced at $5/$25 per million tokens (Opus) and one priced at roughly $0.15/$1.20 per million tokens. The price-performance ratio isn’t close.
On Multi-SWE-Bench, which extends SWE-Bench to issue resolution across multiple programming languages, M2.5 scores 51.3% (up from M2.1’s 49.4%). On BrowseComp — OpenAI’s benchmark for finding hard-to-locate information on the web — M2.5 hits 76.3% with context management enabled.
Speed improved too. M2.5 completes SWE-Bench tasks 37% faster than M2.1, matching Claude Opus 4.6’s average completion time of 22.8 minutes per task. The Lightning variant pushes throughput to ~100 tokens per second, roughly double what most frontier models deliver.
MiniMax claims the model costs “$1 to run continuously for one hour at 100 tokens/second.” Whether or not that number holds up under real workloads, the directional message is clear: this is a model built to run agents cheaply.
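The arithmetic behind that claim is easy to check against the listed prices. A back-of-envelope sketch (assuming the ~$0.15/$1.20 per-million-token pay-as-you-go rates quoted below, and an hour spent generating at a steady 100 tokens/second; real agent loops also re-send large contexts, which is where most of the budget would actually go):

```python
# Back-of-envelope check of the "$1/hour at 100 tok/s" claim.
# Rates below are the pay-as-you-go prices quoted in this article (assumption).
INPUT_PER_M = 0.15    # USD per million input tokens
OUTPUT_PER_M = 1.20   # USD per million output tokens

seconds = 3600
output_tokens = 100 * seconds            # 360,000 tokens generated in an hour
output_cost = output_tokens / 1e6 * OUTPUT_PER_M
print(f"output cost/hour: ${output_cost:.2f}")       # ≈ $0.43

# The rest of the $1 budget covers input (prompt/context) tokens:
input_budget = 1.00 - output_cost
input_tokens = input_budget / INPUT_PER_M * 1e6
print(f"input tokens affordable: {input_tokens/1e6:.1f}M")   # ≈ 3.8M
```

On these numbers the claim is plausible: generation alone costs about $0.43/hour, leaving room for a few million input tokens of context re-sends before the dollar is spent.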
What Changed From M2.1
M2.5 represents the third release in the M2 lineage in under four months (M2 in October 2025, M2.1 in December, M2.5 in February 2026). The pace is aggressive.
The technical changes worth noting:
- Forge framework: MiniMax built a custom agent-native reinforcement learning framework that achieves a claimed 40x training speedup through tree-structured sample merging and optimized async scheduling. This is the infrastructure that lets them iterate this fast.
- CISPO algorithm: A reinforcement learning approach (Clipped IS-weight Policy Optimization) that clips importance-sampling weights rather than token-level updates. MiniMax claims it matches DAPO’s performance in half the training steps, with a process-reward mechanism optimized for long-context agent rollouts.
- Spec-writing behavior: M2.5 has a notable tendency to plan architecture before writing code — drafting specifications, then implementing against them. This isn’t just a training artifact; it’s a deliberate design choice that improves code quality on complex, multi-file tasks.
- Training scale: 200,000+ real-world environments, 10+ programming languages, and full-stack coverage across web, Android, iOS, and Windows.
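The CISPO idea described above can be illustrated with a toy per-token loss. This is a simplified sketch based on the published description, not MiniMax’s implementation: the importance-sampling ratio itself is clipped and treated as a constant weight, so every token — including clipped ones — still contributes a gradient through its log-probability, unlike PPO-style clipping, which zeroes the gradient of clipped tokens. The epsilon values here are illustrative assumptions.

```python
import numpy as np

def cispo_token_loss(logp_new, logp_old, advantages,
                     eps_low=0.2, eps_high=0.2):
    """Toy CISPO-style per-token loss (simplified; illustrative epsilons).

    The importance-sampling weight r = pi_new / pi_old is clipped to
    [1 - eps_low, 1 + eps_high] and then used as a fixed coefficient
    (a stop-gradient in a real autograd framework), so the gradient
    flows through logp_new for every token, clipped or not.
    """
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # `clipped` would be wrapped in stop_gradient in a real framework;
    # numpy has no autograd, so this is noted for clarity only.
    return -(clipped * advantages * logp_new)

# A token whose policy probability doubled: ratio 2.0 is clipped to 1.2,
# but the token still carries a (capped) learning signal.
loss = cispo_token_loss(np.array([-0.5]),
                        np.array([-0.5 - np.log(2)]),
                        np.array([1.0]))
print(loss)   # [0.6]
```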
The internal numbers MiniMax is sharing are striking: they claim 80% of new code written at MiniMax is now M2.5-generated, and 30% of internal company tasks are completed autonomously by the model. Take those numbers with appropriate skepticism — they’re self-reported by the company selling the model — but they signal confidence in the model’s practical utility beyond benchmarks.
Two Variants
M2.5 ships in two flavors:
- MiniMax-M2.5: Full-power variant, ~50 tokens/second, best benchmark performance
- MiniMax-M2.5-Lightning: ~100 tokens/second, optimized for latency-sensitive applications
Both use MiniMax’s Mixture-of-Experts architecture inherited from the M2 lineage — 230B total parameters with roughly 10B active per token, keeping inference costs low despite the large parameter count.
Context window sits at approximately 200K tokens, consistent with the M2 series. MiniMax has previously described this as “expandable toward 1 million tokens,” though that expansion hasn’t shipped yet.
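The MoE economics above can be made concrete with a rough estimate. Using the common heuristic of ~2 FLOPs per active parameter per generated token (an approximation that ignores attention and routing overhead), per-token compute scales with the 10B active parameters, not the 230B total:

```python
# Rough per-token decoding cost for an MoE model, using the common
# ~2 FLOPs per active parameter per token heuristic (approximation).
TOTAL_PARAMS = 230e9    # total parameters (from the article)
ACTIVE_PARAMS = 10e9    # parameters active per token (from the article)

moe_flops_per_token = 2 * ACTIVE_PARAMS     # ~2e10 FLOPs
dense_flops_per_token = 2 * TOTAL_PARAMS    # what a dense 230B would pay

ratio = moe_flops_per_token / dense_flops_per_token
print(f"MoE compute vs. dense equivalent: {ratio:.1%}")   # ≈ 4.3%
```

In other words, each generated token costs on the order of 4% of what a dense 230B model would, which is the structural reason the pricing below is possible.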
Pricing
API (Pay-as-You-Go)
| Model | Input | Output |
|---|---|---|
| M2.5 | ~$0.15/M tokens | ~$1.20/M tokens |
| M2.5-Lightning | ~$0.30/M tokens | ~$2.40/M tokens |
For comparison: Claude Opus 4.6 costs $5/$25 per million tokens, and GPT-5.3-Codex is in a similar range. M2.5 is roughly 30x cheaper on input and roughly 20x cheaper on output than the frontier proprietary models it’s benchmarking against.
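To see what that multiplier means per task, here is a sketch comparing one agentic coding task under the two price schedules. The 500K-input/20K-output token profile is an illustrative assumption (agent loops re-send context heavily, so input dominates), not a measurement:

```python
def task_cost(input_tokens, output_tokens, in_per_m, out_per_m):
    """API cost in USD for one task at the given per-million-token rates."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Hypothetical agent task: many context re-sends, modest generation.
IN_TOK, OUT_TOK = 500_000, 20_000

m25 = task_cost(IN_TOK, OUT_TOK, 0.15, 1.20)    # M2.5, prices as quoted
opus = task_cost(IN_TOK, OUT_TOK, 5.00, 25.00)  # Opus 4.6, prices as quoted

print(f"M2.5: ${m25:.3f}")          # ≈ $0.099
print(f"Opus: ${opus:.3f}")         # ≈ $3.000
print(f"ratio: {opus / m25:.0f}x")  # ≈ 30x
```

Under this (input-heavy) profile a task that costs ~$3 on Opus costs about ten cents on M2.5; the exact multiple shifts with the input/output mix.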
Coding Plan (Subscription)
MiniMax also sells flat-rate subscription plans designed specifically for coding tools:
| Plan | Price | How It Works |
|---|---|---|
| Starter | $10/mo ($100/yr) | Rolling prompt pool, resets every 5 hours |
| Plus | $20/mo ($200/yr) | Larger prompt pool, same reset cycle |
| Max | $50/mo ($500/yr) | Highest prompt pool |
The Coding Plan is currently documented as powered by M2.1, but M2.5 integration is expected imminently. These plans work inside Claude Code, Cline, OpenCode, and other agentic coding tools — you swap in MiniMax’s Anthropic-compatible endpoint and use your Coding Plan key.
How to Use It
M2.5 works with a surprisingly wide range of coding tools through both OpenAI-compatible and Anthropic-compatible API endpoints.
Claude Code: Point ANTHROPIC_BASE_URL at https://api.minimax.io/anthropic, set your MiniMax API key as ANTHROPIC_AUTH_TOKEN, and set ANTHROPIC_MODEL to MiniMax-M2.5.
Cursor: Override the OpenAI Base URL with https://api.minimax.io/v1.
Cline, OpenCode, Kilo Code, Roo Code, Windsurf, Codex CLI: All supported with similar API key configuration. MiniMax maintains a setup guide covering each tool.
OpenClaw: MiniMax has an OAuth integration path through OpenClaw that enables browser-based sign-in — no API key copying required. Run openclaw models auth login --provider minimax-portal --set-default and a browser window handles authorization.
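Outside of these integrations, the endpoints can be called directly. A minimal sketch against the OpenAI-compatible base URL given above — note the `/chat/completions` path and payload shape are assumptions based on standard OpenAI-compatible conventions, so check MiniMax’s API docs for the authoritative format:

```python
import json
import urllib.request

def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request against MiniMax's
    OpenAI-compatible endpoint (path/payload shape assumed)."""
    url = "https://api.minimax.io/v1/chat/completions"
    body = {
        "model": "MiniMax-M2.5",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("YOUR_API_KEY", "Write a binary search in Python.")
# resp = urllib.request.urlopen(req)  # uncomment with a real key
print(req.full_url)
```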
The Competitive Picture
The M2 series has been on a trajectory that’s hard to ignore. Three releases in 3.5 months, each with meaningful improvements, all at price points that undercut the major labs by an order of magnitude.
The obvious comparison is to DeepSeek, which demonstrated that Chinese labs could compete on coding benchmarks at dramatically lower costs. MiniMax is now making a similar argument with an even more aggressive price-performance curve, and with better out-of-the-box integration with the Western coding tool ecosystem (Claude Code, Cursor, Cline, etc.).
The catch is the same one that applies to every model not named Claude or GPT: ecosystem maturity. MiniMax’s community is small. Documentation, while improving, is thinner than what Anthropic and OpenAI provide. The model’s behavior on edge cases and real-world production workloads hasn’t been tested at the scale that Claude and GPT see daily.
But at 80.2% SWE-Bench Verified and roughly 1/20th the cost of Opus? For developers running high-volume agentic workloads — automated code review, batch refactoring, CI/CD integration — the math is hard to ignore.
The Market Reaction
MiniMax’s Hong Kong-listed stock closed up more than 11% on the announcement, after intraday gains of over 20%. The company’s market cap now sits above 180 billion HKD.
M2.5 launched alongside a wave of Chinese model releases timed to the Lunar New Year period — including Moonshot AI’s Kimi K2.5 and Zhipu AI’s GLM-5. Competitive pressure within the Chinese AI ecosystem is producing a release cadence the US labs haven’t matched, even where the absolute benchmark scores still favor the frontier labs.
Bottom Line
M2.5 doesn’t dethrone Claude Opus on benchmarks. It doesn’t need to. What it does is compress the gap between frontier performance and affordable agentic coding to nearly nothing. For developers and teams that are running into cost ceilings with Claude Code Max or Codex Pro subscriptions, M2.5 via a $10-50/mo Coding Plan is the most aggressive alternative currently available.
The model works inside the tools you’re already using. The API is compatible with both Anthropic and OpenAI endpoints. The benchmarks are within striking distance of the best proprietary models. And the price is a fraction.
If you’re running agents at scale, this is worth a serious evaluation.