Moonshot AI logo, the company behind the Kimi K2 model family Image: Moonshot AI (via Hugging Face)
by Michael Joiner

Kimi K2.7-Code Is Out: Better Benchmarks, 30% Fewer Reasoning Tokens, No Independent Results Yet

Moonshot AI released Kimi K2.7-Code on June 12: a trillion-parameter open-source coding model that cuts reasoning token usage by 30% compared to K2.6. Every benchmark number comes from Moonshot's own proprietary evals.

Share

Moonshot AI released Kimi K2.7-Code on June 12, 2026. The model is open-weight under a Modified MIT license, weights are on Hugging Face, and the API is live at $0.95 per million input tokens and $4.00 per million output tokens.

K2.7-Code is a coding-focused successor to K2.6. The architecture is identical, a trillion-parameter Mixture-of-Experts with 32 billion parameters activated per token, 384 experts in total, and a 256K-token context window. The main claims are 30% lower reasoning-token usage than K2.6, and meaningful improvements on Moonshot’s proprietary coding benchmarks.

What Moonshot says changed

Moonshot published six benchmark comparisons, all from its own evaluation suites:

  • Kimi Code Bench v2: 50.9 on K2.6, 62.0 on K2.7 (+21.8%)
  • Program Bench: +11.0% over K2.6
  • MLS Bench Lite: +31.5% over K2.6
  • MCP Mark Verified: 81.1

The 30% reasoning token reduction is the more practical claim for anyone running agentic workflows. If accurate, it means lower inference costs for the same output quality. K2.7-Code also scores 81.1 on MCP Mark Verified, which tests correct tool invocation through the Model Context Protocol. That number matters if your agents call tools rather than just producing text.

What we don’t know yet

None of the published benchmarks are third-party results. As of the release date, K2.7-Code has no independent scores on SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench 2.0, LiveCodeBench, or GPQA Diamond. Those are the evaluations that most practitioners use to compare models across labs.

K2.6 scored 80.2% on SWE-Bench Verified when it released in April, which was competitive with frontier models at the time. Whether K2.7-Code improves on that isn’t known yet.

VentureBeat reported that practitioners are skeptical of the benchmark framing. That skepticism is reasonable. Proprietary evals are chosen and scored by the model’s creators, and “vs. K2.6” comparisons don’t tell you where the model sits relative to Claude, GPT, or other open-weight alternatives.

How to use it

The model is available through three channels:

  • Kimi API: Same endpoint as K2.6, updated model ID. Pricing is $0.95 input / $4.00 output / $0.19 cached per million tokens.
  • Kimi Code CLI: The open-source CLI at github.com/MoonshotAI/kimi-cli supports K2.7-Code.
  • Self-hosted: Weights on Hugging Face at moonshotai/Kimi-K2.7-Code. At roughly 595 GB, you need real infrastructure to run it locally.

One constraint worth knowing: thinking mode is mandatory. You cannot disable it and the sampling parameters are fixed server-side, so you have less control over output behavior than on some other APIs.

Context

K2.7-Code is Moonshot’s third major coding model update this year. K2.5 launched in January with Agent Swarm, K2.6 in April with SWE-Bench improvements. Each release has moved the numbers, but independent verification has lagged behind the announcements. That’s not unique to Moonshot, but it is something to account for when evaluating whether to switch.

If you are already running K2.6 in production, K2.7-Code is worth testing. The 30% reasoning token claim alone is meaningful if it holds up. If you’re evaluating from scratch, wait for third-party numbers before committing.


Sources: Hugging Face model card, MarkTechPost, Crypto Briefing

Share

The Weekly Diff

One email a week: every AI coding tool price change, plan restructure, and major release we verified, with sources. No filler.

Free. Unsubscribe anytime.