Cohere Open-Sources North Mini Code, a 30B Agentic Coding Model That Runs on One H100

Cohere released North Mini Code on June 9, 2026, its first open-source model aimed at developers. The Apache 2.0 license means companies can modify it, deploy it internally, and build commercial products with it without a licensing agreement with Cohere.

The model is available on Hugging Face, Cohere’s API, Model Vault, and OpenRouter.

Architecture and Hardware Requirements

North Mini Code uses a Mixture-of-Experts design: 30 billion total parameters, 3 billion active per token. At FP8 precision, which stores one byte per parameter, the full set of weights comes to roughly 30 GB, and that is what lets the model fit on a single H100. The 3B active figure does not shrink that footprint, because every expert still has to sit in memory. What it reduces is the compute spent on each token, which is where the model's speed comes from.
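A back-of-the-envelope check of the single-H100 claim can be sketched as follows. The parameter counts come from the figures above; the 80 GB capacity is an assumption about the H100 variant in use, and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead. It also assumes all experts stay resident in GPU memory, so weight memory tracks the total parameter count rather than the active one.

```python
# Rough weight-memory estimate for a 30B-parameter model.
# Assumptions (not from Cohere): 80 GB H100 variant, weights only,
# all experts resident in GPU memory.

TOTAL_PARAMS = 30e9    # total parameters, as reported
ACTIVE_PARAMS = 3e9    # parameters active per token, as reported
H100_MEMORY_GB = 80    # assumed H100 memory capacity


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the weights alone, in decimal GB."""
    return total_params * bytes_per_param / 1e9


fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)    # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)   # BF16: 2 bytes per parameter

# Memory left over at FP8 for KV cache, activations, and overhead.
fp8_headroom_gb = H100_MEMORY_GB - fp8_gb

# Share of the weights that take part in computing any one token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"FP8 weights:  {fp8_gb:.0f} GB (headroom {fp8_headroom_gb:.0f} GB)")
print(f"BF16 weights: {bf16_gb:.0f} GB")
print(f"Active per token: {active_fraction:.0%} of parameters")
```

Under these assumptions the FP8 weights leave substantial room on one card, while BF16 weights would leave far less for a long-context KV cache.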

The context window is 256,000 tokens, with a maximum generation length of 64,000 tokens. That generation ceiling is generous for agentic work, which often involves long reasoning traces or extended code outputs.
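For planning agent prompts, the two limits can be combined into a simple budget. The sketch below assumes that generated tokens count against the same 256,000-token window as the prompt, which is how most serving stacks behave but is not something Cohere's announcement states. The function name and its parameters are illustrative.

```python
# Token budgeting under the reported limits.
# Assumption (not confirmed by Cohere): prompt and generated tokens
# share one context window.

CONTEXT_WINDOW = 256_000   # reported context window, in tokens
MAX_GENERATION = 64_000    # reported maximum generation length, in tokens


def max_prompt_tokens(reserved_output: int,
                      context_window: int = CONTEXT_WINDOW,
                      max_generation: int = MAX_GENERATION) -> int:
    """Return how many prompt tokens fit once output space is reserved."""
    if not 0 <= reserved_output <= max_generation:
        raise ValueError("reserved_output must be between 0 and max_generation")
    return context_window - reserved_output


# Reserving the full generation ceiling leaves this much for the prompt.
full_output_budget = max_prompt_tokens(MAX_GENERATION)

# A short code-review reply needs far less output space.
short_reply_budget = max_prompt_tokens(8_000)

print(full_output_budget, short_reply_budget)
```

Reserving the full 64,000 output tokens still leaves 192,000 for repository context under this assumption.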

Cohere positions the model for four use cases: code generation, sub-agent orchestration, code review, and terminal operations.

Benchmark Numbers

On the Artificial Analysis Coding Index, North Mini Code scores 33.4. Cohere also reports 2.8x higher output throughput than Devstral Small 2 and 30% lower inter-token latency than the same model.
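The two speed figures measure different things, and a little arithmetic shows why they do not have to match. The sketch below reads the latency claim as 30% lower time between tokens, which is an interpretation rather than Cohere's stated definition. The 20 ms baseline is a hypothetical number chosen only to make the example concrete.

```python
# Relating a relative latency reduction to per-stream decoding speed.
# Assumptions: "30% latency advantage" means 30% lower inter-token latency;
# the 20 ms baseline is hypothetical, not a measured Devstral figure.

LATENCY_REDUCTION = 0.30     # reported relative latency advantage
THROUGHPUT_RATIO = 2.8       # reported output throughput ratio
BASELINE_LATENCY_MS = 20.0   # hypothetical baseline inter-token latency


def reduced_latency_ms(baseline_ms: float, reduction: float) -> float:
    """Inter-token latency after applying a relative reduction."""
    return baseline_ms * (1 - reduction)


def per_stream_speedup(reduction: float) -> float:
    """Tokens-per-second gain for one stream implied by lower latency."""
    return 1 / (1 - reduction)


new_latency_ms = reduced_latency_ms(BASELINE_LATENCY_MS, LATENCY_REDUCTION)
stream_speedup = per_stream_speedup(LATENCY_REDUCTION)

# The part of the throughput ratio that lower latency alone does not explain.
remaining_factor = THROUGHPUT_RATIO / stream_speedup

print(f"Latency: {BASELINE_LATENCY_MS:.0f} ms -> {new_latency_ms:.0f} ms")
print(f"Per-stream speedup: {stream_speedup:.2f}x")
print(f"Remaining factor:   {remaining_factor:.2f}x")
```

On this reading, a single stream decodes about 1.43x faster, so most of the 2.8x throughput figure would have to come from something else, such as serving more concurrent requests. That is worth checking against the methodology when reproducing the numbers.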

Those are the numbers Cohere published. Independent evaluations from the broader community will surface over time, particularly on SWE-bench and Terminal Bench, where Cohere says the model performed well in internal testing.

Why This Matters for Enterprise Teams

North Mini Code is not competing with frontier-scale models on raw capability. The target is a narrower use case: teams that need an agentic coding model they can run on their own hardware, under their own data governance rules, without routing code through an external API.

That positioning explains several choices. The Apache 2.0 license removes most of the licensing friction around commercial deployment. The single-H100 requirement means the model fits in existing GPU clusters without dedicated infrastructure. The 256K context handles large codebases without needing to chunk files aggressively.

For teams already running Cohere’s Command models in enterprise settings, North Mini Code adds a developer-focused option to the same deployment stack.

For teams using Mistral's Devstral Small 2 (the closest comparable open model), Cohere’s throughput and latency claims are worth testing. Both models target the same “capable but efficient” niche for on-premises deployment.

What’s Missing

Cohere has not published a score on SWE-bench Verified or SWE-bench Pro, the coding benchmarks most commonly used to compare agents. Without those numbers, it is hard to place North Mini Code relative to models like Devstral Small 2 or Qwen Coder 32B.

The single-GPU claim also assumes an H100 running FP8. That hardware is common in enterprise GPU fleets, but it leaves out smaller teams on A100s, which lack native FP8 support, and teams on consumer hardware.
