OpenCode Models Comparison

The goal of this article is to answer one question: “Which model should I use for the best cost-efficiency?” The reality of today’s AI landscape, which we have already explored, is that some models only look good because they are heavily subsidized behind opaque subscriptions. But for those of us using token-based harnesses like OpenCode, there are real decisions to make. Below is a simple heuristic for making them.

Choosing a Model for a Task

Calculating model efficiency is difficult. There are many parameters: task difficulty, token efficiency, token cost, cache hit rate, provider, and so on. In the model below, I avoided collapsing cost and capability into a single lossy index. Instead, the practical approach is a filter-then-rank decision procedure:

Threshold: Set the minimum capability needed for the task.
Filter: Remove models that score below the threshold.
Rank: Sort the survivors by Effective Cost, ascending.
Pick: The cheapest surviving model is the best value.
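
The four steps above can be sketched in a few lines of Python. This is only an illustration of the procedure, not the code behind the table: the `Model` class, the `rank_models` function, and the three catalog rows are hypothetical placeholders, and the scores and prices in them are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    intelligence: float
    coding: float
    agentic: float
    effective_cost: float  # blended USD per 1M tokens


def rank_models(models, min_intelligence=0.0, min_coding=0.0, min_agentic=0.0):
    """Filter out under-qualified models, then sort survivors by cost."""
    survivors = [
        m for m in models
        if m.intelligence >= min_intelligence
        and m.coding >= min_coding
        and m.agentic >= min_agentic
    ]
    return sorted(survivors, key=lambda m: m.effective_cost)


# Placeholder rows for illustration only; these are not real benchmark data.
CATALOG = [
    Model("model-small", 38, 50, 28, 0.10),
    Model("model-medium", 45, 58, 36, 0.40),
    Model("model-large", 55, 70, 48, 1.20),
]

ranked = rank_models(CATALOG, min_coding=55)   # Threshold + Filter + Rank
best = ranked[0] if ranked else None           # Pick (None: nothing qualifies)
print(best.name if best else "no model qualifies")
```

Keeping the thresholds as separate arguments, rather than a weighted score, is what preserves the filter-then-rank idea: capability only decides who is eligible, and cost alone decides the order.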

What is Effective Cost?

Effective Cost is a blended per-million-token rate assuming a default 80% cache hit rate and a 3:1 input:output token ratio (75% input, 25% output). Formula: 0.75 × (0.20 × Input + 0.80 × Cached Read) + 0.25 × Output. Cache write pricing is excluded as it’s a one-time cost per unique prompt prefix, not a per-request cost.
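
As a rough sketch, the formula translates directly into a small Python function. The function and parameter names are my own, and the example prices are hypothetical rather than taken from any provider's price list.

```python
def effective_cost(input_price, cached_read_price, output_price,
                   cache_hit_rate=0.80, input_share=0.75):
    """Blended USD per 1M tokens.

    Defaults follow the article: 80% cache hit rate and a 3:1
    input:output ratio (75% input, 25% output). Cache write
    pricing is deliberately left out.
    """
    blended_input = ((1 - cache_hit_rate) * input_price
                     + cache_hit_rate * cached_read_price)
    return input_share * blended_input + (1 - input_share) * output_price


# Hypothetical prices in USD per 1M tokens: input, cached read, output.
cost = effective_cost(1.00, 0.10, 4.00)
print(f"${cost:.2f} per 1M tokens")
```

With these made-up prices the input side contributes 0.75 × (0.20 × 1.00 + 0.80 × 0.10) = 0.21 and the output side 0.25 × 4.00 = 1.00, so output tokens dominate even at a 3:1 input:output ratio.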

Provider: OpenCode Zen pricing as of July 2026. Index scores sourced from Artificial Analysis Intelligence Index v4.1, Coding Index, and Agentic Index.

AA Cost Ratio is the model’s total cost to evaluate on the AA Intelligence Index ÷ Claude Opus 4.8’s total cost ($3,752.55). Lower is better. Baseline (Claude Opus 4.8) set to 1.000.
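
For clarity, the ratio is a plain division against the baseline. A minimal sketch follows; the function names are assumptions of mine, and only the $3,752.55 baseline comes from the text above.

```python
OPUS_48_BASELINE_USD = 3752.55  # Claude Opus 4.8 total cost on the AA Intelligence Index


def aa_cost_ratio(total_cost_usd, baseline_usd=OPUS_48_BASELINE_USD):
    """Total evaluation cost relative to the baseline; lower is better."""
    return total_cost_usd / baseline_usd


def implied_total_cost(ratio, baseline_usd=OPUS_48_BASELINE_USD):
    """Invert the ratio to recover an approximate total cost in USD."""
    return ratio * baseline_usd


print(f"{aa_cost_ratio(OPUS_48_BASELINE_USD):.3f}")  # the baseline itself
print(f"{implied_total_cost(1.6):.2f}")              # what a 1.6 ratio implies
```

Because published ratios are rounded, a cost recovered this way is only approximate.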

Free models and Claude Fable 5 are excluded. Some models, like Gemini 3.1 and GPT 5.x, score similarly regardless of the context window tier.

Use the sliders below to set minimum Intelligence, Coding, and Agentic scores. The table updates instantly and ranks surviving models by Effective Cost ascending.


How to calibrate your own threshold

Before you calibrate, read the Cost Staircase & Capability Tiers section below. Typical starting thresholds by task type:

Task type	Primary index	Typical threshold
Simple classification, keyword extraction	Intelligence	30–35
Structured data extraction, summarization	Intelligence	35–40
RAG response generation, Q&A	Intelligence	40–50
Complex reasoning, deal analysis, proposals	Intelligence	50–55
Top-tier research, ambiguous problem-solving	Intelligence	55+
Simple codegen (boilerplate, scripts, auto-completion, inline assist)	Coding	45–55
Feature-level codegen, bug fixing	Coding	55–65
Complex codegen (multi-file, architecture)	Coding	65+
Tool-calling, single-step agents	Agentic	25–35
Multi-step agents, error recovery	Agentic	35–45
Autonomous long-running agents	Agentic	45+
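
One way to use the table above is to encode it as a lookup that turns a task type into minimum scores. The sketch below takes the lower bound of each typical range as the starting threshold, which is my assumption and not a rule from the benchmark. The task keys and the `minimums_for` helper are hypothetical names.

```python
# task key -> (primary index, lower bound of the typical range in the table)
TASK_THRESHOLDS = {
    "simple_classification": ("intelligence", 30),
    "structured_extraction": ("intelligence", 35),
    "rag_qa": ("intelligence", 40),
    "complex_reasoning": ("intelligence", 50),
    "top_tier_research": ("intelligence", 55),
    "simple_codegen": ("coding", 45),
    "feature_codegen": ("coding", 55),
    "complex_codegen": ("coding", 65),
    "single_step_agent": ("agentic", 25),
    "multi_step_agent": ("agentic", 35),
    "autonomous_agent": ("agentic", 45),
}


def minimums_for(task):
    """Return minimum scores per index; only the primary index is constrained."""
    index, floor = TASK_THRESHOLDS[task]
    minimums = {"intelligence": 0, "coding": 0, "agentic": 0}
    minimums[index] = floor
    return minimums


print(minimums_for("feature_codegen"))
```

Starting at the bottom of the range and raising the threshold only when output quality disappoints keeps you on the cheapest tier that actually works for the task.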

Some observations

Claude Sonnet 5

Claude Sonnet 5 is exceptionally expensive. It consumed more tokens than any other model on the AA benchmark, which pushed its AA Cost Ratio to 1.6. It is even more expensive than Claude Fable 5, yet less efficient. I don’t yet know how to justify this. Maybe it is optimized only for the Anthropic toolchain.

Cost Staircase & Capability Tiers

When MiniMax M3 enters the table, the cost suddenly jumps. Models fall into distinct classes: from DeepSeek V4 Flash to MiniMax M3 the cost rises ~3×, and from MiniMax M3 to Kimi K2.7 and GLM-5.2 it rises another ~3×. What we’re seeing is a cost staircase. The table looks like 23 individual models, but it’s really about 4–5 capability tiers with ~3× price steps between them. The jumps happen at specific coding thresholds where the cheaper model falls off. These thresholds are where model selection matters.

Coding threshold	Cheapest survivor	Effective Cost ($ per 1M tokens)	Jump from previous
< 57	DeepSeek V4 Flash	$0.11	— (baseline)
57–59	MiniMax M3	$0.38	3.5×
60–71	Kimi K2.7 / GLM-5.2	$1.22	3.2×
> 71	Claude Sonnet 5	$2.92	2.4×
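
The staircase above can be checked with a short script. The tier names and costs are copied from the table; the `TIER_STARTS` boundaries assume whole-number coding thresholds (the table does not say what happens between 59 and 60, or between 71 and 72), and the function names are illustrative.

```python
from bisect import bisect_right

# (cheapest survivor, cost in USD per 1M tokens), cheapest tier first.
TIERS = [
    ("DeepSeek V4 Flash", 0.11),
    ("MiniMax M3", 0.38),
    ("Kimi K2.7 / GLM-5.2", 1.22),
    ("Claude Sonnet 5", 2.92),
]
# First coding threshold at which each later tier takes over (assumed integers).
TIER_STARTS = [57, 60, 72]


def cheapest_survivor(coding_threshold):
    """Look up which tier is the cheapest survivor at a coding threshold."""
    return TIERS[bisect_right(TIER_STARTS, coding_threshold)]


def step_multipliers(tiers):
    """Cost ratio between each tier and the one before it, to one decimal."""
    costs = [cost for _, cost in tiers]
    return [round(nxt / prev, 1) for prev, nxt in zip(costs, costs[1:])]


print(step_multipliers(TIERS))
print(cheapest_survivor(58))
```

The multipliers reproduce the "Jump from previous" column, which is a quick sanity check that the steps really are close to 3× apart.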

We can see similar thresholds on the Intelligence Index, but the capability gap between steps is less obvious there. A 2.5× jump in cost can separate two models that differ by only 1 or 2 intelligence points.

Three critical cliffs on the Intelligence Index:

40 → 42 (+2 pts), 3.5× cost: DeepSeek V4 Flash drops off, MiniMax M3 takes over.
44 → 46 (+2 pts), 4.0× cost: MiniMax M3 drops off, GLM-5.2 takes over.
53 → 54 (+1 pt), 2.5× cost: Claude Sonnet 5 drops off, Claude Opus 4.7 takes over.

The last step is interesting. A single intelligence point costs you 2.5× more per token.