Tier B — Production

Runs in: France · Made in: China

Qwen3-Coder-30B-A3B-Instruct

Tokonomix Editorial Team · Reviewed by Mes Kalkan · Published May 27, 2026 · Last reviewed July 30, 2026

Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

Chart: P50 latency (median) and P95 latency · 102 runs
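As a rough illustration of how P50 and P95 can be read off a set of latency measurements, the sketch below uses the nearest-rank method. The sample values and the function name `percentile_nearest_rank` are hypothetical, and the page does not say which percentile method Tokonomix actually uses.

```python
import math

def percentile_nearest_rank(samples_ms, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # 1-based rank of the smallest value that covers p percent of the samples.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds, for illustration only.
latencies_ms = [120, 95, 101, 430, 99, 105, 98, 110, 97, 102]

p50 = percentile_nearest_rank(latencies_ms, 50)
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"P50 = {p50} ms, P95 = {p95} ms")
```

A single slow outlier barely moves P50 but dominates P95, which is why the two figures are reported together.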

Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Chart categories: Creative · Factual · Multilingual · Reasoning (surviving value label: 100)

Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

API rates — Qwen3-Coder-30B-A3B-Instruct

$0.0700 per 1M input tokens

$0.2600 per 1M output tokens

≈ <$0.0001 per typical conversation (800 tokens)
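The per-conversation figure depends on how the 800 tokens split between input and output, which the page does not state. The sketch below assumes a 600 input / 200 output split purely for illustration; the function name `conversation_cost_usd` is hypothetical.

```python
INPUT_USD_PER_M = 0.07   # listed rate per 1M input tokens
OUTPUT_USD_PER_M = 0.26  # listed rate per 1M output tokens

def conversation_cost_usd(input_tokens, output_tokens):
    """Cost in USD for one conversation at the listed per-million rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000

# Assumed split (not stated on the page): 600 input + 200 output tokens.
mixed = conversation_cost_usd(600, 200)

# Upper bound for 800 tokens: every token billed at the output rate.
all_output = conversation_cost_usd(0, 800)

print(f"600 in / 200 out: ${mixed:.6f}")
print(f"800 out:          ${all_output:.6f}")
```

Under the assumed split the cost lands just under $0.0001, but an output-heavy conversation of the same length would cost roughly twice that, so the estimate is best read as an order of magnitude.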

Pricing over time

Input and output price per 1M tokens; the step line marks price changes.

Input: $0.0700 per 1M tokens (stable)

Output: $0.2600 per 1M tokens (stable)

Chart dates: 2026-06-14, 2026-06-28, 2026-07-26. Synced weekly.

Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s): 2174 · avg 1432

Estimated by dividing an assumed 200 output tokens by the P50 latency. The absolute number depends on this assumption; the trend is what matters.
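The estimate can be sketched as follows. Dividing the assumed 200 output tokens by the 92 ms P50 from the last automated speed benchmark reproduces the 2174 tokens/s figure, which suggests this is the calculation used, although the page does not spell out the formula. The function name `tokens_per_second` is hypothetical.

```python
ASSUMED_OUTPUT_TOKENS = 200  # the page's stated assumption

def tokens_per_second(p50_latency_ms, output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimate throughput as assumed output tokens divided by P50 latency."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000)

# P50 latency reported by the last automated speed benchmark on this page.
latest = tokens_per_second(92)
print(round(latest))
```

Because the 200-token figure is fixed, the estimate is simply inversely proportional to latency: halving P50 doubles the reported throughput.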

Section 05

Capabilities

Owned by: Qwen

Section 06

Availability

No measurements yet

We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.

Section 07

Tokonomix benchmark verdicts

Endorsed by 2 judges

Independent LLM judges evaluated this model on our weekly intelligence tests

cohere/command-a: 100/100 · 1 run

1 correct · 0 partial · 0 wrong · 100% accuracy

claude-sonnet-4-5: 92/100 · 47 runs

41 correct · 2 partial · 4 wrong · 87% accuracy
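The accuracy percentages are consistent with counting only fully correct runs over all runs. That reading is an assumption (partial answers may be credited differently in the 0 to 100 judge score), and `accuracy_pct` is a hypothetical helper name.

```python
def accuracy_pct(correct, partial, wrong):
    """Share of runs judged fully correct, as a percentage of all runs."""
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no runs recorded")
    return 100 * correct / total

command_a = accuracy_pct(correct=1, partial=0, wrong=0)
sonnet = accuracy_pct(correct=41, partial=2, wrong=4)

print(f"cohere/command-a:  {command_a:.0f}%")
print(f"claude-sonnet-4-5: {sonnet:.0f}%")
```

Note that the first judge's 100% rests on a single run, so the 47-run verdict carries far more weight.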

● 2026-07-26

Quality drops 9.8 points to 86.5 as category mix shifts from coding

Qwen3-Coder-30B-A3B-Instruct declined in quality this window, falling from 96.3 to 86.5 overall. The main driver is a change in the categories tested: coding tests are absent from the current window, while newly tested categories appear. That absence matters because the model is positioned as a coding specialist and scored a perfect 100 on coding in the previous window.

Multilingual remains the strongest area at 100, compared with 99 previously. Creative held roughly steady, moving from 90 to 88. The newly tested reasoning category scored 75 and factual came in at 83, and both pull the overall average down.

Latency improved by 16 percent at the median, from 4,655 ms to 3,913 ms, which makes the model more responsive for interactive use.

With only 5 test runs in each window, these results are preliminary. The model continues to excel at multilingual tasks and remains solid on creative work, but the current test mix suggests more variability in reasoning and factual domains than previously observed.
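The headline numbers can be cross-checked with a few lines of arithmetic. The sketch assumes the overall score is an unweighted mean of the four category scores; the page does not state the aggregation method, but that assumption reproduces the reported 86.5.

```python
# Category scores for the current window, as reported in the summary.
scores = {"multilingual": 100, "creative": 88, "reasoning": 75, "factual": 83}

# Assumption: overall quality is the unweighted mean of category scores.
overall = sum(scores.values()) / len(scores)

previous_overall = 96.3
drop = round(previous_overall - overall, 1)

# Relative improvement in median latency, in percent.
previous_p50_ms, current_p50_ms = 4655, 3913
latency_gain_pct = (previous_p50_ms - current_p50_ms) / previous_p50_ms * 100

print(f"overall = {overall}, drop = {drop}, latency gain = {latency_gain_pct:.1f}%")
```

Since the mean is taken over whichever categories happen to be tested, dropping a category that scored 100 and adding one that scores 75 moves the overall figure even if the model itself has not changed.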

Quality

86.5

Latency p50

3,913 ms

Test runs

✗ Quality dropped 9.8 points

✓ Latency improved 16%

✓ Multilingual maintains perfect score

✗ No coding tests this window

Last automated test

Jul 30, 2026 · 14:04 UTC · Speed benchmark

P50 latency

92 ms

P95 latency

432 ms

Errors

0 / 6 runs

Last reviewed by Tokonomix Team · July 30, 2026