Tier A — Frontier

Runs in: France · Made in: China

Qwen3.5-397B-A17B

Tokonomix Editorial Team · Reviewed by Mes Kalkan · Published May 27, 2026 · Last reviewed July 25, 2026

Section 01

Speed analysis

Latency is measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

Chart: P50 latency (median) and P95 latency over 105 runs.

Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 100

Multilingual: 100

Creative: no score shown

Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

API rates — Qwen3.5-397B-A17B

$0.7100 per 1M input tokens

$4.25 per 1M output tokens

≈ $0.0013 per typical conversation (800 tokens)
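
The per-conversation figure can be reproduced from the two rates. The following is a minimal sketch, assuming the 800 tokens split into 600 input and 200 output tokens. The page does not state the split, so that ratio is an assumption chosen because it lands on the published estimate; the function name is illustrative only.

```python
# Rates quoted above, in USD per 1M tokens.
INPUT_RATE = 0.71
OUTPUT_RATE = 4.25


def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one conversation at the quoted per-million rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


# Assumed split of the 800-token "typical conversation" (not stated on the page).
cost = conversation_cost(input_tokens=600, output_tokens=200)
print(f"${cost:.4f}")  # $0.0013
```

Because output tokens cost roughly six times as much as input tokens, the estimate is sensitive to the split: the same 800 tokens cost more as the output share grows.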

Pricing over time

Input and output price per 1M tokens; the step line marks price changes.

Input: $0.7100 per 1M tokens (stable)

Output: $4.25 per 1M tokens (stable)

Chart dates: 2026-06-14, 2026-07-05, 2026-07-19. Synced weekly.

Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s): 858 / avg 875

Estimated as an assumed 200 output tokens divided by the P50 latency; the absolute number depends on this assumption, so the trend is what matters.
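
A minimal sketch of the estimate, assuming it is computed as the assumed 200 output tokens divided by the P50 latency in seconds. Plugging in the 233 ms P50 from the last automated speed benchmark on this page reproduces the 858 tokens/s shown; the function name is hypothetical.

```python
ASSUMED_OUTPUT_TOKENS = 200  # fixed assumption from the note above


def tokens_per_second(p50_latency_ms: float,
                      output_tokens: int = ASSUMED_OUTPUT_TOKENS) -> float:
    """Estimated throughput: assumed output tokens over P50 latency in seconds."""
    return output_tokens / (p50_latency_ms / 1000.0)


print(round(tokens_per_second(233)))  # 858
```

Doubling the assumed token count doubles the estimate, which is why the trend over time is more informative than the absolute value.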

Section 05

Capabilities

Owned by: Qwen

Section 06

Availability

How often this model answers when we call it — measured across real API requests and live tests over the last 30 days. This is separate from quality: these numbers only tell you whether the model responds, not how good the answer is.

Last 7 days

—

Last 30 days

100.0%

n=15

Median response time

1,177 ms

n=15

Based on 395 measurements over the last 30 days.

Technical details

Only live API calls and live-test requests count — internal probes and benchmark runs are excluded.

Calls with a custom API key (BYOK) are excluded: those failures are key-specific, not a sign of model downtime.

Failed calls are NOT included in quality scores — quality is measured on successful responses only. Availability and quality are independent signals.

Median response time (p50) across successful calls with a recorded duration. Outliers (very slow or very fast calls) pull the median less than the average.
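
To illustrate why the median is reported rather than the average, here is a small sketch using made-up durations. The values are hypothetical and are not taken from the measurements behind this page.

```python
from statistics import mean, median

# Hypothetical call durations in milliseconds, with one very slow outlier.
durations_ms = [1100, 1150, 1177, 1200, 9000]

print(median(durations_ms))  # 1177
print(mean(durations_ms))    # 2725.4
```

The single slow call more than doubles the average, while the median stays at the middle value.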

Section 07

Tokonomix benchmark verdicts

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5: 41/100 · 42 runs

14 correct · 1 partial · 27 wrong · 33% accuracy
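
The accuracy figure is consistent with counting only fully correct answers. A minimal sketch, assuming partial answers earn no credit; the page does not state how partials are treated.

```python
correct, partial, wrong = 14, 1, 27
runs = correct + partial + wrong  # 42, matching the run count above

# Strict accuracy: only fully correct answers count (assumption).
strict_accuracy = correct / runs
print(f"{strict_accuracy:.0%}")  # 33%

# Alternative for comparison: half credit for partial answers.
half_credit_accuracy = (correct + 0.5 * partial) / runs
print(f"{half_credit_accuracy:.0%}")  # 35%
```

Only the strict version matches the published 33%, which is why it is the reading assumed here.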

● 2026-07-19

Qwen3.5-397B-A17B jumps to 81.7/100 with creative gains, reasoning still absent

Qwen3.5-397B-A17B recovers sharply, with an overall quality score of 81.7, up 39.2 points from 42.4 in the previous window. It now scores 100 in both coding and multilingual: coding holds its earlier strength, while multilingual rises from 33. Creative tasks show the largest shift, climbing from an implied zero in the previous window to 45, though creative remains the weakest scored category. Reasoning still has no recorded score in this window, consistent with the zero from the previous period.

Median latency rose from 4725 ms to 5235 ms, a slowdown of roughly 11%. The test methodology is unchanged, with 5 runs in each window.

Users who need strong coding and multilingual support will find the model highly capable; those who need creative writing or reasoning should be aware of its limitations there. The size of the quality improvement suggests either infrastructure enhancements or model configuration changes at the OVH GRA endpoint.
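
The headline numbers in this verdict can be checked with simple arithmetic. This is a sketch, assuming the overall score is the unweighted mean of the categories that received a score (reasoning excluded); the page does not say how the overall score is aggregated.

```python
# Category scores reported for this window; reasoning has no score.
scores = {"coding": 100, "multilingual": 100, "creative": 45}

# Assumed aggregation: unweighted mean over scored categories.
overall = sum(scores.values()) / len(scores)
print(round(overall, 1))  # 81.7


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Median latency moved from 4725 ms to 5235 ms.
print(round(percent_change(4725, 5235), 1))  # 10.8
```

Under that assumption the mean matches the reported 81.7, and the latency change of about 10.8% rounds to the 11% quoted.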

Quality

81.7

Latency p50

5,235 ms

Test runs

✓ Quality jumped 39.2 points
✓ Multilingual improved to perfect 100
✓ Creative emerged at 45
✗ Latency increased 11%

Last automated test

Jul 25, 2026 · 08:03 UTC · Speed benchmark

P50 latency

233 ms

P95 latency

287 ms

Errors

0 / 6 runs

Last reviewed by Tokonomix Team · July 25, 2026