Tier A — Frontier

Runs in: France · Made in: China

Qwen3.5-397B-A17B

Tokonomix editorial team · Reviewed by Mes Kalkan · Published 27 May 2026 · Last reviewed 25 July 2026

Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (the median) shows the typical response time; P95 (the 95th percentile) shows the slow tail that users hit under peak load.

P50 latency (median) · P95 latency · 105 runs
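As a rough illustration of how P50 and P95 can be read off a set of latency samples, here is a minimal sketch using the nearest-rank method. The page does not say which percentile method Tokonomix uses, so the method, the `percentile` helper, and the sample values are all assumptions.

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds, not Tokonomix measurements.
latencies_ms = [150, 120, 400, 130, 160, 140]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # slow tail
print(f"P50 = {p50} ms, P95 = {p95} ms")
```

With only a handful of samples the P95 is simply the slowest run, which is why the run count next to the chart matters when reading the tail figure.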

Section 02

Quality scores

Evaluation results from judge-model assessments across several task categories. Scores reflect coherence, accuracy, and instruction following.

Code generation: 100

Multilingual: 100

Creative

Section 03

Price history

Direct provider rates per million tokens, plus a cost estimate for a typical conversation.

API rates: Qwen3.5-397B-A17B

$0.7100 per 1M input tokens

$4.25 per 1M output tokens

≈ $0.0013 per typical conversation (800 tokens)
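The per-conversation figure can be checked with a short calculation. The page does not state how the 800 tokens split between input and output; the sketch below assumes 600 input and 200 output tokens, a split that happens to reproduce the listed estimate. The function name is hypothetical.

```python
# Listed provider rates in USD per one million tokens.
INPUT_USD_PER_M = 0.71
OUTPUT_USD_PER_M = 4.25

def conversation_cost_usd(input_tokens, output_tokens):
    """Cost of one conversation at the listed per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Assumed split of the 800-token conversation (not stated on the page).
cost = conversation_cost_usd(input_tokens=600, output_tokens=200)
print(f"${cost:.4f} per conversation")
```

Because output tokens cost about six times as much as input tokens here, the estimate is sensitive to the split: the same 800 tokens divided evenly would cost closer to $0.0020.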

Pricing over time (step-line chart of input and output price per 1M tokens; steps mark price changes)

Input: $0.7100 per 1M tokens, stable

Output: $4.25 per 1M tokens, stable

Chart dates: 2026-06-14, 2026-07-05, 2026-07-19

Synced weekly

Section 04

Tokens per second

Throughput in tokens per second, derived from the measured P50 latency. Higher is better; fluctuations reflect server load at the provider.

Throughput (tokens/s): 1124 / avg 877

Estimated as 200 output tokens divided by the P50 latency. The absolute number depends on that 200-token assumption; the trend is what matters.
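Read as 200 output tokens divided by the P50 latency in seconds, the estimate can be sketched as below. Using the 178 ms P50 from the latest automatic speed test on this page reproduces the 1124 tokens/s figure; the function name is hypothetical and the 200-token response length is the page's own assumption.

```python
ASSUMED_OUTPUT_TOKENS = 200  # fixed response length assumed by the page

def tokens_per_second(p50_latency_ms, output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimated throughput: assumed output tokens over P50 latency in seconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000)

latest = tokens_per_second(178)  # P50 of the latest speed test, in ms
print(f"{latest:.0f} tokens/s")
```

Since the token count is a constant, the throughput curve is just the inverse of the P50 latency curve, which is why only the trend carries information.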

Section 05

Capabilities

ownedBy: Qwen

Section 06

Availability

How often this model responds when we call it, measured over real API requests and live tests in the last 30 days. This is separate from quality: these figures show only whether the model responds, not how good the answer is.

Last 7 days: no data

Last 30 days: 100.0% (n=15)

Median response time: 1,177 ms (n=15)

Based on 395 measurements in the last 30 days.

Technical details

Only real API calls and live-test requests count; internal probes and benchmark runs are excluded.

Calls made with a user's own API key (BYOK) are excluded: those errors are key-specific and do not indicate degradation of the model.

Failed calls are NOT counted in quality scores; quality is measured on successful responses. Availability and quality are independent signals.

Median response time (p50) is computed over successful calls with a recorded duration. Outliers shift the median less than they shift the mean.
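The filtering rules above can be sketched as follows. The `Call` record, its field names, and the source labels are hypothetical; the page does not describe Tokonomix's actual data model, so this only illustrates the stated rules.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Call:
    source: str                   # hypothetical labels: "api", "live_test", "probe", "benchmark"
    byok: bool                    # True if made with the user's own API key
    ok: bool                      # True if the model returned a response
    duration_ms: Optional[float]  # None when no duration was recorded

COUNTED_SOURCES = {"api", "live_test"}  # probes and benchmark runs are excluded

def availability_summary(calls):
    """Return (availability in percent, median response time in ms)."""
    counted = [c for c in calls if c.source in COUNTED_SOURCES and not c.byok]
    if not counted:
        return None, None
    ok_calls = [c for c in counted if c.ok]
    availability_pct = 100 * len(ok_calls) / len(counted)
    # The median uses only successful calls that have a recorded duration.
    durations = [c.duration_ms for c in ok_calls if c.duration_ms is not None]
    return availability_pct, (median(durations) if durations else None)

# Hypothetical sample data.
calls = [
    Call("api", False, True, 1100.0),
    Call("api", False, True, 1300.0),
    Call("live_test", False, False, None),  # failed call: lowers availability
    Call("live_test", False, True, None),   # succeeded, but no duration recorded
    Call("probe", False, True, 90.0),       # internal probe: excluded
    Call("api", True, False, None),         # BYOK: excluded
]
availability_pct, p50_ms = availability_summary(calls)
print(availability_pct, p50_ms)
```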

Total calls (30d) · OK responses (30d) · Total calls (7d) · OK responses (7d)

Section 07

Tokonomix benchmark verdicts

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests.

claude-sonnet-4-5 · 41/100 · 42 runs

14 correct · 1 partial · 27 wrong · 33% accuracy
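The accuracy figure is consistent with counting only fully correct answers over all runs, as the short check below shows. That reading is an assumption; the page does not say how partial answers are weighted or how the separate 41/100 score is derived, so the sketch does not attempt to reproduce it.

```python
# Judge tallies from the line above.
correct, partial, wrong = 14, 1, 27

runs = correct + partial + wrong
# Assumption: accuracy counts only fully correct answers.
accuracy_pct = round(100 * correct / runs)
print(f"{runs} runs, {accuracy_pct}% accuracy")
```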

● 2026-07-19

Qwen3.5-397B-A17B jumps to 81.7/100 with creative gains, reasoning still absent

Qwen3.5-397B-A17B recovers sharply in this window: overall quality is 81.7, up 39.2 points from 42.4 in the previous window. Coding and multilingual both score a perfect 100: coding performance remains strong, and multilingual improves from 33. Creative tasks show the largest shift, climbing from an implied zero in the previous window to 45, though creative remains the weakest scored category. Reasoning is still absent: no score was recorded in this window, consistent with the zero in the previous one.

Median latency rose from 4725 ms to 5235 ms, a slowdown of about 11%. The test methodology is unchanged, with 5 runs in each window.

Users who need strong coding and multilingual support will find this model highly capable; those who need creative writing or reasoning should be aware of its limitations in those areas. The size of the quality improvement suggests either infrastructure enhancements or model configuration changes at the OVH GRA endpoint.
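Two of the figures in this summary can be checked with simple arithmetic. The latency change follows directly from the two medians. The second check is only an observation: an unweighted mean of the three scored categories happens to match the 81.7 overall score, but the page does not document how the overall score is computed, so treat that formula as an assumption.

```python
# Median latency in the previous and current window, in milliseconds.
prev_p50_ms, curr_p50_ms = 4725, 5235
slowdown_pct = 100 * (curr_p50_ms - prev_p50_ms) / prev_p50_ms

# Category scores reported for the current window (reasoning has no score).
category_scores = {"coding": 100, "multilingual": 100, "creative": 45}
# Assumption: overall quality is the unweighted mean of scored categories.
mean_scored = sum(category_scores.values()) / len(category_scores)

print(f"slowdown: {slowdown_pct:.1f}%, mean of scored categories: {mean_scored:.1f}")
```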

Quality: 81.7

Latency p50: 5,235 ms

Test runs

✓ Quality jumped 39.2 points

✓ Multilingual improved to perfect 100

✓ Creative emerged at 45

✗ Latency increased 11%

Latest automatic test

25 Jul 2026 · 02:01 UTC · Speed test

P50 latency: 178 ms

P95 latency: 236 ms

Errors: 0 / 6 runs

Last reviewed by the Tokonomix team · 25 July 2026