Tier B: Production

Runs in: France · Made in: China

Qwen3-32B


Tokonomix editorial team · Reviewed by Mes Kalkan · Published 27 May 2026 · Last checked 30 July 2026

Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

Chart: P50 latency (median) and P95 latency · 101 runs
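The page does not say which percentile method it uses. A minimal sketch, assuming the nearest-rank method and ten made-up latency samples (not Tokonomix data), shows why both numbers are reported: one slow run moves P95 but leaves P50 untouched.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in ms for ten runs; the last one is a slow outlier.
latencies_ms = [410, 430, 450, 460, 470, 480, 500, 520, 560, 900]

print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 470 900
```

Other definitions (for example linear interpolation between ranks) give slightly different values on small samples.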

Section 02

Quality scores

Evaluation results from judge-model assessments across several task categories. Scores reflect coherence, accuracy and instruction following.

Creative

Factual

Multilingual

Reasoning

Section 03

Price history

Direct provider rates per million tokens, plus an estimated cost for a typical conversation.


API rates: Qwen3-32B

$0.0800 per 1M input tokens

$0.2300 per 1M output tokens

Estimated cost per typical conversation (800 tokens): < $0.0001
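Whether an 800-token conversation lands under $0.0001 depends on how those tokens split between input and output, which the page does not state. A quick check at the published rates, with two assumed splits:

```python
# Published Qwen3-32B rates in USD per 1M tokens (from this page).
INPUT_USD_PER_M = 0.08
OUTPUT_USD_PER_M = 0.23

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one conversation at the per-1M-token rates above."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Assumed splits of an 800-token conversation (input, output).
for inp, out in [(600, 200), (400, 400)]:
    print(inp, out, f"${conversation_cost(inp, out):.6f}")
# 600 200 $0.000094
# 400 400 $0.000124
```

At these rates the estimate holds only when more than 560 of the 800 tokens are input; an even split already costs about $0.000124.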

Chart: input vs. output price per 1M tokens (input $0.0800, output $0.2300)

Pricing over time

Input and output price per 1M tokens, 2026-06-14 to 2026-07-26 (price changes would appear as steps in the line).

Input: $0.0800 / 1M, stable

Output: $0.2300 / 1M, stable

Synced weekly.

Section 04

Tokens per second

Throughput in tokens per second, derived from the measured P50 latency. Higher values are better; fluctuations reflect server load at the provider.

Throughput (tokens/s): 421 (average 420)

Estimated as 200 output tokens divided by the P50 latency in seconds. The absolute number depends on this assumption; the trend is what matters.
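The estimate divides a fixed 200 output tokens by the P50 latency in seconds. Feeding in the 475 ms P50 from the latest speed test reproduces the 421 tokens/s shown above; that the dashboard uses exactly that run is an assumption.

```python
ASSUMED_OUTPUT_TOKENS = 200  # fixed assumption stated on this page

def tokens_per_second(p50_latency_ms: float, output_tokens: int = ASSUMED_OUTPUT_TOKENS) -> float:
    """Throughput estimate: assumed output tokens divided by P50 latency in seconds."""
    return output_tokens / (p50_latency_ms / 1000)

print(round(tokens_per_second(475)))  # 421
```

If real responses are shorter than 200 tokens, this formula overstates throughput, which is why only the trend is meaningful.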

Section 05

Capabilities

ownedBy: Qwen

Section 06

Availability

How often this model responds when we call it, measured over real API requests and live tests in the past 30 days. This is separate from quality: these figures show only whether the model responds, not how good the answer is.

Past 7 days: no data

Past 30 days: 100.0% (n=33)

Median response time: 145,961 ms (n=33)

Based on 413 measurements in the past 30 days.

Technical details

Only real API calls and live test requests count; internal probes and benchmark runs are excluded.

Calls made with the caller's own API key (BYOK) are excluded: those failures are key-specific and not a sign of model degradation.

Failed calls are NOT counted in quality scores; quality is measured on successful responses only. Availability and quality are independent signals.
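The counting rules above can be sketched as a filter followed by a ratio. The record fields and source labels here are hypothetical, and the sketch assumes availability is simply successful calls divided by counted calls.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Call:
    # Hypothetical record shape; field names are illustrative, not Tokonomix's schema.
    source: str  # "api", "live_test", "probe" or "benchmark"
    byok: bool   # made with the caller's own API key
    ok: bool     # the model returned a response

def availability(calls: list[Call]) -> float | None:
    """Share of counted calls that succeeded, or None if no call qualifies."""
    counted = [c for c in calls
               if c.source in ("api", "live_test") and not c.byok]
    if not counted:
        return None  # displayed as "no data" rather than 0%
    return sum(c.ok for c in counted) / len(counted)

calls = [
    Call("api", byok=False, ok=True),
    Call("live_test", byok=False, ok=True),
    Call("api", byok=True, ok=False),     # BYOK failure: excluded
    Call("probe", byok=False, ok=False),  # internal probe: excluded
]
print(availability(calls))  # 1.0
```

Returning None for an empty window keeps "no calls" distinct from "all calls failed".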

Median response time (p50) over successful calls with a recorded duration. Outliers pull the median less than they pull the mean.
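To see why the median is the safer summary, compare both statistics on a handful of made-up durations where one call stalls:

```python
import statistics

# Hypothetical durations of successful calls in ms; one call stalled for 30 s.
durations_ms = [420, 450, 470, 480, 510, 30_000]

print(statistics.median(durations_ms))  # 475.0
print(statistics.mean(durations_ms))    # about 5388.3
```

The single stalled call multiplies the mean by more than ten, while the median stays at the midpoint of the typical calls.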

Total calls (30d)

OK responses (30d)

Total calls (7d)

OK responses (7d)

Section 07

Tokonomix benchmark verdicts


Endorsed by 2 judges

Independent LLM judges evaluated this model on our weekly intelligence tests

cohere/command-a: 95/100 · 1 run

1 correct · 0 partial · 0 wrong · 100% accuracy

claude-sonnet-4-5: 84/100 · 47 runs

34 correct · 9 partial · 4 wrong · 72% accuracy
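The accuracy percentages appear to be correct answers divided by total runs, with partial answers not counted as correct (34 of 47 gives 72%; counting partials as half would give about 82%). How the /100 scores weight partial answers is not stated, so this sketch covers only the accuracy figure.

```python
def accuracy(correct: int, partial: int, wrong: int) -> float:
    """Share of runs judged fully correct; partial answers count as not correct."""
    total = correct + partial + wrong
    return correct / total

print(f"{accuracy(34, 9, 4):.0%}")  # 72%
print(f"{accuracy(1, 0, 0):.0%}")   # 100%
```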

2026-07-26

Qwen3-32B cuts latency by 34%, but its factual score is only 35

The current benchmark window shows a mixed picture for Qwen3-32B deployed on OVH AI Endpoints. Latency improved substantially: p50 fell from 24,595 ms to 16,206 ms, a 34% reduction. The overall quality score, however, slipped slightly from 73.4 to 72.3. The most concerning result is a factual score of just 35. The previous window did not measure factual performance (coding scored 94 there), so this is not a like-for-like drop, but it does point to weak knowledge accuracy and reliability.

On the positive side, multilingual capability strengthened from 86 to 95, and reasoning holds steady at a solid 83. Creative writing rebounded from 40 to 76, reversing the sharp decline noted in the previous period. The model's strengths appear to have shifted toward multilingual tasks and creative generation, while factual accuracy lags. Users who need precise factual answers should exercise caution; those focused on creative or multilingual applications may find the current configuration more suitable. The latency gains make the service more responsive overall, but the factual gap is a critical weakness for general-purpose deployments.
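The p50 figures in the text can be checked directly: the 34% is the relative drop in latency, and the corresponding speed-up factor is the ratio of the two values.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative means a decrease)."""
    return (new - old) / old * 100

p50_before_ms, p50_after_ms = 24_595, 16_206
print(f"latency change: {pct_change(p50_before_ms, p50_after_ms):.0f}%")  # latency change: -34%
print(f"speed-up: {p50_before_ms / p50_after_ms:.2f}x")                   # speed-up: 1.52x
```

Assuming completed requests per unit time scale inversely with latency, the same drop corresponds to roughly 52% more requests served in a given time.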

Quality: 72.3 · Latency p50: 16,206 ms

✓ Latency improved 34%

✗ Factual score at 35

✓ Multilingual performance up to 95

✓ Creative rebounds from 40 to 76

Latest automated test

30 Jul 2026 · 08:04 UTC · Speed test

P50 latency: 475 ms

P95 latency: 620 ms

Errors: 0 of 6 runs

Last reviewed by the Tokonomix team · 30 July 2026