Runs in: France · Made in: China
Tokonomix Editorial Team · Reviewed by Mes Kalkan
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) shows the typical response time; P95 (95th percentile) shows the slow tail that only 1 in 20 requests exceeds. Together they give a realistic picture of response speed under normal and peak load.

Chart: P50 latency (median) and P95 latency, in ms · 73 runs
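
As a rough illustration of how P50 and P95 figures like these can be derived from raw timings, here is a minimal sketch using the nearest-rank method. The page does not say which percentile method Tokonomix uses, so the method, the `percentile` helper, and the sample values are all assumptions for illustration.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds (not measurements from this page).
latencies_ms = [410, 425, 418, 447, 431, 422, 439, 415, 428, 980]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # slow tail
print(f"P50 = {p50} ms, P95 = {p95} ms")
```

With this sample, a single slow request (980 ms) leaves the median untouched but dominates the P95, which is why the two numbers are reported together.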
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 95
Multilingual: 73
Reasoning: 88
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Qwen3-32B
$0.0800 per 1M input tokens
$0.2300 per 1M output tokens
Under $0.0001 per typical conversation (800 tokens)
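
The per-conversation estimate can be checked with a short calculation. This is a sketch under stated assumptions: the page gives only the 800-token total, so the 600 input / 200 output split and the `conversation_cost` helper below are hypothetical.

```python
# Rates listed above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.08
OUTPUT_PRICE_PER_M = 0.23


def conversation_cost(input_tokens, output_tokens):
    """Cost in USD of one conversation at the listed per-million rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Assumed split of the 800-token "typical conversation" (not stated on the page).
cost = conversation_cost(input_tokens=600, output_tokens=200)
print(f"${cost:.6f}")
```

The result depends on the split: 800 input-only tokens would cost $0.000064, while 800 output-only tokens would cost $0.000184, so the "under $0.0001" figure holds only when most of the tokens are input.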

Pricing over time

Input and output price per 1M tokens; the step line marks price changes.

Input: $0.0800 / 1M (stable)
Output: $0.2300 / 1M (stable)
Date range shown: 2026-06-14
⟳ synced weekly
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s): 471 / avg 452

Estimated as 200 output tokens ÷ P50 latency. The absolute number depends on the 200-token assumption; the trend is what matters.
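
A minimal sketch of this estimate, reading it as the assumed 200 output tokens divided by the P50 latency in seconds. That reading is an assumption, though it reproduces the page's own figures: a 425 ms P50 gives about 471 tokens/s. The function name is hypothetical.

```python
ASSUMED_OUTPUT_TOKENS = 200  # the page's stated assumption


def estimated_throughput(p50_latency_ms, output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimated tokens per second for a response of `output_tokens`
    that completes in `p50_latency_ms` milliseconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000)


tps = estimated_throughput(425)  # P50 from the last automated test
print(f"{tps:.0f} tokens/s")
```

Because the token count is a fixed constant, the estimate is inversely proportional to latency: any change in the chart reflects a change in P50, not a measured change in generation speed.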

Section 05

Capabilities

Owned by: Qwen
Section 06

Availability

No measurements yet

We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.

Section 07

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests
claude-sonnet-4-5: 87/100 · 7 runs
5 correct · 2 partial · 0 wrong · 71% accuracy
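
The accuracy figure appears to count only fully correct runs. The sketch below assumes that partial answers do not count toward accuracy, which reproduces the 71% shown. How the separate 87/100 score is derived is not stated, so it is not modeled here, and `accuracy_pct` is a hypothetical helper.

```python
def accuracy_pct(correct, partial, wrong):
    """Share of runs judged fully correct, as a percentage.
    Partial answers are treated as not correct (an assumption)."""
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no runs recorded")
    return 100 * correct / total


acc = accuracy_pct(correct=5, partial=2, wrong=0)
print(f"{acc:.0f}% accuracy")
```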
2026-06-14

Qwen3-32B maintains consistent performance with configuration update

Qwen3-32B on OVH AI Endpoints continues to perform stably following a configuration update. The model holds its established baseline across core capabilities, with no significant fluctuations in output quality or response patterns. It handles instruction-following tasks, reasoning challenges, and multi-turn conversations at its expected level, and the GRA endpoint infrastructure continues to deliver reliable service with unchanged latency profiles.

Users can expect the same level of capability established in the initial benchmark window, with no degradation in core functionality. The model's strengths across diverse query types remain intact, as do its previously noted limitations. This stability matters most for production deployments where predictable behavior is essential, and organizations already integrating Qwen3-32B into their workflows should see seamless continuity. The configuration changes appear to be infrastructure-level adjustments that have not measurably affected model behavior or output characteristics.

Quality

Latency p50

Test runs

0

Performance stability maintained · Configuration updated successfully
Last automated test: Jun 15, 2026 · 08:00 UTC · Speed benchmark
P50 latency: 425 ms
P95 latency: 447 ms
Errors: 0 / 6 runs
Last reviewed by Tokonomix Team · June 15, 2026