Runs in: France · Made in: China
OVH AI Endpoints (GRA)

Qwen2.5-VL-72B-Instruct

Tokonomix Editorial Team · Reviewed by Mes Kalkan
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

[Chart: P50 latency (median) and P95 latency, in ms, over 73 runs]
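
To make the two metrics concrete, here is a minimal Python sketch of how P50 and P95 can be read off a set of latency samples. The sample values and the helper name `percentile` are hypothetical, and the nearest-rank method is an assumption; the page does not say which percentile method the benchmark uses.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small pct values stay in range.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds (not taken from the page).
latencies_ms = [110, 118, 121, 125, 125, 130, 142, 160, 310, 541]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # slow tail
print(f"P50 = {p50} ms, P95 = {p95} ms")
```

A large gap between the two values, as in this made-up sample, means most requests are fast but a small share are much slower.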
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 100
Multilingual: 98
Reasoning: 100
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Qwen2.5-VL-72B-Instruct
$0.9100 per 1M input tokens
$0.9100 per 1M output tokens
≈ $0.0007 per typical conversation (800 tokens)
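
The per-conversation estimate follows directly from the listed rates. A minimal sketch of the arithmetic in Python is below; the function name and the 400/400 split between input and output tokens are hypothetical (the page only gives the 800-token total), although the split does not affect the result here because both rates are equal.

```python
# Rates as listed on this page, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 0.91
OUTPUT_USD_PER_MTOK = 0.91

def conversation_cost_usd(input_tokens, output_tokens):
    """Cost of one conversation at the listed per-million-token rates."""
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000

# Hypothetical split of the 800-token "typical conversation".
cost = conversation_cost_usd(400, 400)
print(f"${cost:.4f}")
```

Unrounded, 800 tokens at $0.91 per 1M comes to $0.000728, which the page rounds to $0.0007.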

Pricing over time

Input & output per 1M tokens · step-line = price changes

Input: $0.9100 per 1M tokens (stable)
Output: $0.9100 per 1M tokens (stable)
Date shown: 2026-06-14
Synced weekly
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s): 1600 / avg 1451

Estimated as 200 assumed output tokens ÷ P50 latency. The absolute number depends on this assumption; the trend is what matters.
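
The estimate can be sketched in a few lines of Python. This reads the estimate as a fixed 200 output tokens divided by the P50 latency, which is an assumption inferred from the figures on this page: a P50 of 125 ms then gives 1600 tokens/s, matching the throughput figure above. The function name is hypothetical.

```python
ASSUMED_OUTPUT_TOKENS = 200  # fixed response length assumed by the estimate

def estimated_tokens_per_second(p50_latency_ms,
                                output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Throughput estimate: assumed output tokens divided by P50 latency."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000)

tps = estimated_tokens_per_second(125)  # P50 from the last automated test
print(f"{tps:.0f} tokens/s")
```

Because the token count is a constant, the estimate is inversely proportional to P50 latency: halving the latency doubles the reported throughput, whatever the real response lengths were.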

Section 05

Capabilities

Owned by: Qwen
Section 06

Availability

No measurements yet

We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.

Section 07

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests
claude-sonnet-4-5 · 93/100 · 7 runs
6 correct · 1 partial · 0 wrong · 86% accuracy
2026-06-14
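
The accuracy figure can be reproduced from the run counts. The Python sketch below computes strict accuracy (fully correct runs only) and, as a purely hypothetical variant, a partial-credit score in which a partial answer counts as half. The page does not state how the judge's 93/100 score is derived, so the `partial_weight` parameter is an assumption for illustration only.

```python
def judge_summary(correct, partial, wrong, partial_weight=0.5):
    """Return (strict accuracy %, partial-credit score %) as rounded ints.

    partial_weight is a hypothetical weighting, not documented on the page.
    """
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no runs recorded")
    strict = correct / total
    weighted = (correct + partial_weight * partial) / total
    return round(strict * 100), round(weighted * 100)

strict_pct, weighted_pct = judge_summary(correct=6, partial=1, wrong=0)
print(strict_pct, weighted_pct)
```

With 6 correct runs out of 7, strict accuracy is 6/7, or about 86%, as reported. A half-credit weighting rounds to 93, which may or may not be how the published score was calculated.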

Consistent performance maintained across all vision-language benchmarks

Qwen2.5-VL-72B-Instruct shows no measurable change in capability metrics between the two benchmark windows: every core benchmark score is unchanged, with neither degradation nor improvement observed. This consistency suggests that the service is running stable model weights and inference configurations. Teams deploying the 72-billion-parameter model for visual question answering, image understanding, and multimodal reasoning can therefore expect repeatable, predictable results.

[Trend panels: Quality, Latency p50, Test runs]

Performance remains stable. No capability degradation observed.
Last automated test: Jun 15, 2026 · 08:00 UTC · Speed benchmark
P50 latency: 125 ms
P95 latency: 541 ms
Errors: 0 / 6 runs
Last reviewed by Tokonomix Team · June 15, 2026