Runs in: France · Made in: China
OVH AI Endpoints (GRA)

Qwen3.5-397B-A17B

Tokonomix Editorial Team · Reviewed by Mes Kalkan
Section 01

Speed analysis

Latency is measured across all benchmark runs. P50 (median) shows typical response time, and P95 (95th percentile) shows the slow tail, giving a realistic picture of response speed under both normal and peak load.

[Chart: P50 latency (median) and P95 latency in ms, 53 runs]
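A minimal sketch of how P50 and P95 could be derived from a list of per-run latencies. The page does not say which percentile method Tokonomix uses; this assumes linear interpolation via Python's `statistics.quantiles` with `method="inclusive"`. The function name and the sample latencies are hypothetical.

```python
import statistics


def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Return (P50, P95) for per-run latencies in milliseconds.

    Assumes linear interpolation between sorted samples (the "inclusive"
    method); other percentile definitions give slightly different values.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # 99 cut points; cut point k (1-based) is the k-th percentile,
    # so index 49 is P50 and index 94 is P95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


if __name__ == "__main__":
    runs_ms = [250, 263, 270, 279, 300]  # hypothetical sample runs
    p50, p95 = latency_percentiles(runs_ms)
    print(f"P50 = {p50:.1f} ms, P95 = {p95:.1f} ms")
```

With only a handful of runs, P95 is interpolated between the two slowest samples, so it is sensitive to a single outlier; the 53-run window on this page makes it more stable.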
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 100
Creative: 45
Factual: 1
Multilingual: 30
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Qwen3.5-397B-A17B
$0.7100 per 1M input tokens
$4.25 per 1M output tokens
≈ $0.0013 per typical conversation (800 tokens)
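The typical-conversation estimate can be reproduced from the two rates. The page does not state how the 800 tokens are split between input and output; assuming 600 input and 200 output tokens (200 output tokens is also the assumption used for the throughput estimate) gives $0.001276, which rounds to the quoted $0.0013. The split and the function name are assumptions for illustration.

```python
INPUT_USD_PER_M = 0.71   # $ per 1M input tokens (rate card above)
OUTPUT_USD_PER_M = 4.25  # $ per 1M output tokens (rate card above)


def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request billed at per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


if __name__ == "__main__":
    # Assumed split of the 800-token "typical conversation": 600 in, 200 out.
    cost = conversation_cost(600, 200)
    print(f"${cost:.4f}")
```

Because output tokens cost about six times as much as input tokens here, output-heavy workloads move the per-conversation cost far more than longer prompts do.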
No pricing history yet — will populate after the first metadata sync detects a price change.
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens/s): 760 latest · 1195 average

Estimated as 200 output tokens divided by P50 latency (in seconds). The absolute number depends on this assumption; the trend is what matters.
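Under the stated 200-output-token assumption, the conversion is a single division. Plugging in the 263 ms P50 from the last automated test gives roughly 760 tokens/s. The function name is hypothetical; this is a sketch of the stated estimate, not Tokonomix's code.

```python
def estimated_tokens_per_second(p50_latency_ms: float,
                                output_tokens: int = 200) -> float:
    """Assumed output tokens divided by P50 latency converted to seconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000)


if __name__ == "__main__":
    # 263 ms is the P50 from the most recent automated test on this page.
    print(round(estimated_tokens_per_second(263)))
```

Since the token count is fixed, halving latency doubles the estimate; the number tracks latency, not measured generation speed.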

Section 05

Capabilities

Owned by: Qwen
Section 06

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests.
claude-sonnet-4-5 · 35/100 · 7 runs
2 correct · 0 partial · 5 wrong · 29% accuracy
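The accuracy figure is correct answers over graded runs: 2 / 7 ≈ 29%. The page does not say how partial answers are weighted; the half-credit weight below is a hypothetical choice and does not affect this result, since there are no partial answers.

```python
def accuracy_percent(correct: int, partial: int, wrong: int,
                     partial_credit: float = 0.5) -> int:
    """Share of graded runs answered correctly, as a rounded percentage.

    partial_credit is an assumed weight for partially correct answers.
    """
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no graded runs")
    score = (correct + partial_credit * partial) / total
    return round(100 * score)


if __name__ == "__main__":
    print(accuracy_percent(2, 0, 5))
```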
2026-05-31

Qwen3.5-397B-A17B establishes baseline with strong creative performance

This first benchmark window establishes baseline performance for Qwen3.5-397B-A17B served through OVH AI Endpoints in the GRA region. The model is strongest at creative writing, scoring 9.0 out of 10 on creative tasks, which points to robust narrative generation and imaginative content. Coding is solid at 7.5, showing competence on programming tasks with room for improvement. Mathematical reasoning scores 7.0, adequate for standard computational problems. Instruction following (7.0) is reliable, and response coherence (7.0) keeps outputs logical and well structured. The five category scores average 7.5 out of 10. Expect the best results on creative content generation, storytelling, and narrative tasks; for production code generation and complex mathematical proofs, validate outputs before use. This baseline is a reference point for tracking future performance trends and model updates.
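The five category scores quoted in the verdict can be combined into a single figure. The page does not say whether its overall view weights categories, so this sketch assumes an unweighted mean; the dictionary keys are illustrative labels.

```python
import statistics

# Category scores (out of 10) quoted in the verdict; keys are illustrative.
VERDICT_SCORES = {
    "creative": 9.0,
    "coding": 7.5,
    "math": 7.0,
    "instruction_following": 7.0,
    "coherence": 7.0,
}


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean of category scores (equal weights are an assumption)."""
    if not scores:
        raise ValueError("no category scores")
    return statistics.fmean(scores.values())


if __name__ == "__main__":
    print(overall_score(VERDICT_SCORES))
```

A weighted mean would be a small change (multiply each score by its weight and divide by the weight sum) if some categories matter more for a given workload.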

Test runs: 0

Strong creative writing at 9.0
Solid coding performance at 7.5
Math reasoning needs improvement
Baseline established across all metrics
Last automated test: Jun 10, 2026 · 02:00 UTC · Speed benchmark
P50 latency: 263 ms
P95 latency: 279 ms
Errors: 0 / 6 runs
Last reviewed by Tokonomix Team · June 10, 2026