Skip to content
Runs in:FranceMade in:France
OVH AI Endpoints (GRA)

Mistral-Small-3.2-24B-Instruct-2506

Tokonomix Editorial Team·Reviewed by Mes Kalkan··
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

P50 latency (median)P95 latency73 runs
642561505775541005005-2806-15ms
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

100
Coding
99
Multilingual
100
Reasoning
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Mistral-Small-3.2-24B-Instruct-2506
$0.0900 per 1M input tokens
$0.2800 per 1M output tokens
≈ $0.0001 per typical conversation (800 tokens)
Input vs output price (per 1M tokens)
per 1M input tokens$0.0900
per 1M output tokens$0.2800

Pricing over time

Input & output per 1M tokens · step-line = price changes

$0.0900

input / 1M

— stable

$0.2800

output / 1M

— stable

2026-06-142026-06-142026-06-14
Input
Output
Price change
⟳ synced weekly
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s)1667 / avg 1721
3056461

Estimated from P50 latency × 200 output tokens — the absolute number depends on this assumption; the trend is what matters.

Section 05

Capabilities

ownedBy: mistralai
Section 06

Availability

Availability

How often this model answers when we call it — measured across real API requests and live tests over the last 30 days. This is separate from quality: these numbers only tell you whether the model responds, not how good the answer is.

Last 7 days

100.0%

n=8

Last 30 days

100.0%

n=8

Median response time

6,342ms

n=8

Based on 76 measurements over the last 30 days.

Technical details

Only live API calls and live-test requests count — internal probes and benchmark runs are excluded.

Calls with a custom API key (BYOK) are excluded: those failures are key-specific, not a sign of model downtime.

Failed calls are NOT included in quality scores — quality is measured on successful responses only. Availability and quality are independent signals.

Median response time (p50) across successful calls with a recorded duration. Outliers (very slow or very fast calls) pull the median less than the average.

Total calls (30d)

8

OK responses (30d)

8

Total calls (7d)

8

OK responses (7d)

8

Image quality control pilot (2026-06-10)

Recall

9.4%

n=300

False-alarm rate

12.1%

n=300

Section 07

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests
claude-sonnet-4-588/100 · 8 runs
6 correct2 partial0 wrong75% accuracy
2026-06-14

Stable performance maintained with expanded category testing

Mistral-Small-3.2-24B-Instruct-2506 continues to demonstrate exceptional performance in this benchmark window, maintaining its perfect quality score of 100.0 across expanded testing. The model now shows consistently high performance across multiple categories including coding, creative writing, instruction following, and multilingual tasks, all scoring at the maximum level. This represents a broader evaluation than the previous window which focused solely on multilingual capabilities. Latency characteristics show notable improvement, with the median response time dropping from 5689ms to 926ms, representing an approximately 84% reduction in typical response times. The 95th percentile latency of 1180ms indicates consistent performance with minimal variation. The model demonstrates particularly strong results in mathematical reasoning and structured data handling, areas that were not evaluated in the baseline window. With 20 test runs completed in this window compared to the single baseline run, the results provide substantially more statistical confidence. Users can expect reliable performance across diverse workloads, from technical programming tasks to creative content generation, with significantly faster response times than initially observed.

Quality

Latency p50

Test runs

0

84% latency reduction achieved Expanded category coverage maintained Perfect scores across all categories 20x more test runs completed
Last automated test
Jun 15, 2026 · 08:00 UTC · Speed benchmark
P50 latency
120 ms
P95 latency
158 ms
Errors
0 / 6 runs
Last reviewed by Tokonomix Team·June 15, 2026