Benchmarks
Speed test
P50 is the median response time for a standard ~500-token output, measured from the EU (Amsterdam). P95 is the tail latency: 95% of requests complete within this time. Each model is called three times per test cycle; reported values are medians across cycles.
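The page does not specify its percentile method or exactly how per-cycle results are combined. The sketch below assumes the nearest-rank percentile definition and reads "medians across cycles" as: compute P50 and P95 over each cycle's three calls, then take the median of each across cycles. The function names are illustrative, not the benchmark's own code.

```python
import math
from statistics import median


def nearest_rank_percentile(samples_ms: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Nearest rank: the smallest value such that at least p% of samples are <= it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def summarize(cycles: list[list[float]]) -> tuple[float, float]:
    """Assumed aggregation: per-cycle P50/P95, then the median of each across cycles."""
    p50s = [nearest_rank_percentile(c, 50) for c in cycles]
    p95s = [nearest_rank_percentile(c, 95) for c in cycles]
    return median(p50s), median(p95s)
```

Under this reading, nearest-rank P95 over only three calls is simply the slowest call in that cycle, so the reported P95 would be the median of per-cycle maxima.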
Tier S: < 200 ms
Tier A: < 500 ms
Tier B: < 1000 ms
Tier C: > 1000 ms
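In the ranking, each tier label sits next to the P50 value, so tiers appear to be assigned from P50. A minimal sketch of that mapping, assuming strict upper bounds as written in the legend; the legend does not say where exactly 1000 ms falls, and this sketch places it in Tier C. `latency_tier` is a hypothetical name.

```python
def latency_tier(p50_ms: float) -> str:
    """Map a P50 latency in milliseconds to the legend's tier labels."""
    if p50_ms < 200:
        return "S"
    if p50_ms < 500:
        return "A"
    if p50_ms < 1000:
        return "B"
    # Legend says "> 1000 ms"; exactly 1000 ms is unspecified and assumed to be Tier C here.
    return "C"
```

Applied to the listed entries, entries 01 to 09 (P50 under 200 ms) map to Tier S and entry 10 (215 ms) maps to Tier A, matching the labels shown.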
Ranked by P50 (median), with P95 (tail) shown for each model:
01. Mistral-Small-3.2-24B-Instruct-2506, OVH AI Endpoints (GRA)
Tier S | P50: 112 ms | P95: 474 ms
02. Mistral-7B-Instruct-v0.3, OVH AI Endpoints (GRA)
Tier S | P50: 114 ms | P95: 128 ms
03. Mistral-Nemo-Instruct-2407, OVH AI Endpoints (GRA)
Tier S | P50: 115 ms | P95: 122 ms
04. Llama-3.1-8B-Instruct, OVH AI Endpoints (GRA)
Tier S | P50: 129 ms | P95: 130 ms
05. NVIDIA Nemotron Super 49B v1.5, OpenRouter
Tier S | P50: 178 ms | P95: 196 ms
06. Meta-Llama-3_3-70B-Instruct, OVH AI Endpoints (GRA)
Tier S | P50: 179 ms | P95: 338 ms
07. Llama 4 Maverick, OpenRouter
Tier S | P50: 195 ms | P95: 751 ms
08. Nous Hermes 3 70B, OpenRouter
Tier S | P50: 195 ms | P95: 250 ms
09. Llama 3.3 70B Instruct, OpenRouter
Tier S | P50: 197 ms | P95: 428 ms
10. Mistral Voxtral Small 24B, OpenRouter
Tier A | P50: 215 ms | P95: 251 ms
How we measure: Each model receives the same prompt, targeting an output of roughly 500 tokens. We make 3 sequential calls per test cycle and compute P50 and P95 across the resulting latency distribution. Test cycles run 4× per day from a single EU client location (Amsterdam), and reported times include network overhead.
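One way to take sequential, wall-clock measurements like these is sketched below. `call_model` is a hypothetical placeholder for whatever client sends the prompt to a provider (OVH AI Endpoints or OpenRouter); the page does not say which client is used, whether responses are streamed, or how the 4× daily schedule is driven, so none of that is modeled here. Timing the whole call is what makes network overhead part of the result.

```python
import time
from typing import Callable


def time_calls(call_model: Callable[[str], object], prompt: str, runs: int = 3) -> list[float]:
    """Run `runs` sequential calls with the same prompt; return each duration in ms."""
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model(prompt)  # full round trip, so network overhead is included
        durations_ms.append((time.perf_counter() - start) * 1000)
    return durations_ms


if __name__ == "__main__":
    # Hypothetical stand-in for a real API client: sleeps ~50 ms and returns a dummy reply.
    def fake_model(prompt: str) -> str:
        time.sleep(0.05)
        return "ok"

    print(time_calls(fake_model, "Describe the history of the printing press."))
```

Calls run one after another rather than concurrently, so a slow response never overlaps the next measurement.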