Benchmarks
Speed test
P50 is the median response time for a standard 500-token output, measured from the EU (Amsterdam). P95 is tail latency: 95% of requests complete within this time. Each test cycle runs three calls per model; reported values are medians across cycles.
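As a quick illustration of these two statistics, here is a minimal Python sketch; the sample latencies are made up for the example, not taken from the table below.

```python
import statistics

# Illustrative latency samples in milliseconds (not real measurements)
samples_ms = [77, 81, 84, 88, 95, 102, 110, 122, 130, 145]

p50 = statistics.median(samples_ms)               # P50: half of requests are at least this fast
p95 = statistics.quantiles(samples_ms, n=20)[18]  # P95: 95% of requests finish within this time

print(f"P50 = {p50:.0f} ms, P95 = {p95:.0f} ms")
```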
Tier thresholds:

- Tier S: < 200 ms
- Tier A: < 500 ms
- Tier B: < 1000 ms
- Tier C: > 1000 ms
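The tiers in the table below track the P50 column (gpt-oss-120b, for example, keeps Tier A on a 324 ms P50 despite a 2211 ms P95). A minimal sketch of that mapping, using the thresholds above:

```python
def tier(p50_ms: float) -> str:
    """Map a P50 latency to its speed tier, per the thresholds above."""
    if p50_ms < 200:
        return "S"
    if p50_ms < 500:
        return "A"
    if p50_ms < 1000:
        return "B"
    return "C"

assert tier(77) == "S" and tier(324) == "A"
```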
| #  | Model                               | Provider               | Tier | P50 (median) | P95 (tail) |
|----|-------------------------------------|------------------------|------|-------------:|-----------:|
| 01 | Mistral-Small-3.2-24B-Instruct-2506 | OVH AI Endpoints (GRA) | S    | 77 ms        | 122 ms     |
| 02 | Llama-3.1-8B-Instruct               | OVH AI Endpoints (GRA) | S    | 85 ms        | 95 ms      |
| 03 | Qwen3-Coder-30B-A3B-Instruct        | OVH AI Endpoints (GRA) | S    | 96 ms        | 163 ms     |
| 04 | Qwen2.5-VL-72B-Instruct             | OVH AI Endpoints (GRA) | S    | 111 ms       | 119 ms     |
| 05 | Mistral-Nemo-Instruct-2407          | OVH AI Endpoints (GRA) | S    | 111 ms       | 151 ms     |
| 06 | Mistral-7B-Instruct-v0.3            | OVH AI Endpoints (GRA) | S    | 113 ms       | 149 ms     |
| 07 | Meta-Llama-3_3-70B-Instruct         | OVH AI Endpoints (GRA) | S    | 114 ms       | 124 ms     |
| 08 | gpt-oss-20b                         | OVH AI Endpoints (GRA) | A    | 216 ms       | 294 ms     |
| 09 | gpt-oss-120b                        | OVH AI Endpoints (GRA) | A    | 324 ms       | 2211 ms    |
| 10 | gpt-5.4-mini                        | OpenAI                 | A    | 361 ms       | 561 ms     |
How we measure: Each model receives an identical prompt targeting a ~500-token output. We run 3 sequential calls per test cycle and compute P50/P95 across the pooled distribution of calls. Tests run 4× per day from a single EU endpoint; network overhead is included.
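The harness itself is not published, so the sketch below is only a reconstruction of the stated procedure under assumptions: the endpoint URL is a placeholder, `requests` is an assumed HTTP client, and the model name is just an example.

```python
import statistics
import time

import requests  # assumed HTTP client; the actual harness is not published

# Placeholder endpoint and prompt; both are assumptions, not the real setup.
ENDPOINT = "https://example.invalid/v1/chat/completions"
PROMPT = "..."  # identical prompt targeting a ~500-token output
CALLS_PER_CYCLE = 3
CYCLES_PER_DAY = 4


def run_cycle(model: str) -> list[float]:
    """One test cycle: 3 sequential calls, wall-clock latency per call in ms."""
    latencies_ms = []
    for _ in range(CALLS_PER_CYCLE):
        start = time.perf_counter()
        requests.post(
            ENDPOINT,
            json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
            timeout=60,
        )
        # Wall-clock timing around the whole call: network overhead is included.
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


# Pool one day's cycles and report P50/P95 over the combined distribution.
day = [ms for _ in range(CYCLES_PER_DAY) for ms in run_cycle("Mistral-7B-Instruct-v0.3")]
print(f"P50 = {statistics.median(day):.0f} ms")
print(f"P95 = {statistics.quantiles(day, n=20)[18]:.0f} ms")
```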