Benchmarks
Speed test
P50 is the median response time for a standard 500-token output, measured from the EU (Amsterdam). P95 is tail latency: 95% of requests complete within this time. Each test cycle runs three calls per model; reported values are medians across cycles.
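As a quick illustration of these two statistics, here is a minimal Python sketch; the sample latencies are made up for the example, not taken from the table below.

```python
import statistics

# Illustrative latency samples in milliseconds (not real measurements)
samples_ms = [77, 81, 84, 88, 95, 102, 110, 122, 130, 145]

p50 = statistics.median(samples_ms)               # P50: half of requests are at least this fast
p95 = statistics.quantiles(samples_ms, n=20)[18]  # P95: 95% of requests finish within this time

print(f"P50 = {p50:.0f} ms, P95 = {p95:.0f} ms")
```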
Tier thresholds:

- Tier S: < 200 ms
- Tier A: < 500 ms
- Tier B: < 1000 ms
- Tier C: > 1000 ms
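The tiers in the table below track the P50 column (gpt-oss-120b, for example, keeps Tier A on a 324 ms P50 despite a 2211 ms P95). A minimal sketch of that mapping, using the thresholds above:

```python
def tier(p50_ms: float) -> str:
    """Map a P50 latency to its speed tier, per the thresholds above."""
    if p50_ms < 200:
        return "S"
    if p50_ms < 500:
        return "A"
    if p50_ms < 1000:
        return "B"
    return "C"

assert tier(77) == "S" and tier(324) == "A"
```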
| #  | Model                               | Provider               | Tier | P50 (median) | P95 (tail) |
|----|-------------------------------------|------------------------|------|-------------:|-----------:|
| 01 | Mistral-Small-3.2-24B-Instruct-2506 | OVH AI Endpoints (GRA) | S    | 77 ms        | 122 ms     |
| 02 | Llama-3.1-8B-Instruct               | OVH AI Endpoints (GRA) | S    | 85 ms        | 95 ms      |
| 03 | Qwen3-Coder-30B-A3B-Instruct        | OVH AI Endpoints (GRA) | S    | 96 ms        | 163 ms     |
| 04 | Qwen2.5-VL-72B-Instruct             | OVH AI Endpoints (GRA) | S    | 111 ms       | 119 ms     |
| 05 | Mistral-Nemo-Instruct-2407          | OVH AI Endpoints (GRA) | S    | 111 ms       | 151 ms     |
| 06 | Mistral-7B-Instruct-v0.3            | OVH AI Endpoints (GRA) | S    | 113 ms       | 149 ms     |
| 07 | Meta-Llama-3_3-70B-Instruct         | OVH AI Endpoints (GRA) | S    | 114 ms       | 124 ms     |
| 08 | gpt-oss-20b                         | OVH AI Endpoints (GRA) | A    | 216 ms       | 294 ms     |
| 09 | gpt-oss-120b                        | OVH AI Endpoints (GRA) | A    | 324 ms       | 2211 ms    |
| 10 | gpt-5.4-mini                        | OpenAI                 | A    | 361 ms       | 561 ms     |
How we measure: Each model receives an identical prompt targeting a ~500-token output. We run 3 sequential calls per test cycle and compute P50/P95 across the pooled distribution of calls. Tests run 4× per day from a single EU endpoint; network overhead is included.
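The harness itself is not published, so the sketch below is only a reconstruction of the stated procedure under assumptions: the endpoint URL is a placeholder, `requests` is an assumed HTTP client, and the model name is just an example.

```python
import statistics
import time

import requests  # assumed HTTP client; the actual harness is not published

# Placeholder endpoint and prompt; both are assumptions, not the real setup.
ENDPOINT = "https://example.invalid/v1/chat/completions"
PROMPT = "..."  # identical prompt targeting a ~500-token output
CALLS_PER_CYCLE = 3
CYCLES_PER_DAY = 4


def run_cycle(model: str) -> list[float]:
    """One test cycle: 3 sequential calls, wall-clock latency per call in ms."""
    latencies_ms = []
    for _ in range(CALLS_PER_CYCLE):
        start = time.perf_counter()
        requests.post(
            ENDPOINT,
            json={"model": model, "messages": [{"role": "user", "content": PROMPT}]},
            timeout=60,
        )
        # Wall-clock timing around the whole call: network overhead is included.
        latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


# Pool one day's cycles and report P50/P95 over the combined distribution.
day = [ms for _ in range(CYCLES_PER_DAY) for ms in run_cycle("Mistral-7B-Instruct-v0.3")]
print(f"P50 = {statistics.median(day):.0f} ms")
print(f"P95 = {statistics.quantiles(day, n=20)[18]:.0f} ms")
```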