Benchmarks

Speed test

P50 = median response time for a standard 500-token output, measured from the EU (Amsterdam). P95 = tail latency: 95% of requests complete within this time. Each model gets three runs per test cycle, and the published values are medians across cycles.
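The P50/P95 definitions above can be sketched as a nearest-rank percentile. The page does not say which percentile method it uses, so nearest-rank is an assumption here, and the sample latencies below are made-up values rather than figures from the table. One consequence of the three-run design: with only three samples, nearest-rank P95 is simply the slowest run.

```python
import math


def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical latencies (ms) for three runs in one cycle.
runs = [120.0, 135.0, 410.0]
print(percentile(runs, 50))  # 135.0
print(percentile(runs, 95))  # 410.0 (the slowest of the three runs)
```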

Tier S: < 200 ms

Tier A: < 500 ms

Tier B: < 1000 ms

Tier C: > 1000 ms
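A minimal sketch of the tier assignment, assuming tiers are keyed on P50 alone (the listing places a model with a 1115 ms P95 in Tier S, which suggests P95 is not used). The legend does not say where exactly 1000 ms falls, so treating it as Tier C is an assumption; `tier_for` is a hypothetical helper name.

```python
def tier_for(p50_ms: float) -> str:
    """Map a P50 latency in milliseconds to a tier label (assumed P50-based)."""
    if p50_ms < 200:
        return "S"
    if p50_ms < 500:
        return "A"
    if p50_ms < 1000:
        return "B"
    return "C"  # assumption: exactly 1000 ms counts as Tier C


for p50 in (51, 185, 200, 730, 1000):
    print(p50, tier_for(p50))
# prints, one per line: 51 S, 185 S, 200 A, 730 B, 1000 C
```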

P50 (median) / P95 (tail)

01 FLUX.1 Kontext [max] — Multi-Image Fusion · fal.ai

Tier S · P50: 0 ms

P95: 0 ms

02 FLUX.1 Kontext [pro] — Multi-Image Fusion · fal.ai

Tier S · P50: 0 ms

P95: 0 ms

03 NVIDIA Nemotron Super 49B v1.5 · OpenRouter

Tier S · P50: 51 ms

P95: 57 ms

04 Qwen3-Coder-30B-A3B-Instruct · OVH AI Endpoints (GRA)

Tier S · P50: 90 ms

P95: 103 ms

05 Mistral-Nemo-Instruct-2407 · OVH AI Endpoints (GRA)

Tier S · P50: 100 ms

P95: 322 ms

06 SDXL 1.0 · Hugging Face (nscale)

Tier S · P50: 100 ms

P95: 107 ms

07 Qwen2.5-VL-72B-Instruct · OVH AI Endpoints (GRA)

Tier S · P50: 130 ms

P95: 1115 ms

08 Meta-Llama-3_3-70B-Instruct · OVH AI Endpoints (GRA)

Tier S · P50: 140 ms

P95: 1892 ms

09 Mistral-7B-Instruct-v0.3 · OVH AI Endpoints (GRA)

Tier S · P50: 165 ms

P95: 197 ms

10 Nous Hermes 3 70B · OpenRouter

Tier S · P50: 185 ms

P95: 202 ms

How we measure: Each model receives an identical prompt targeting an output of roughly 500 tokens. We make 3 sequential calls per test cycle and compute P50 and P95 over the resulting latencies. Tests run four times per day from a single EU endpoint, and network overhead is included in the timings.
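The measurement loop described above can be sketched as follows. This is an illustration under stated assumptions, not the site's actual harness: `send_prompt` is a hypothetical callable standing in for one API request, percentiles use the nearest-rank method, and the published figures are assumed to be the median of per-cycle P50s and the median of per-cycle P95s (rather than percentiles over all runs pooled together). Timing wraps the whole call with `time.perf_counter`, so network overhead is included, as the methodology states.

```python
import math
import statistics
import time
from typing import Callable


def timed_call_ms(send_prompt: Callable[[], object]) -> float:
    """Wall-clock latency of one request in ms, network time included."""
    start = time.perf_counter()
    send_prompt()  # hypothetical: sends the ~500-token prompt and waits for the reply
    return (time.perf_counter() - start) * 1000.0


def nearest_rank(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (assumed method)."""
    ordered = sorted(samples)
    return ordered[math.ceil(p / 100 * len(ordered)) - 1]


def run_cycle(send_prompt: Callable[[], object], runs: int = 3) -> tuple[float, float]:
    """One test cycle: `runs` sequential calls, then that cycle's P50 and P95."""
    samples = [timed_call_ms(send_prompt) for _ in range(runs)]
    return nearest_rank(samples, 50), nearest_rank(samples, 95)


def aggregate(cycles: list[tuple[float, float]]) -> tuple[float, float]:
    """Published values (assumed): medians of the per-cycle P50s and P95s."""
    p50 = statistics.median([c[0] for c in cycles])
    p95 = statistics.median([c[1] for c in cycles])
    return p50, p95


# Four made-up cycles, i.e. one day at four test cycles per day.
cycles = [(100.0, 300.0), (110.0, 250.0), (95.0, 400.0), (105.0, 320.0)]
print(aggregate(cycles))  # (102.5, 310.0)

# Stand-in for a real request: sleep for about 10 ms.
print(run_cycle(lambda: time.sleep(0.01)))
```

With an even number of cycles, `statistics.median` averages the two middle values, which is why the example yields 102.5 ms rather than one of the observed per-cycle figures.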