Surface the error one model misses.
One prompt fans out to top models in parallel. A neutral judge from a different lab flags where they disagree — and reconciles them into a single, defensible answer. EU-hosted, fully traceable.
Reduce the errors one model would miss.
- 131 models tracked
- 13,593 benchmark runs
- 6 languages
Did the EU AI Act enter into force in 2024?
- claude-opus-4.8: Yes, entered into force August 2024.
- gpt-5.1: No, that was 2023.
- gemini-3-pro: Yes, August 2024.
Illustrative example — synthetic data
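A minimal sketch of the fan-out step, using the synthetic answers from the example above. The `ask` coroutine and the `CANNED` table are hypothetical stand-ins for real model calls, and the plain majority count is an assumption made for illustration: the product describes a judge model doing the reconciliation, which this sketch does not attempt.

```python
import asyncio
from collections import Counter

# Hypothetical stand-ins for real model calls; the canned answers mirror
# the synthetic example (two models say yes, one says no).
CANNED = {
    "claude-opus-4.8": "yes",
    "gpt-5.1": "no",
    "gemini-3-pro": "yes",
}


async def ask(model: str, prompt: str) -> tuple[str, str]:
    """Pretend to query one model and return (model, answer)."""
    await asyncio.sleep(0)  # placeholder for network latency
    return model, CANNED[model]


async def fan_out(prompt: str) -> dict[str, str]:
    """Send the same prompt to every model concurrently."""
    pairs = await asyncio.gather(*(ask(m, prompt) for m in CANNED))
    return dict(pairs)


def find_dissenters(answers: dict[str, str]) -> tuple[str, list[str]]:
    """Return the most common answer and the models that deviate from it."""
    majority, _ = Counter(answers.values()).most_common(1)[0]
    dissenters = sorted(m for m, a in answers.items() if a != majority)
    return majority, dissenters


answers = asyncio.run(fan_out("Did the EU AI Act enter into force in 2024?"))
majority, dissenters = find_dissenters(answers)
print(majority, dissenters)  # yes ['gpt-5.1']
```

A flagged dissenter is only a signal that the answers split. Deciding which side is right is the judge's job, not the vote count's.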
5 AI models inspect your image — before your audience does.
Image consensus: a council of five vision models catches anatomy, physics and lighting flaws in AI images that a single model misses.
More about image consensus →
Pilot 2026-06 · LOKI-35 + real control photos · not a product guarantee.
3 of 5 saw it. One model alone would have missed it — hence a council.
Live rankings
Top models this week
Sample data
Top models — Scientific Reasoning
01. Mistral Large 3 (Mistral): 780ms ↓
02. Claude Sonnet 4.6 (Anthropic): 920ms ·
03. Llama 3.3 405B (Meta): 1.18s ↑
04. Gemini 2.5 Pro (Google): 1.42s ↑
05. GPT-5o (OpenAI): 1.64s ·
06. Claude Opus 4.7 (Anthropic): 1.82s ↑
Sample · methodology pending
how we test →
Judge verdicts
3,735 evaluations across 63 models — counts only, no customer prompts
Claude Fable 5 — intelligence test
Independent, judge-scored results across our task categories — from real test runs, refreshed continuously.
Score by task category
Median response time
Each answer is scored 0–100 by an independent judge model on accuracy, completeness, reasoning and format. Lower factual scores reflect our deliberately hard knowledge probes.
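As a rough illustration of how four per-criterion scores could roll up into one number, the sketch below takes an unweighted mean. The equal weighting is an assumption, since the page does not say how the judge combines the criteria, and the sample numbers are made up. The median latency uses the standard library's `statistics.median`.

```python
from statistics import mean, median

# The four criteria named in the scoring description.
CRITERIA = ("accuracy", "completeness", "reasoning", "format")


def overall_score(scores: dict[str, float]) -> float:
    """Assumed aggregation: unweighted mean of the four 0-100 criteria."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    for name in CRITERIA:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be within 0-100")
    return mean(scores[name] for name in CRITERIA)


# Made-up judge scores and latencies, for illustration only.
judged = {"accuracy": 90, "completeness": 80, "reasoning": 100, "format": 70}
latencies_ms = [780, 920, 1180, 1420, 1640]

score = overall_score(judged)
p50 = median(latencies_ms)
print(score, p50)
```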
Release notes →
See where the models split.
Across our weekly intelligence tests, a neutral judge scores every model. These are the questions where the models disagreed most — the blind spots a single model would have hidden. Anonymised; no customer prompts are ever shown.
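One simple way to find the questions where models disagreed most is to rank them by the spread of their judge scores. The sketch below uses the population standard deviation as that spread. Both the metric and the sample data are assumptions made for illustration, not the site's documented method.

```python
from statistics import pstdev


def rank_by_disagreement(scores_by_question: dict[str, dict[str, float]]) -> list[str]:
    """Order question ids from most to least disagreement between models."""
    spread = {
        question: pstdev(list(model_scores.values()))
        for question, model_scores in scores_by_question.items()
    }
    return sorted(spread, key=spread.get, reverse=True)


# Hypothetical judge scores (0-100) per question and model.
sample = {
    "q1": {"model_a": 90, "model_b": 90, "model_c": 90},
    "q2": {"model_a": 95, "model_b": 20, "model_c": 60},
    "q3": {"model_a": 80, "model_b": 70, "model_c": 75},
}

ranking = rank_by_disagreement(sample)
print(ranking)  # ['q2', 'q3', 'q1']
```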
Models ranked
Top 10 AI models
Claude Opus 4.5 (Anthropic): 99.2 quality score, 6,978 ms p50
Claude Opus 4.7 (Anthropic): 99.2 quality score, 8,347 ms p50
Claude Opus 4.6 (Anthropic): 98.7 quality score, 8,280 ms p50
Claude Opus 4.8 (Anthropic): 98.6 quality score, 6,696 ms p50
gpt-4.1 (OpenAI): 98.4 quality score, 1,711 ms p50
Claude Sonnet 4.6 (Anthropic): 97.9 quality score, 7,490 ms p50
Claude Sonnet 4.5 (Anthropic): 95.9 quality score, 6,728 ms p50
Claude Haiku 4.5 (Anthropic): 95.7 quality score, 3,326 ms p50
Gemini 2.5 Flash-Lite (Google Gemini): 94.7 quality score, 1,572 ms p50
Gemini Flash Latest (Google Gemini): 53.3 quality score, 4,366 ms p50
No fee on single calls. You only pay the fee on consensus.
Ask one model and you pay just its tokens plus a small tier margin — no platform fee. The per-call fee applies only to multi-model consensus checks. 100 consensus checks free every month, no card needed; bundles from €10/month for 500 calls. Every token itemised, nothing hidden.
Free
€0/mo
100 calls/mo
token use: provider +5%
Starter
€10/mo
500 calls
token use: provider +4%
Studio
€25/mo
2,000 calls
token use: provider +3%
Scale
€50/mo
5,000 calls
token use: provider +2%
Founders prices, locked through 2027 · PAYG also available · "token margin" = the small % we add on the model provider's own token price, lower on higher tiers
No per-seat fee. No single-call fee, ever. Every consensus receipt is itemised per model, per token, in and out.
Every cent, itemised
illustrative example
model               in    out   cost
──────────────────────────────────────────────────
claude-haiku-4.5    812   540   €0.0041
gpt-4o              812   610   €0.0072
gemini-2.5-flash    812   498   €0.0029
judge (gpt-4o)      -     240   €0.0038
──────────────────────────────────────────────────
orchestration                   included
total                           €0.0180
Accurate to the last token · your real receipt contains your exact counts
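The illustrative receipt can be checked by summing its line items. The sketch below does that with `decimal.Decimal` so the four-decimal euro amounts add up exactly. The `Line` type is a hypothetical structure for this example, not the format of a real receipt.

```python
from decimal import Decimal
from typing import NamedTuple, Optional


class Line(NamedTuple):
    """One receipt row; tokens_in is None where the receipt shows no input count."""
    model: str
    tokens_in: Optional[int]
    tokens_out: int
    cost_eur: Decimal


# Values copied from the illustrative receipt.
receipt = [
    Line("claude-haiku-4.5", 812, 540, Decimal("0.0041")),
    Line("gpt-4o", 812, 610, Decimal("0.0072")),
    Line("gemini-2.5-flash", 812, 498, Decimal("0.0029")),
    Line("judge (gpt-4o)", None, 240, Decimal("0.0038")),
]

# Orchestration is listed as included, so it adds nothing to the total.
total = sum((line.cost_eur for line in receipt), Decimal("0"))
print(f"total €{total}")  # total €0.0180
```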
Estimate your cost
€10.00 estimated / month
Bundle price; overage at 1.5c/call above quota
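A rough sketch of the arithmetic behind such an estimate, built from the published tier table. It assumes that the 1.5c overage rate applies on every tier (including Free) and that token usage is billed on top of the bundle at the provider price plus the tier margin. The function name and parameters are hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple


class Tier(NamedTuple):
    monthly_eur: Decimal
    included_calls: int
    token_margin: Decimal  # fraction added on top of the provider token price


# Tier values taken from the pricing table.
TIERS = {
    "free": Tier(Decimal("0"), 100, Decimal("0.05")),
    "starter": Tier(Decimal("10"), 500, Decimal("0.04")),
    "studio": Tier(Decimal("25"), 2000, Decimal("0.03")),
    "scale": Tier(Decimal("50"), 5000, Decimal("0.02")),
}

# Assumption: 1.5c per consensus call above quota, on every tier.
OVERAGE_PER_CALL_EUR = Decimal("0.015")


def estimate_monthly_cost(
    tier_name: str,
    consensus_calls: int,
    provider_token_cost_eur: Decimal = Decimal("0"),
) -> Decimal:
    """Bundle price + overage on extra consensus calls + marked-up token cost."""
    tier = TIERS[tier_name]
    extra_calls = max(0, consensus_calls - tier.included_calls)
    overage = extra_calls * OVERAGE_PER_CALL_EUR
    tokens = provider_token_cost_eur * (1 + tier.token_margin)
    return tier.monthly_eur + overage + tokens


# Starter with 600 consensus calls and €20 of provider token cost.
estimate = estimate_monthly_cost("starter", 600, Decimal("20"))
print(f"€{estimate:.2f}")  # €32.30
```

With 500 calls and no token usage, the same function returns the bare €10 bundle price.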
Community
What the community is voting on
Top-rated test answers
Write a Python function `is_palindroom(s: str) -> bool` that returns True if the input string is a palindrome (ignoring case, ignoring punctuation). Add two test cases.
What is the name of the protein discovered by Dr. Elena Voskresensky in 2019 that reverses telomere shortening in human cells?
In which year did the European Union introduce the GDPR regulation?
Suggested test questions
No suggestions yet.
Run a test and suggest a question →
How we test
Real prompts, real latency, real scores. A three-tier framework keeps cost under control without compromising transparency.
Full coverage
Speed + intelligence test daily across all four languages.
Speed only
Latency and uptime sampled four times per day.
Health ping
Up/down check every fifteen minutes.
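The three tiers above can be written down as intervals to see how many checks each one implies per day. This is a sketch only: the tier keys are hypothetical names, and "four times per day" is assumed to mean evenly spaced six-hour intervals.

```python
from datetime import timedelta

# Interval between runs for each tier (key names are hypothetical).
TIER_INTERVALS = {
    "full_coverage": timedelta(days=1),    # daily
    "speed_only": timedelta(hours=6),      # four times per day, assumed evenly spaced
    "health_ping": timedelta(minutes=15),  # every fifteen minutes
}


def runs_per_day(interval: timedelta) -> int:
    """How many whole intervals fit into 24 hours."""
    return timedelta(days=1) // interval


schedule = {name: runs_per_day(gap) for name, gap in TIER_INTERVALS.items()}
print(schedule)  # {'full_coverage': 1, 'speed_only': 4, 'health_ping': 96}
```

The cheap health ping runs 96 times for every full-coverage run, which is how the heavier tests stay infrequent without leaving gaps in uptime monitoring.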
Try any model — right here
Pick a model, type a prompt, see the answer stream. No sign-up, no wallet, no context-switching.
Open the live tester →