Daily Arena

Match replay

Replaying a stored match — no models are called.

⚖ Multi-judge consensus — our trademark

Tokonomix multi-council + judge + blind-spot detection — lower cost, and it catches the mistakes one model misses.

Multi-council · lower cost

Multi-judge · cross-family

Blind-spot detection · catch the missed mistake

N-team · groups vs each other

Game type

Turns: 2

Speed: 1×

data_extraction · round · Turn 0 / 2

The cheapest model that keeps up on quality appears here.


Claude Haiku 4.5

Anthropic

€— · score —

100

Gemini 2.5 Flash

Google Gemini

€— · score —

100

Gemini Pro Latest

Google Gemini

€— · score —

100

gpt-4.1

OpenAI

€— · score —

100

gpt-4o-2024-05-13

OpenAI

€— · score —

100

gpt-5.5-2026-04-23

OpenAI

€— · score —

100


Final verdict: cost, quality & advantage

Players	Cost	Quality	Advantage / status
Claude Haiku 4.5	€0.0029	71.8	100 HP
Gemini 2.5 Flash	€0.0015	65.2	100 HP
Gemini Pro Latest	€0.0099	6	drained
gpt-4.1	€0.0029	64.8	100 HP
gpt-4o-2024-05-13	€0.0080	66.4	100 HP
gpt-5.5-2026-04-23	€0.0141	71.4	drained

Drone damage = jury-majority strength · HP = live advantage · € = real cost
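One way to read "the cheapest model that keeps up on quality" from the table above is sketched below in Python. This is only an illustration: the function name `cheapest_keeping_up` and the `tolerance` parameter are hypothetical, and the page does not state how close to the top quality score a model must be to count as keeping up. The rows are copied from the final-verdict table.

```python
# (model, cost in EUR, quality) rows copied from the final-verdict table.
ROWS = [
    ("Claude Haiku 4.5", 0.0029, 71.8),
    ("Gemini 2.5 Flash", 0.0015, 65.2),
    ("Gemini Pro Latest", 0.0099, 6.0),
    ("gpt-4.1", 0.0029, 64.8),
    ("gpt-4o-2024-05-13", 0.0080, 66.4),
    ("gpt-5.5-2026-04-23", 0.0141, 71.4),
]


def cheapest_keeping_up(rows, tolerance=5.0):
    """Return the cheapest row whose quality is within `tolerance` of the best.

    `tolerance` is an assumed threshold, not a value documented by the arena.
    """
    best_quality = max(quality for _, _, quality in rows)
    # Keep only the models that stay close enough to the top quality score.
    contenders = [row for row in rows if best_quality - row[2] <= tolerance]
    # Among those, pick the lowest cost; ties go to the first row listed.
    return min(contenders, key=lambda row: row[1])


if __name__ == "__main__":
    print(cheapest_keeping_up(ROWS))
    print(cheapest_keeping_up(ROWS, tolerance=10.0))
```

The answer depends heavily on the assumed threshold: with a 5-point tolerance only the two top scorers qualify, while a 10-point tolerance admits cheaper models.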

Honesty boundary

Advantage starts at 100. Knock-outs follow the judges' final standing — the lowest-ranked model falls first, paced so the last one lands near the round's end. The judges' winner is never targeted, so it is always the last standing (deriveRoundOutcomes v9-elim-tokonomix).

When the panel ends in a genuine tie for first, no model is eliminated and every model plays to the very end.

Reaching 0 advantage means that model is eliminated; once only the winner remains the replay flashes the result. The end-of-round judge panel below crowns that same last-standing model.
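The knock-out rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the arena's actual deriveRoundOutcomes logic: the names `Standing` and `schedule_knockouts` are hypothetical, the even pacing formula is a guess at "paced so the last one lands near the round's end", and the sketch places the final knock-out exactly on the last turn.

```python
from dataclasses import dataclass


@dataclass
class Standing:
    model: str
    final_score: float  # judges' final score; higher is better


def schedule_knockouts(standings, total_turns):
    """Map each eliminated model to the turn on which it reaches 0 advantage.

    The judges' winner never appears in the result. A genuine tie for
    first place yields an empty schedule, so every model plays to the end.
    """
    if not standings:
        return {}
    ranked = sorted(standings, key=lambda s: s.final_score, reverse=True)
    top_score = ranked[0].final_score
    # Genuine tie for first: nobody is eliminated.
    if sum(1 for s in ranked if s.final_score == top_score) > 1:
        return {}
    losers = ranked[1:][::-1]  # lowest-ranked model falls first
    count = len(losers)
    schedule = {}
    for i, standing in enumerate(losers, start=1):
        # Integer ceiling of i * total_turns / count: knock-outs are spread
        # evenly and the last one lands on the final turn.
        schedule[standing.model] = (i * total_turns + count - 1) // count
    return schedule


if __name__ == "__main__":
    demo = [Standing("A", 9), Standing("B", 7), Standing("C", 5), Standing("D", 3)]
    print(schedule_knockouts(demo, total_turns=6))
```

The model names A to D are placeholders. When there are more losers than turns, several knock-outs share a turn in this sketch; how the real replay handles that case is not stated on the page.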

The advantage bar visualizes the final standing, not per-turn quality — the per-turn winner badge separately marks who answered best each turn.

The score scale is set by the highest turn score seen in this replay (0–10 or 0–100); one high turn can make the others look closer together.
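A rough Python sketch of how a scale might be picked from the highest turn score, and why a single high turn compresses the rest. The function names and the threshold (anything above 10 switches to the 0–100 scale) are assumptions made for illustration; the page does not spell out the exact rule.

```python
def infer_score_scale(turn_scores):
    """Guess the display scale (10 or 100) from the highest turn score seen.

    Assumed rule: a maximum of 10 or less means the 0-10 scale,
    anything higher means the 0-100 scale.
    """
    highest = max(turn_scores, default=0.0)
    return 10 if highest <= 10 else 100


def as_fraction_of_scale(turn_scores):
    """Express every turn score as a fraction of the inferred scale."""
    scale = infer_score_scale(turn_scores)
    return [score / scale for score in turn_scores]


if __name__ == "__main__":
    # All scores fit the 0-10 scale, so 6 and 8 are clearly apart.
    print(as_fraction_of_scale([6, 8]))
    # One turn scored on 0-100 switches the scale; 6 and 8 now sit
    # almost on top of each other near the bottom of the bar.
    print(as_fraction_of_scale([6, 8, 64.8]))
```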

Zero model dispatch — pure render of the stored round. Switching the view changes the picture, never the numbers.
