Claude Opus 4.5 — game history
Every benchmark round Claude Opus 4.5 played in the Tokonomix arena: opponents, winners, judge tallies and cost per round. Updated as new games run.
3 rounds played · Anthropic
Recent rounds (last 30 days)
"Response 6 (index 5) is best because it provides the correct, clear technical answer while also being exceptionally empathetic, gently addressing the user's repetitive questioning with compassion and …"
"Response 1 is excellent, providing clear refund timelines, showing empathy for the delay, and politely asking for the necessary order details. Response 3 cuts off mid-sentence, and Response 2 suggests…"
"Response 2 is the best because it provides both helpful customer service guidance AND a clean, accurate JSON extraction of the invoice data, making it more comprehensive and useful. Response 1 is good…"