Claude Haiku 4.5 games — June 2026
Every benchmark round Claude Haiku 4.5 has played in the Tokonomix arena: opponents, winners, judge tallies, and cost per round. Updated as new games run.
8 rounds played · Anthropic
Recent rounds (last 30 days)
"Response 1 is the most comprehensive and clear in its explanation and summary, making it the best response."
"Response 1 is the winner because it provides a more comprehensive and detailed explanation of the potential issues and solutions, including specific examples and technical details, making it a more ac…"
"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots…"
"Response 5 (index 5) provides the most balanced and comprehensive customer service approach by delivering clear, actionable medical information from the report while appropriately maintaining boundari…"
"Response 2 provides a more accurate root-cause explanation (org-level permissions tied to users, not just token scopes) and practical tips about service accounts and dedicated CI/CD roles, though it w…"
"Response 3 is the most empathetic, transparent, and well-structured, giving a clear timeline while managing expectations and offering helpful alternatives without being pushy."
"Response 2 is the most effective: it acknowledges the frustration, requests specific account-identifying information, and clearly outlines actionable next steps including alternative verification meth…"
"Response 4 offers the best balance: accurate refund timelines with realistic edge cases, mentions confirmation email, and proactively offers a replacement option without being overly pushy. Response 1…"