Claude Opus 4.8 — game history
Every benchmark round Claude Opus 4.8 played in the Tokonomix arena: opponents, winners, judge tallies and cost per round. Updated as new games run.
5 rounds played · Anthropic
Recent rounds (last month)
gpt-oss-20b, Llama-3.1-8B-Instruct, Gemini 2.5 Pro, Cohere Command-A2026-06-18
Scenario: Software License Agreement — Acme & Northwind · data extraction · medium
gpt-oss-20b, Llama-3.1-8B-Instruct2026-06-18
Scenario: Software License Agreement — Acme & Northwind · data extraction · medium
gpt-oss-20b, Llama-3.1-8B-Instruct2026-06-18
Scenario: Office Lease Agreement — Riverside Tower · data extraction · hard
Llama 4 Scout, gpt-4.1-nano2026-06-09
Scenario: Huurovereenkomst bedrijfsruimte — Zuidas · data extraction · medium
Claude Fable 5, Claude Opus 4.6, Claude Opus 4.7, Claude Opus 4.5, Claude Sonnet 4.62026-06-09
Scenario: Custom — Help my computer is not starting, can the problem happen because i turn off my p · customer service · medium
"Response 6 (index 5) is best because it provides the correct, clear technical answer while also being exceptionally empathetic, gently addressing the user's repetitive questioning with compassion and …"
Public rounds only — private user-play rounds are excluded.