Claude Opus 4.7: match history

Every benchmark series Claude Opus 4.7 has played in the Tokonomix arena: opponents, winners, jury counts, and cost per round. Updated whenever new matches are played.

9 rounds played · Anthropic

Recent rounds (last 30 days)

Claude Fable 5, Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.8, Claude Sonnet 4.6 · 2026-06-09

Scenario: Custom — Help my computer is not starting, can the problem happen because i turn off my p · customer service · medium

Lost · 0 of 3 jury votes · €0.253 cost

"Response 6 (index 5) is best because it provides the correct, clear technical answer while also being exceptionally empathetic, gently addressing the user's repetitive questioning with compassion and …"

gpt-5.5, DeepSeek v3.2, Llama 3.3 70B Instruct, Llama 4 Scout, Nous Hermes 3 70B · 2026-06-06

Scenario: Custom — Mijn website doet het niet, kan het zijn dat het komt omdat mijn printer uit sta · customer service · medium

Lost · 0 of 2 jury votes · €0.258 cost

"None of the responses address the user's specific concern about the printer, and several end abruptly mid-sentence. Response 2 is the winner because it provides the most practical, well-structured, an…"

Claude Opus 4.5, Claude Opus 4.6 · 2026-06-05

Scenario: Late delivery, refund requested · multilingual support · medium

Lost · 1 of 3 jury votes · €0.048 cost

"Response 1 is excellent, providing clear refund timelines, showing empathy for the delay, and politely asking for the necessary order details. Response 3 cuts off mid-sentence, and Response 2 suggests…"

Frontier · Frontier C vs Mistral-7B-Instruct-v0.3, Mistral-Nemo-Instruct-2407, Mistral-Small-3.2-24B-Instruct-2506, gpt-5, gpt-4, gpt-4-turbo · 2026-06-05

Scenario: Router Will Not Connect After Firmware Update · customer service · medium

Won · 3 of 3 jury votes · €0.088 cost

"Response 2 (index 2) provides the most customer-centric approach with clear accountability, specific timelines, escalation paths, and confirmed compensation - demonstrating excellent customer service …"

Frontier · Frontier C vs Llama 3.3 70B Instruct, Llama 4 Maverick, Llama 4 Scout, Qwen 3.6 Plus, Qwen 3.7 Max · 2026-06-05

Scenario: Custom: My WordPress website is not working, could it be because of my email settings? · customer service · medium

Lost · 0 of 3 jury votes · €0.094 cost

"Response 2 (index 1) is the most complete and well-organized, covering SMTP plugins, lightweight testing tools, external deliverability tools, and debugging approaches with clear categorization and pr…"

Frontier · Frontier B vs Qwen2.5-VL-72B-Instruct, Qwen 2.5 VL 72B Instruct, Meta-Llama-3_3-70B-Instruct, Llama 3.3 70B Instruct, Qwen 3.7 Max · 2026-06-05

Scenario: Parcel Marked Delivered but Not Received · customer service · easy

Won · 3 of 3 jury votes · €0.049 cost

"Response 2 is more professional, actionable, and complete — it acknowledges the timing correction, offers clear options, verifies the shipping address, and files a parallel courier report. Response 1 …"

Frontier · Frontier B vs Claude Haiku 4.5, DeepSeek v4 Pro, Llama 3.3 70B Instruct, Mistral-7B-Instruct-v0.3, Nano Banana · 2026-06-05

Scenario: Router Will Not Connect After Firmware Update · customer service · medium

Won · 3 of 3 jury votes · €0.044 cost

"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots…"

Frontier · Frontier B vs Claude Haiku 4.5, Gemini 2.5 Pro, Gemini Pro Latest, gpt-5.1-chat-latest, Llama 3.3 70B Instruct · 2026-06-05

Scenario: Deployment Failing After Plan Upgrade · customer service · medium

Won · 1 of 1 jury votes · €0.049 cost

"Response 2 provides a more accurate root-cause explanation (org-level permissions tied to users, not just token scopes) and practical tips about service accounts and dedicated CI/CD roles, though it w…"

Claude Sonnet 4.6 · 2026-06-04

Scenario: Invoice — Blue Harbor Logistics · data extraction · easy

Won · €0.005 cost

Public rounds only; private user rounds are excluded.