DeepSeek v4 Pro — game history
Every benchmark round DeepSeek v4 Pro played in the Tokonomix arena: opponents, winners, judge tallies and cost per round. Updated as new games run.
6 rounds played · OpenRouter
Recent rounds (last 30 days)
"Response 3 is the most comprehensive and professional, providing specific details (timestamped notice, specific email addresses, GDPR/DPO references) while maintaining clarity and structure. Response …"
"Response 1 is the most comprehensive and clear in its explanation and summary, making it the best response."
"Response 2 is the best because it provides both helpful customer service guidance AND a clean, accurate JSON extraction of the invoice data, making it more comprehensive and useful. Response 1 is good…"
"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots…"
"Response 2 is the most effective: it acknowledges the frustration, requests specific account-identifying information, and clearly outlines actionable next steps including alternative verification meth…"
"Response 4 offers the best balance: accurate refund timelines with realistic edge cases, mentions confirmation email, and proactively offers a replacement option without being overly pushy. Response 1…"