
Llama 3.3 70B Instruct: Match History

Every benchmark round that Llama 3.3 70B Instruct has played in the Tokonomix arena: opponents, winner, jury counts, and cost per round. Updated as new matches are played.

7 rounds played · OpenRouter

Recent rounds (last 30 days)

gpt-5.5, Qwen 3.6 Plus, DeepSeek v4 Pro · 2026-06-06
Scenario: Account Merged Without Consent · multilingual support · hard
Lost · 0 of 1 jurors · €0.001 cost

"Response 3 is the most comprehensive and professional, providing specific details (timestamped notice, specific email addresses, GDPR/DPO references) while maintaining clarity and structure. Response "

Claude Opus 4.7, gpt-5.5, DeepSeek v3.2, Llama 4 Scout, Nous Hermes 3 70B · 2026-06-06
Custom scenario: Mijn website doet het niet, kan het zijn dat het komt omdat mijn printer uit sta · customer service · medium
Lost · 0 of 2 jurors · €0.003 cost

"None of the responses address the user's specific concern about the printer, and several end abruptly mid-sentence. Response 2 is the winner because it provides the most practical, well-structured, an"

Council · Council A vs Claude Opus 4 · 2026-06-05
Custom scenario: Mijn pc start niet op, kan het zijn dat ze mijn website hebben gehacked? · customer service · medium
Lost · 0 of 1 jurors · €0.007 cost

"Response 2 provides a safer, better-prioritized approach by recommending checking from a separate device first to avoid further compromise, uses clear actionable steps, and engages the customer with a"

Council · Council A vs Qwen 3.6 Plus, Qwen 3.7 Max, Claude Opus 4.7 · 2026-06-05
Custom scenario: Mijn wordpress website werkt niet, kan het aan mijn email instellingen liggen? · customer service · medium
Lost · 0 of 3 jurors · €0.007 cost

"Response 2 (index 1) is the most complete and well-organized, covering SMTP plugins, lightweight testing tools, external deliverability tools, and debugging approaches with clear categorization and pr"

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Parcel Marked Delivered but Not Received · customer service · easy
Lost · 0 of 3 jurors · €0.086 cost

"Response 2 is more professional, actionable, and complete — it acknowledges the timing correction, offers clear options, verifies the shipping address, and files a parallel courier report. Response 1 "

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Router Will Not Connect After Firmware Update · customer service · medium
Lost · 0 of 3 jurors · €0.028 cost

"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots"

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Deployment Failing After Plan Upgrade · customer service · medium
Lost · 0 of 1 jurors · €0.049 cost

"Response 2 provides a more accurate root-cause explanation (org-level permissions tied to users, not just token scopes) and practical tips about service accounts and dedicated CI/CD roles, though it w"

Public rounds only; private user rounds are excluded.