
Llama 3.3 70B Instruct games: June 2026

Every benchmark round Llama 3.3 70B Instruct played in the Tokonomix arena: opponents, winners, jury vote counts, and cost per round. Updated as new games are played.

7 rounds played · OpenRouter

Recent rounds (last 30 days)

gpt-5.5, Qwen 3.6 Plus, DeepSeek v4 Pro · 2026-06-06
Scenario: Account Merged Without Consent · multilingual support · hard
Lost · 0 of 1 judge · €0.001 cost

"Response 3 is the most comprehensive and professional, providing specific details (timestamped notice, specific email addresses, GDPR/DPO references) while maintaining clarity and structure. Response…"

Claude Opus 4.7, gpt-5.5, DeepSeek v3.2, Llama 4 Scout, Nous Hermes 3 70B · 2026-06-06
Scenario: Custom, "Mijn website doet het niet, kan het zijn dat het komt omdat mijn printer uit sta" (Dutch: "My website isn't working, could it be because my printer is off") · customer service · medium
Lost · 0 of 2 judges · €0.003 cost

"None of the responses address the user's specific concern about the printer, and several end abruptly mid-sentence. Response 2 is the winner because it provides the most practical, well-structured, an…"

Council · Council A vs Claude Opus 4 · 2026-06-05
Scenario: Custom, "Mijn pc start niet op, kan het zijn dat ze mijn website hebben gehacked?" (Dutch: "My PC won't start; could it be that they've hacked my website?") · customer service · medium
Lost · 0 of 1 judge · €0.007 cost

"Response 2 provides a safer, better-prioritized approach by recommending checking from a separate device first to avoid further compromise, uses clear actionable steps, and engages the customer with a…"

Council · Council A vs Qwen 3.6 Plus, Qwen 3.7 Max, Claude Opus 4.7 · 2026-06-05
Scenario: Custom, "Mijn wordpress website werkt niet, kan het aan mijn email instellingen liggen?" (Dutch: "My WordPress website isn't working; could it be my email settings?") · customer service · medium
Lost · 0 of 3 judges · €0.007 cost

"Response 2 (index 1) is the most complete and well-organized, covering SMTP plugins, lightweight testing tools, external deliverability tools, and debugging approaches with clear categorization and pr…"

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Parcel Marked Delivered but Not Received · customer service · easy
Lost · 0 of 3 judges · €0.086 cost

"Response 2 is more professional, actionable, and complete — it acknowledges the timing correction, offers clear options, verifies the shipping address, and files a parallel courier report. Response 1…"

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Router Will Not Connect After Firmware Update · customer service · medium
Lost · 0 of 3 judges · €0.028 cost

"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots…"

Council · Council A vs Claude Opus 4.7 · 2026-06-05
Scenario: Deployment Failing After Plan Upgrade · customer service · medium
Lost · 0 of 1 judge · €0.049 cost

"Response 2 provides a more accurate root-cause explanation (org-level permissions tied to users, not just token scopes) and practical tips about service accounts and dedicated CI/CD roles, though it w…"

Public rounds only; private user rounds are excluded.