Llama 3.3 70B Instruct — historique de jeu
Chaque round de benchmark joué par Llama 3.3 70B Instruct dans l'arène Tokonomix : adversaires, vainqueurs, résultats du jury et coût par round. Mis à jour à chaque nouvelle partie.
7 rounds joués · OpenRouter
Rounds récents (30 derniers jours)
"Response 3 is the most comprehensive and professional, providing specific details (timestamped notice, specific email addresses, GDPR/DPO references) while maintaining clarity and structure. Response …"
"None of the responses address the user's specific concern about the printer, and several end abruptly mid-sentence. Response 2 is the winner because it provides the most practical, well-structured, an…"
"Response 2 provides a safer, better-prioritized approach by recommending checking from a separate device first to avoid further compromise, uses clear actionable steps, and engages the customer with a…"
"Response 2 (index 1) is the most complete and well-organized, covering SMTP plugins, lightweight testing tools, external deliverability tools, and debugging approaches with clear categorization and pr…"
"Response 2 is more professional, actionable, and complete — it acknowledges the timing correction, offers clear options, verifies the shipping address, and files a parallel courier report. Response 1 …"
"Response 2 correctly identifies the prompt as PPPoE credentials (not a router admin login), offers proper account verification, addresses the firmware issue specifically, and provides a practical hots…"
"Response 2 provides a more accurate root-cause explanation (org-level permissions tied to users, not just token scopes) and practical tips about service accounts and dedicated CI/CD roles, though it w…"