Model quality is scored 0–100 by an impartial judge LLM (Claude Sonnet 4.6, blind) across six categories: reasoning, coding, creativity, factual accuracy, instruction-following, and safety.