Benchmarks
Intelligence test
Model quality is scored 0–100 by an impartial judge LLM (Claude Sonnet 4.6, blind) across six categories: reasoning, coding, creativity, factual accuracy, instruction-following, and safety.
Claude Opus 4.7
Tier A · Claude Sonnet 4.6
Tier A · Claude Opus 4.5
Tier B · Claude Haiku 4.5
Tier A · Claude Sonnet 4.5
Tier B · Claude Opus 4.1
Tier C · Claude Opus 4
Tier C · Claude Sonnet 4
Tier C · gpt-3.5-turbo
Tier C · gpt-3.5-turbo-16k
gpt-4-0613
gpt-4
Tier C · gpt-3.5-turbo-0125
gpt-4-turbo
Tier C · gpt-4-turbo-2024-04-09
Tier C · gpt-4o
Tier C · gpt-4o-2024-05-13
Tier C · gpt-4o-mini-2024-07-18
Tier C · gpt-4o-mini
Tier C · gpt-4o-2024-08-06
Tier C · gpt-4o-2024-11-20
Tier C · gpt-4o-mini-search-preview-2025-03-11
gpt-4o-mini-search-preview
Tier C · gpt-4.1-2025-04-14
gpt-4.1
Tier B · gpt-4.1-mini-2025-04-14
gpt-4.1-mini
Tier C · gpt-4.1-nano-2025-04-14
gpt-4.1-nano
Tier C · gpt-5-chat-latest
Tier C · gpt-5-search-api
Tier C · gpt-5-search-api-2025-10-14
gpt-4o-search-preview-2025-03-11
Gemma 3 1B
Tier C · Gemma 3 4B
Tier C · Gemma 3 12B
Tier B · Gemma 3 27B
Tier A · Gemma 3n E4B
Tier C · Gemma 3n E2B
Tier C · Gemini Flash Latest
Tier B · Gemini Flash-Lite Latest
Tier C · Gemini 2.5 Flash-Lite
Tier B · Nano Banana
Gemini 3.1 Flash Lite Preview
Tier C · Nano Banana 2
Gemini Robotics-ER 1.6 Preview
Claude Opus 4.6
Tier B · gpt-4o-search-preview
Tier C · Gemma 4 26B A4B IT
Tier C · Gemma 4 31B IT
Tier C · gpt-3.5-turbo-1106
Lyria 3 Clip Preview
Gemini 3 Flash Preview
Tier C · Gemini 3.1 Pro Preview Custom Tools
Tier C · Gemini Pro Latest
Tier C · Gemini 3 Pro Preview
Tier A · Gemini 3.1 Pro Preview
Tier C · Gemini 2.5 Flash
Tier A · Gemini 2.5 Pro
Tier A · Nano Banana Pro
Nano Banana Pro
Lyria 3 Pro Preview
62 models scored · category breakdown estimated (full per-category scoring in Q3 2026)
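The scoring scheme described at the top of this section (a 0–100 score across six categories) could be combined into a single number roughly as sketched below. This is an illustration only: the page does not say how the categories are weighted, so the equal weighting, the `overall_score` function name, and the sample scores are all assumptions, not the benchmark's actual method or data.

```python
# Hypothetical aggregation sketch. Equal weighting across categories is an
# assumption; the source does not state how category scores are combined.
CATEGORIES = (
    "reasoning",
    "coding",
    "creativity",
    "factual_accuracy",
    "instruction_following",
    "safety",
)


def overall_score(category_scores: dict[str, float]) -> float:
    """Return the unweighted mean of the six category scores (0-100 each)."""
    missing = [name for name in CATEGORIES if name not in category_scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for name in CATEGORIES:
        score = category_scores[name]
        if not 0 <= score <= 100:
            raise ValueError(f"{name} score {score} is outside 0-100")
    return sum(category_scores[name] for name in CATEGORIES) / len(CATEGORIES)


# Made-up scores for illustration; these are not results from the benchmark.
example = {
    "reasoning": 80,
    "coding": 70,
    "creativity": 60,
    "factual_accuracy": 90,
    "instruction_following": 85,
    "safety": 95,
}
print(overall_score(example))
```

A weighted mean, or a rule that caps the overall score by the weakest category, would be equally plausible; which of these the benchmark uses is not stated on the page.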