Benchmarks

Intelligence test

Each model is scored 0–100 by an impartial judge LLM (Claude Sonnet 4.6, grading blind) across six categories: reasoning, coding, creativity, factual accuracy, instruction-following, and safety.
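
What "blind" means in practice is not spelled out on this page. One common reading is that the judge never sees which model produced a response. A minimal Python sketch of that idea follows; the helper names (`blind_batch`, `unblind`) and the opaque `candidate-N` labels are hypothetical and are not taken from this benchmark.

```python
import random


def blind_batch(responses, seed=0):
    """Swap model names for opaque labels and shuffle the order.

    `responses` maps a model name to its answer text. Returns the blinded
    mapping (label -> answer) plus the key needed to undo the blinding.
    Hypothetical helper: the benchmark's real procedure is not documented here.
    """
    rng = random.Random(seed)
    names = list(responses)
    rng.shuffle(names)  # order should not hint at identity
    key = {f"candidate-{i + 1}": name for i, name in enumerate(names)}
    blinded = {label: responses[name] for label, name in key.items()}
    return blinded, key


def unblind(scores, key):
    """Map judge scores (label -> score) back to model names."""
    return {key[label]: score for label, score in scores.items()}


if __name__ == "__main__":
    answers = {"model-a": "Paris", "model-b": "Lyon"}
    blinded, key = blind_batch(answers)
    # A real judge call would go here; a constant stands in for it.
    judge_scores = {label: 100 for label in blinded}
    print(unblind(judge_scores, key))
```

Only the blinded mapping would be shown to the judge; the key stays with the harness until scores are recorded.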

#1 · Anthropic · Claude Opus 4.7 · Tier A · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#2 · Anthropic · Claude Sonnet 4.6 · Tier A · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#3 · Anthropic · Claude Opus 4.5 · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#4 · Anthropic · Claude Haiku 4.5 · Tier A · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#5 · Anthropic · Claude Sonnet 4.5 · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#6 · Anthropic · Claude Opus 4.1 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#7 · Anthropic · Claude Opus 4 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#8 · Anthropic · Claude Sonnet 4 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#9 · OpenAI · gpt-3.5-turbo · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#10 · OpenAI · gpt-3.5-turbo-16k · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#11 · OpenAI · gpt-4-0613 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#12 · OpenAI · gpt-4 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#13 · OpenAI · gpt-3.5-turbo-0125 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#14 · OpenAI · gpt-4-turbo · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#15 · OpenAI · gpt-4-turbo-2024-04-09 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#16 · OpenAI · gpt-4o · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#17 · OpenAI · gpt-4o-2024-05-13 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#18 · OpenAI · gpt-4o-mini-2024-07-18 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#19 · OpenAI · gpt-4o-mini · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#20 · OpenAI · gpt-4o-2024-08-06 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#21 · OpenAI · gpt-4o-2024-11-20 · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#22 · OpenAI · gpt-4o-mini-search-preview-2025-03-11 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#23 · OpenAI · gpt-4o-mini-search-preview · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#24 · OpenAI · gpt-4.1-2025-04-14 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#25 · OpenAI · gpt-4.1 · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#26 · OpenAI · gpt-4.1-mini-2025-04-14 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#27 · OpenAI · gpt-4.1-mini · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#28 · OpenAI · gpt-4.1-nano-2025-04-14 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#29 · OpenAI · gpt-4.1-nano · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#30 · OpenAI · gpt-5-chat-latest · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#31 · OpenAI · gpt-5-search-api · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#32 · OpenAI · gpt-5-search-api-2025-10-14 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#33 · OpenAI · gpt-4o-search-preview-2025-03-11 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#34 · Google Gemini · Gemma 3 1B · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#35 · Google Gemini · Gemma 3 4B · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#36 · Google Gemini · Gemma 3 12B · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#37 · Google Gemini · Gemma 3 27B · Tier A · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#38 · Google Gemini · Gemma 3n E4B · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#39 · Google Gemini · Gemma 3n E2B · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#40 · Google Gemini · Gemini Flash Latest · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#41 · Google Gemini · Gemini Flash-Lite Latest · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#42 · Google Gemini · Gemini 2.5 Flash-Lite · Tier B · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#43 · Google Gemini · Nano Banana · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#44 · Google Gemini · Gemini 3.1 Flash Lite Preview · Tier C · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#45 · Google Gemini · Nano Banana 2 · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#46 · Google Gemini · Gemini Robotics-ER 1.6 Preview · Score 100 · Reasoning 95 · Coding 90 · Factual 100
#47 · Anthropic · Claude Opus 4.6 · Tier B · Score 98 · Reasoning 93 · Coding 88 · Factual 98
#48 · OpenAI · gpt-4o-search-preview · Tier C · Score 98 · Reasoning 93 · Coding 88 · Factual 98
#49 · Google Gemini · Gemma 4 26B A4B IT · Tier C · Score 98 · Reasoning 93 · Coding 88 · Factual 98
#50 · Google Gemini · Gemma 4 31B IT · Tier C · Score 98 · Reasoning 93 · Coding 88 · Factual 98
#51 · OpenAI · gpt-3.5-turbo-1106 · Score 81 · Reasoning 77 · Coding 73 · Factual 81
#52 · Google Gemini · Lyria 3 Clip Preview · Score 80 · Reasoning 76 · Coding 72 · Factual 80
#53 · Google Gemini · Gemini 3 Flash Preview · Tier C · Score 45 · Reasoning 43 · Coding 41 · Factual 45
#54 · Google Gemini · Gemini 3.1 Pro Preview Custom Tools · Tier C · Score 45 · Reasoning 43 · Coding 41 · Factual 45
#55 · Google Gemini · Gemini Pro Latest · Tier C · Score 35 · Reasoning 33 · Coding 32 · Factual 35
#56 · Google Gemini · Gemini 3 Pro Preview · Tier A · Score 25 · Reasoning 24 · Coding 23 · Factual 25
#57 · Google Gemini · Gemini 3.1 Pro Preview · Tier C · Score 25 · Reasoning 24 · Coding 23 · Factual 25
#58 · Google Gemini · Gemini 2.5 Flash · Tier A · Score 0 · Reasoning 0 · Coding 0 · Factual 0
#59 · Google Gemini · Gemini 2.5 Pro · Tier A · Score 0 · Reasoning 0 · Coding 0 · Factual 0
#60 · Google Gemini · Nano Banana Pro · Score 0 · Reasoning 0 · Coding 0 · Factual 0
#61 · Google Gemini · Nano Banana Pro · Score 0 · Reasoning 0 · Coding 0 · Factual 0
#62 · Google Gemini · Lyria 3 Pro Preview · Score 0 · Reasoning 0 · Coding 0 · Factual 0

62 models scored · category breakdowns are estimates (full per-category scoring in Q3 2026)
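
How the category breakdown is estimated is not stated. The listed figures are, however, consistent with a simple rule: reasoning at 95% of the overall score, coding at 90%, and factual equal to it, each rounded half up. The Python sketch below checks that reading against every distinct score shown above. The function name `estimated_breakdown` and the multipliers are inferences from the listed numbers, not a documented method.

```python
def estimated_breakdown(overall):
    """Guess the displayed sub-scores from an overall 0-100 score.

    Assumption (inferred from the listing, not documented): fixed
    multipliers of 0.95 and 0.90 with round-half-up, factual unchanged.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    reasoning = (overall * 95 + 50) // 100
    coding = (overall * 90 + 50) // 100
    factual = overall
    return reasoning, coding, factual


# Every distinct (overall -> reasoning, coding, factual) combination listed.
LISTED = {
    100: (95, 90, 100),
    98: (93, 88, 98),
    81: (77, 73, 81),
    80: (76, 72, 80),
    45: (43, 41, 45),
    35: (33, 32, 35),
    25: (24, 23, 25),
    0: (0, 0, 0),
}

mismatches = [s for s, shown in LISTED.items() if estimated_breakdown(s) != shown]
print(mismatches)  # [] means the rule reproduces every listed breakdown
```

If this reading is right, the three sub-scores carry no information beyond the overall score until the full per-category scoring is published.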