Benchmarks

Intelligence test

Model quality scored 0–100 by an impartial judge LLM (Claude Sonnet 4.6, blind). Six categories: reasoning, coding, creativity, factual accuracy, instruction-following, and safety.

#1 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#2 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#3 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#4 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#5 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#6 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#7 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#8 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#9 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#10 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#11 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#12 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#13 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#14 OpenAI · 100 · gpt-4.1 · Tier B · Reasoning 95 · Coding 90 · Factual 100
#15 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#16 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#17 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#18 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#19 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#20 OpenAI · 100 · Reasoning 95 · Coding 90 · Factual 100
#21 Google Gemini · 100 · Reasoning 95 · Coding 90 · Factual 100
#22 OVH AI Endpoints (GRA) · 100 · Reasoning 95 · Coding 90 · Factual 100
#23 OVH AI Endpoints (GRA) · 100 · Reasoning 95 · Coding 90 · Factual 100
#24 Google Gemini · 100 · Reasoning 95 · Coding 90 · Factual 100
#25 Anthropic · 100 · Reasoning 95 · Coding 90 · Factual 100
#26 OpenAI · 99 · Reasoning 94 · Coding 89 · Factual 99
#27 OpenAI · 99 · gpt-4 · Tier C · Reasoning 94 · Coding 89 · Factual 99
#28 OpenAI · 99 · Reasoning 94 · Coding 89 · Factual 99
#29 OpenAI · 98 · gpt-4o · Tier C · Reasoning 93 · Coding 88 · Factual 98
#30 OpenAI · 98 · Reasoning 93 · Coding 88 · Factual 98
#31 OVH AI Endpoints (GRA) · 98 · Reasoning 93 · Coding 88 · Factual 98
#32 OpenAI · 97 · Reasoning 92 · Coding 87 · Factual 97
#33 OpenAI · 97 · Reasoning 92 · Coding 87 · Factual 97
#34 Google Gemini · 97 · Reasoning 92 · Coding 87 · Factual 97
#35 OVH AI Endpoints (GRA) · 97 · Reasoning 92 · Coding 87 · Factual 97
#36 Anthropic · 96 · Reasoning 91 · Coding 86 · Factual 96
#37 OpenAI · 96 · Reasoning 91 · Coding 86 · Factual 96
#38 Google Gemini · 95 · Reasoning 90 · Coding 86 · Factual 95
#39 OpenAI · 91 · Reasoning 86 · Coding 82 · Factual 91
#40 Google Gemini · 91 · Reasoning 86 · Coding 82 · Factual 91
#41 OVH AI Endpoints (GRA) · 91 · Reasoning 86 · Coding 82 · Factual 91
#42 OpenAI · 75 · Reasoning 71 · Coding 68 · Factual 75
#43 OpenAI · 73 · Reasoning 69 · Coding 66 · Factual 73
#44 OpenAI · 63 · Reasoning 60 · Coding 57 · Factual 63
#45 OpenAI · 53 · Reasoning 50 · Coding 48 · Factual 53
#46 OVH AI Endpoints (GRA) · 53 · Reasoning 50 · Coding 48 · Factual 53
#47 OVH AI Endpoints (GRA) · 51 · Reasoning 48 · Coding 46 · Factual 51
#48 OVH AI Endpoints (GRA) · 48 · Reasoning 46 · Coding 43 · Factual 48
#49 Google Gemini · 18 · Reasoning 17 · Coding 16 · Factual 18
#50 Google Gemini · 5 · Reasoning 5 · Coding 5 · Factual 5
#51 Google Gemini · 5 · Reasoning 5 · Coding 5 · Factual 5
#52 Google Gemini · 5 · Reasoning 5 · Coding 5 · Factual 5
#53 Google Gemini · 5 · Reasoning 5 · Coding 5 · Factual 5
#54 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#55 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#56 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#57 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#58 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#59 Google Gemini · 0 · Reasoning 0 · Coding 0 · Factual 0
#60 OVH AI Endpoints (GRA) · 0 · Reasoning 0 · Coding 0 · Factual 0
#61 OVH AI Endpoints (GRA) · 0 · Reasoning 0 · Coding 0 · Factual 0
#62 OVH AI Endpoints (GRA) · 0 · Reasoning 0 · Coding 0 · Factual 0
#63 OVH AI Endpoints (GRA) · 0 · Reasoning 0 · Coding 0 · Factual 0

63 models scored · category breakdown estimated (full per-category scoring planned for Q3 2026)
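
The page does not say how the estimated category breakdown is derived. Every row in the list is, however, consistent with fixed fractions of the overall score: Reasoning at 95%, Coding at 90%, and Factual at 100%, each rounded half up. The sketch below reproduces the displayed figures under that assumption. The weights and the function name `estimate_breakdown` are inferred for illustration and are not a documented formula.

```python
# Assumed weights (in percent), inferred from the displayed rows.
# This is not a documented formula from the benchmark.
ASSUMED_WEIGHTS_PERCENT = {"Reasoning": 95, "Coding": 90, "Factual": 100}


def estimate_breakdown(overall: int) -> dict[str, int]:
    """Scale an overall 0-100 score by each assumed weight, rounding half up."""
    if not 0 <= overall <= 100:
        raise ValueError("overall score must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises at exact .5 values
    # (for example, 90% of 95 is 85.5, which the page displays as 86).
    return {
        category: (overall * weight + 50) // 100
        for category, weight in ASSUMED_WEIGHTS_PERCENT.items()
    }


print(estimate_breakdown(95))  # {'Reasoning': 90, 'Coding': 86, 'Factual': 95}
```

If this assumption holds, the three category figures carry no information beyond the overall score, so they should not be read as independent measurements until full per-category scoring is published.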