Benchmarks

Intelligence test

Model quality is scored from 0 to 100 by an impartial judge LLM (Claude Sonnet 4.6, grading blind). Scoring covers six categories: reasoning, coding, creativity, factual accuracy, instruction-following, and safety.

#1 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#2 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#3 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#4 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#5 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#6 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#7 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#8 Anthropic: 100 (Reasoning 95, Coding 90, Factual 100)
#9 OpenAI: 100 (Reasoning 95, Coding 90, Factual 100)
#10 OpenAI: 100 (Reasoning 95, Coding 90, Factual 100)
#11 OpenAI: 100 (Reasoning 95, Coding 90, Factual 100)
#12 OpenAI: 100 (Reasoning 95, Coding 90, Factual 100)
#13 Google Gemini: 100 (Reasoning 95, Coding 90, Factual 100)
#14 Google Gemini: 100 (Reasoning 95, Coding 90, Factual 100)
#15 Google Gemini: 100 (Reasoning 95, Coding 90, Factual 100)
#16 Google Gemini: 100 (Reasoning 95, Coding 90, Factual 100)
#17 OVH AI Endpoints (GRA): 100 (Reasoning 95, Coding 90, Factual 100)
#18 OVH AI Endpoints (GRA): 100 (Reasoning 95, Coding 90, Factual 100)
#19 Anthropic: 99 (Reasoning 94, Coding 89, Factual 99)
#20 OpenAI, gpt-4, Tier C: 99 (Reasoning 94, Coding 89, Factual 99)
#21 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#22 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#23 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#24 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#25 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#26 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#27 OpenAI, gpt-4.1, Tier B: 99 (Reasoning 94, Coding 89, Factual 99)
#28 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#29 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#30 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#31 OpenAI: 99 (Reasoning 94, Coding 89, Factual 99)
#32 Google Gemini: 99 (Reasoning 94, Coding 89, Factual 99)
#33 Google Gemini: 99 (Reasoning 94, Coding 89, Factual 99)
#34 Google Gemini: 99 (Reasoning 94, Coding 89, Factual 99)
#35 Anthropic: 99 (Reasoning 94, Coding 89, Factual 99)
#36 OpenAI: 98 (Reasoning 93, Coding 88, Factual 98)
#37 OpenAI, gpt-4o, Tier C: 98 (Reasoning 93, Coding 88, Factual 98)
#38 OpenAI: 98 (Reasoning 93, Coding 88, Factual 98)
#39 OpenAI: 98 (Reasoning 93, Coding 88, Factual 98)
#40 OpenAI: 98 (Reasoning 93, Coding 88, Factual 98)
#41 OpenAI: 97 (Reasoning 92, Coding 87, Factual 97)
#42 OVH AI Endpoints (GRA): 97 (Reasoning 92, Coding 87, Factual 97)
#43 OVH AI Endpoints (GRA): 97 (Reasoning 92, Coding 87, Factual 97)
#44 OVH AI Endpoints (GRA): 97 (Reasoning 92, Coding 87, Factual 97)
#45 OVH AI Endpoints (GRA): 97 (Reasoning 92, Coding 87, Factual 97)
#46 OpenAI: 95 (Reasoning 90, Coding 86, Factual 95)
#47 OpenAI: 95 (Reasoning 90, Coding 86, Factual 95)
#48 OpenAI: 95 (Reasoning 90, Coding 86, Factual 95)
#49 Google Gemini: 95 (Reasoning 90, Coding 86, Factual 95)
#50 OVH AI Endpoints (GRA): 95 (Reasoning 90, Coding 86, Factual 95)
#51 OVH AI Endpoints (GRA): 92 (Reasoning 87, Coding 83, Factual 92)
#52 OpenAI: 91 (Reasoning 86, Coding 82, Factual 91)
#53 Google Gemini: 90 (Reasoning 86, Coding 81, Factual 90)
#54 Google Gemini: 88 (Reasoning 84, Coding 79, Factual 88)
#55 OVH AI Endpoints (GRA): 85 (Reasoning 81, Coding 77, Factual 85)
#56 OpenAI: 72 (Reasoning 68, Coding 65, Factual 72)
#57 Google Gemini: 51 (Reasoning 48, Coding 46, Factual 51)
#58 OVH AI Endpoints (GRA): 50 (Reasoning 48, Coding 45, Factual 50)
#59 Google Gemini: 43 (Reasoning 41, Coding 39, Factual 43)
#60 Google Gemini: 40 (Reasoning 38, Coding 36, Factual 40)
#61 Google Gemini: 40 (Reasoning 38, Coding 36, Factual 40)
#62 Google Gemini: 35 (Reasoning 33, Coding 32, Factual 35)
#63 Google Gemini: 35 (Reasoning 33, Coding 32, Factual 35)
#64 Google Gemini: 30 (Reasoning 29, Coding 27, Factual 30)
#65 Google Gemini: 25 (Reasoning 24, Coding 23, Factual 25)
#66 Google Gemini: 0 (Reasoning 0, Coding 0, Factual 0)
#67 OVH AI Endpoints (GRA): 0 (Reasoning 0, Coding 0, Factual 0)
#68 OVH AI Endpoints (GRA): 0 (Reasoning 0, Coding 0, Factual 0)

68 models scored · category breakdowns are estimated (full per-category scoring to follow in Q3 2026)