Benchmarks
Public dataset
Raw benchmark data is available for free. Read access requires no API key. Use the data in your own tools, dashboards, or research.
| Metric | Value | Note |
|---|---|---|
| Models tracked | 225 | 131 active |
| Providers | 12 | active APIs |
| Benchmark runs | 23434 | all time |
| Test questions | 0 | Q3 2026 |
Download
The full benchmark dataset as JSON: models, providers, and the most recent run per model. It is updated every 6 hours and is CORS-open, so it can be fetched directly from a browser.
Download JSON: GET /api/md/es/dataset
The /api/md/[lang]/dataset endpoint returns the full benchmark dataset as JSON.
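
As a minimal sketch of reading this endpoint from a script, the snippet below builds the URL from the /api/md/[lang]/dataset pattern and parses the response with the Python standard library. The host `https://example.com` is a placeholder, since only the path is given here, and the helper names `dataset_url` and `fetch_dataset` are hypothetical. Only the `es` language code appears on this page; any other value is an assumption. The top-level shape of the returned JSON is not specified here, so the sketch returns the parsed object without assuming any keys.

```python
import json
import urllib.request

# Placeholder host: this page gives only the path, not the domain.
BASE_URL = "https://example.com"


def dataset_url(lang: str = "es", base_url: str = BASE_URL) -> str:
    """Build the dataset URL from the /api/md/[lang]/dataset pattern."""
    return f"{base_url.rstrip('/')}/api/md/{lang}/dataset"


def fetch_dataset(lang: str = "es", base_url: str = BASE_URL, timeout: float = 30.0):
    """Download and parse the dataset; no API key is sent for read access."""
    with urllib.request.urlopen(dataset_url(lang, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the dataset is refreshed every 6 hours, a client that polls could reasonably cache the result for a similar interval rather than refetch on every use.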
Schema
benchmark_runs
| Field | Type | Description |
|---|---|---|
| id | bigint | Unique run ID |
| model_id | bigint | FK → models.id |
| run_type | varchar(20) | One of "speed", "intelligence", or "health" |
| started_at | timestamptz | Run start time (UTC) |
| ended_at | timestamptz | Run end time (UTC) |
| latency_p50_ms | integer | Median latency (ms) — null if not applicable |
| latency_p95_ms | integer | 95th-percentile latency (ms) |
| quality_score | integer | Judge score 0–100 — null until Q3 2026 |
| error_count | integer | API errors in this run |
| raw_data | jsonb | Provider-specific response payload |
| created_at | timestamptz | Row creation time (UTC) |
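
To illustrate how a benchmark_runs row might be handled in client code, here is a sketch that types a subset of the fields and derives a run's duration from started_at and ended_at. It assumes timestamptz values are serialized as ISO 8601 strings in the JSON, which is not stated on this page. The names `BenchmarkRun`, `parse_utc`, and `run_duration_seconds` are hypothetical, and the sample row is invented for illustration.

```python
from datetime import datetime
from typing import Optional, TypedDict


class BenchmarkRun(TypedDict):
    """Subset of benchmark_runs fields, typed per the schema table."""
    id: int
    model_id: int
    run_type: str                  # "speed", "intelligence", or "health"
    started_at: str                # assumed ISO 8601 string (UTC)
    ended_at: str                  # assumed ISO 8601 string (UTC)
    latency_p50_ms: Optional[int]  # null if not applicable
    quality_score: Optional[int]   # null until Q3 2026
    error_count: int


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def run_duration_seconds(run: BenchmarkRun) -> float:
    """Wall-clock duration of a run in seconds."""
    delta = parse_utc(run["ended_at"]) - parse_utc(run["started_at"])
    return delta.total_seconds()


# Invented sample row for illustration only; not real benchmark data.
sample_run: BenchmarkRun = {
    "id": 1,
    "model_id": 7,
    "run_type": "speed",
    "started_at": "2026-01-01T00:00:00Z",
    "ended_at": "2026-01-01T00:00:42Z",
    "latency_p50_ms": None,
    "quality_score": None,
    "error_count": 0,
}

print(run_duration_seconds(sample_run))
```

Nullable fields such as latency_p50_ms and quality_score are typed as Optional so that callers have to handle the null case explicitly.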
models
| Field | Type | Description |
|---|---|---|
| id | bigint | Unique model ID |
| provider_id | bigint | FK → providers.id |
| slug | varchar(100) | URL-safe identifier (e.g. claude-sonnet-4-6) |
| name | varchar(200) | Display name |
| parameter_size | varchar(20) | e.g. "70B", "unknown" |
| context_window | integer | Max context in tokens |
| price_input_per_1m_cents | integer | Input price in cents per 1M tokens |
| price_output_per_1m_cents | integer | Output price in cents per 1M tokens |
| tier | varchar(2) | "A", "B", or "C"; content priority tier |
| is_active | boolean | Whether model is currently tested |
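
Since prices are stored as integer cents per 1M tokens, estimating the cost of a single request takes a small conversion. The sketch below shows the arithmetic. The function name `request_cost_cents` and the sample model row are hypothetical, and the result is left in cents because the currency is not stated on this page.

```python
def request_cost_cents(model: dict, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request, in cents.

    Each price is in cents per 1,000,000 tokens, so multiply tokens by the
    per-1M price and divide by 1,000,000.
    """
    total = (
        input_tokens * model["price_input_per_1m_cents"]
        + output_tokens * model["price_output_per_1m_cents"]
    )
    return total / 1_000_000


# Invented sample row for illustration only; not a real model or real prices.
sample_model = {
    "slug": "example-model",
    "price_input_per_1m_cents": 300,
    "price_output_per_1m_cents": 1500,
    "is_active": True,
}

# 2000 input tokens and 500 output tokens:
# (2000 * 300 + 500 * 1500) / 1_000_000 = 1.35 cents
print(request_cost_cents(sample_model, 2000, 500))
```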