Benchmarks
Methodology
How Tokonomix measures AI model performance. No vendor influence. No sponsored results. Transparent methodology, open data.
Speed
How fast does the model respond? We measure time-to-last-token for a prompt with a fixed target output length.
Intelligence
How accurate and capable is the model? A judge LLM rates answers across 6 categories on a 0–100 scale.
Health
Is the API available? We check every 6 hours and track error rates and availability windows.
Speed Benchmark
Prompt: A fixed instruction targeting approximately 500 tokens of output. The same prompt is used for every model in every run cycle.
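Runs: 3 sequential calls per test cycle. We measure end-to-end latency (from sending the request to receiving the last byte of the response), not time-to-first-token (TTFT).
Metrics: P50 (median) and P95 (tail) across the 3 runs. P50 is the headline number; P95 shows how consistent the model is. With only three samples, P95 is close to the slowest run, so treat it as a rough consistency signal rather than a precise tail estimate.
Measurement location: EU (Amsterdam, AMS). All results reflect latency measured from the EU; results measured from the US or Asia would differ.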
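The speed measurement above can be sketched in Python as follows. This is a minimal illustration, not Tokonomix's code. `send_request` is a hypothetical callable that sends the fixed prompt and blocks until the last byte of the response arrives. The nearest-rank percentile method is also an assumption, because the page does not say which percentile definition is used.

```python
import math
import statistics
import time
from typing import Callable


def measure_latency(send_request: Callable[[], None]) -> float:
    """End-to-end latency in seconds, from sending the request until the
    last byte of the response has been received (not time-to-first-token)."""
    start = time.perf_counter()
    send_request()  # hypothetical: must block until the full response is read
    return time.perf_counter() - start


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (an assumed definition): the smallest sample
    such that at least p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def run_speed_cycle(send_request: Callable[[], None], runs: int = 3) -> dict[str, float]:
    # Calls run one after another, never in parallel.
    latencies = [measure_latency(send_request) for _ in range(runs)]
    return {
        "p50": statistics.median(latencies),
        "p95": nearest_rank_percentile(latencies, 95),
    }


# Example with a stand-in "request" that just sleeps for 10 ms.
result = run_speed_cycle(lambda: time.sleep(0.01))
```

With three samples and this definition, P50 is the middle run and P95 is the slowest run.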
Speed tiers:
Intelligence Benchmark
Judge model: Claude Sonnet 4.6 acts as an impartial judge. Model names are never included in the judge prompt — only the response text is evaluated (blind review).
Six scoring categories (0–100 each): Reasoning, Coding, Factual, Instruction-following, Creativity, Safety.
Overall quality score: Weighted average of the six categories. Weights: Reasoning 25%, Coding 25%, Factual 20%, Instruction-following 15%, Creativity 10%, Safety 5%.
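The overall quality score is a plain weighted average. Because the weights sum to 100%, the result stays on the same 0–100 scale without renormalization. The sketch below assumes the judge's per-category scores arrive as a dictionary. The key names and the function name are hypothetical.

```python
# Category weights as listed on this page; they sum to 100%.
WEIGHTS = {
    "reasoning": 0.25,
    "coding": 0.25,
    "factual": 0.20,
    "instruction_following": 0.15,
    "creativity": 0.10,
    "safety": 0.05,
}


def overall_quality(scores: dict[str, float]) -> float:
    """Weighted average of the six judge scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} score {scores[name]} is outside 0-100")
    # Weights sum to 1.0, so the result stays on the same 0-100 scale.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Consider illustrative scores of 80 (reasoning), 70 (coding), 90 (factual), 85 (instruction-following), 60 (creativity) and 100 (safety). These give 20 + 17.5 + 18 + 12.75 + 6 + 5 = 79.25.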
Intelligence benchmarks are currently in development and are expected in Q3 2026.
Health Check
Frequency: Every 6 hours (00:00, 06:00, 12:00, 18:00 UTC).
Method: A minimal echo-style prompt is sent. We track HTTP status, error message (if any), and response time.
Error tracking: The error_count for each run is recorded. Sustained high error rates are flagged on the leaderboard.
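A health-check result and the per-run error count could be modeled as shown below. Several details are assumptions for illustration, because the page does not define them: the field names, the rule that any non-2xx status or error message counts as an error, and the "sustained" rule (two consecutive runs above a 50% error rate).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthCheckResult:
    http_status: Optional[int]        # None if no HTTP response arrived (e.g. a timeout)
    error_message: Optional[str]      # None when the call succeeded
    response_time_s: Optional[float]  # None if the request never completed

    @property
    def is_error(self) -> bool:
        # Assumption: anything other than a 2xx with no error message is an error.
        if self.error_message is not None or self.http_status is None:
            return True
        return not 200 <= self.http_status < 300


def error_count(results: list[HealthCheckResult]) -> int:
    return sum(1 for r in results if r.is_error)


def error_rate(results: list[HealthCheckResult]) -> float:
    return error_count(results) / len(results) if results else 0.0


def is_sustained_high(run_rates: list[float], threshold: float = 0.5, consecutive: int = 2) -> bool:
    # Hypothetical rule: the last `consecutive` runs all exceeded `threshold`.
    recent = run_rates[-consecutive:]
    return len(recent) == consecutive and all(rate > threshold for rate in recent)
```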
Run Schedule
All times UTC. Intelligence benchmarks run once per day (06:00 UTC) when active. Data freshness is always displayed next to each benchmark result.
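The run schedule is simple enough to compute directly. This sketch returns the next health-check slot after a given time. It also flags the 06:00 UTC slot, which the daily intelligence run shares when active. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HEALTH_CHECK_HOURS = (0, 6, 12, 18)  # UTC, every 6 hours
INTELLIGENCE_HOUR = 6                # UTC, daily run when intelligence benchmarks are active


def next_health_check(now: datetime) -> datetime:
    """Return the first health-check slot strictly after `now`."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now = now.astimezone(timezone.utc)
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    for hour in HEALTH_CHECK_HOURS:
        slot = day_start + timedelta(hours=hour)
        if slot > now:
            return slot
    # Past the last slot of the day: roll over to the first slot tomorrow.
    return day_start + timedelta(days=1, hours=HEALTH_CHECK_HOURS[0])


def runs_intelligence(slot: datetime) -> bool:
    """True for the daily 06:00 UTC slot that also triggers the intelligence run."""
    return slot.astimezone(timezone.utc).hour == INTELLIGENCE_HOUR
```

Requiring a timezone-aware input avoids silently interpreting a naive timestamp as local time.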
FAQ
Are you affiliated with any AI provider?
Why EU latency only?
How do you handle API cost?
Can I download the raw data?
Is the judge-LLM fair to all models?
Data API
All benchmark data is available for free. No API key is required for read-only access.