Runs in: France · Made in: United States
OVH AI Endpoints (GRA)

Meta-Llama-3_3-70B-Instruct

Tokonomix Editorial Team · Reviewed by Mes Kalkan
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.
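
As a rough illustration of how P50 and P95 can be derived from raw run timings, here is a minimal sketch. The sample values are hypothetical (not taken from the benchmark runs), and the interpolation method is an assumption; the page does not say which percentile method it uses.

```python
import statistics

def latency_percentiles(samples_ms):
    """Return (p50, p95) for a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points: index 49 is P50, index 94 is P95.
    # method="inclusive" treats the samples as the full population (an assumption).
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]

# Hypothetical samples with one slow outlier.
samples = [110, 120, 125, 127, 130, 135, 140, 150, 160, 400]
p50, p95 = latency_percentiles(samples)
print(f"P50 = {p50:.1f} ms, P95 = {p95:.1f} ms")
```

The single slow call barely moves P50 but dominates P95, which is why the two figures are reported together.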

[Chart: P50 latency (median) and P95 latency, in ms, across 73 runs, 05-28 to 06-15]
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 99
Multilingual: 97
Reasoning: 100
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Meta-Llama-3_3-70B-Instruct
$0.6700 per 1M input tokens
$0.6700 per 1M output tokens
≈ $0.0005 per typical conversation (800 tokens)
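
The typical-conversation estimate can be reproduced from the listed rates. This is a minimal sketch: the function name is hypothetical, and the 400/400 split between input and output tokens is an assumption (the page only gives the 800-token total; because both rates are equal, the split does not change the result).

```python
def conversation_cost(input_tokens, output_tokens,
                      input_rate=0.67, output_rate=0.67):
    """Cost in USD, with rates given in USD per 1M tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Assumed split of the 800-token "typical conversation".
cost = conversation_cost(400, 400)
print(f"${cost:.4f}")
```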

Pricing over time

Input & output per 1M tokens · step-line = price changes

Input: $0.6700 per 1M tokens (stable)
Output: $0.6700 per 1M tokens (stable)

[Chart: input and output price, with price changes marked; data point at 2026-06-14]
Synced weekly
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput: 1575 tokens/s (average 1569)

Estimated as an assumed 200 output tokens divided by P50 latency (in seconds). The absolute number depends on this assumption; the trend is what matters.
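
A minimal sketch of this estimate, assuming throughput is the fixed 200-token output divided by P50 latency in seconds. The function and constant names are hypothetical. With the 127 ms P50 reported in the last automated test on this page, the formula gives roughly 1575 tokens/s, which appears consistent with the figure shown above.

```python
ASSUMED_OUTPUT_TOKENS = 200  # the fixed output length the estimate assumes

def estimated_throughput(p50_latency_ms, output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimate tokens per second from a P50 latency given in milliseconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000.0)

tps = estimated_throughput(127)
print(f"{tps:.0f} tokens/s")
```

Because the token count is a constant, the estimate is simply inversely proportional to P50 latency: halving latency doubles the reported throughput.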

Section 05

Capabilities

ownedBy: meta-llama
Section 06

Availability

How often this model answers when we call it — measured across real API requests and live tests over the last 30 days. This is separate from quality: these numbers only tell you whether the model responds, not how good the answer is.

Last 7 days

100.0%

n=8

Last 30 days

100.0%

n=8

Median response time

7,284ms

n=8

Based on 76 measurements over the last 30 days.

Technical details

Only live API calls and live-test requests count — internal probes and benchmark runs are excluded.

Calls with a custom API key (BYOK) are excluded: those failures are key-specific, not a sign of model downtime.

Failed calls are NOT included in quality scores — quality is measured on successful responses only. Availability and quality are independent signals.

Median response time (p50) across successful calls with a recorded duration. Outliers (very slow or very fast calls) pull the median less than the average.
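
The counting rules above can be sketched as follows. This is an illustration only: the `Call` record, its field names, and the source labels are hypothetical, and the sample calls are invented rather than taken from the measurements on this page.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Call:
    source: str                   # hypothetical labels: "api", "live_test", "probe", "benchmark"
    byok: bool                    # True if made with a custom API key
    ok: bool                      # True if the model responded
    duration_ms: Optional[float]  # None when no duration was recorded

COUNTED_SOURCES = {"api", "live_test"}  # probes and benchmark runs are excluded

def availability_stats(calls):
    """Return (availability %, median ms of successful calls, counted calls)."""
    counted = [c for c in calls if c.source in COUNTED_SOURCES and not c.byok]
    if not counted:
        return None, None, 0
    ok_calls = [c for c in counted if c.ok]
    availability = 100.0 * len(ok_calls) / len(counted)
    # Median only over successful calls that have a recorded duration.
    durations = [c.duration_ms for c in ok_calls if c.duration_ms is not None]
    p50 = median(durations) if durations else None
    return availability, p50, len(counted)

calls = [
    Call("api", False, True, 7000.0),
    Call("api", False, True, 7500.0),
    Call("live_test", False, False, None),  # counted as a failure
    Call("live_test", False, True, 9000.0),
    Call("probe", False, False, None),      # excluded: internal probe
    Call("api", True, False, None),         # excluded: BYOK
]
availability, p50, n = availability_stats(calls)
print(f"{availability:.1f}% (n={n}), median {p50:.0f} ms")
```

Note that the excluded probe and BYOK failures do not lower the availability figure; only the failed live test does.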

Total calls (30d)

8

OK responses (30d)

8

Total calls (7d)

8

OK responses (7d)

8

Section 07

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests
claude-sonnet-4-5 · 93/100 · 8 runs
7 correct · 0 partial · 1 wrong · 88% accuracy
2026-06-14

Meta-Llama-3_3-70B-Instruct maintains 97.0 quality with stable performance

Meta-Llama-3_3-70B-Instruct delivers consistent performance in its second benchmark window, holding its overall quality score at 97.0 out of 100 with no measurable change in quality metrics. P50 latency remains at 10556 milliseconds, indicating stable response times for this 70B-parameter model. The multilingual category score holds steady at 97.

The current window contains only one test run, which matches the previous baseline. That consistency suggests predictable behavior for production deployments: users can expect the same output quality and performance characteristics observed in the initial benchmark period. OVH AI Endpoints in the GRA region continues to host this model without performance degradation.

Quality

Latency p50

Test runs

0

Quality score stable at 97.0 · Consistent multilingual performance
Last automated test
Jun 15, 2026 · 08:00 UTC · Speed benchmark
P50 latency
127 ms
P95 latency
172 ms
Errors
0 / 6 runs
Last reviewed by Tokonomix Team · June 15, 2026