Runs in: France · Made in: United States
OVH AI Endpoints (GRA)

Llama-3.1-8B-Instruct

Tokonomix Editorial Team · Reviewed by Mes Kalkan
Section 01

Speed analysis

Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.

[Chart: P50 latency (median) and P95 latency, in ms · 73 runs]
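As an illustration of how P50 and P95 can be derived from raw run timings, here is a minimal Python sketch. The function name `latency_percentiles` and the sample timings are hypothetical and are not taken from the benchmark data; the interpolation method used by the actual benchmark is not stated on this page, so the "inclusive" method here is an assumption.

```python
import statistics


def latency_percentiles(latencies_ms):
    """Return (p50, p95) for a list of per-run latencies in milliseconds."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two runs to compute percentiles")
    # quantiles(n=100) returns 99 cut points: index 49 is P50, index 94 is P95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Hypothetical sample timings, for illustration only (not benchmark data).
sample_runs_ms = [118, 124, 127, 130, 131, 135, 142, 160, 198, 240]
p50, p95 = latency_percentiles(sample_runs_ms)
print(f"P50 = {p50:.1f} ms, P95 = {p95:.1f} ms")
```

A few slow runs pull P95 well above P50 while leaving the median almost unchanged, which is why the two figures are reported together.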
Section 02

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

Coding: 100
Multilingual: 97
Reasoning: 100
Section 03

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰
API rates — Llama-3.1-8B-Instruct
$0.1000 per 1M input tokens
$0.1000 per 1M output tokens
≈ <$0.0001 per typical conversation (800 tokens)
Input vs output price (per 1M tokens)
Input: $0.1000
Output: $0.1000
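The typical-conversation estimate follows directly from the per-million rates. The sketch below reproduces the arithmetic; the function name `conversation_cost_usd` and the 600/200 split between input and output tokens are assumptions made for illustration (the page only gives the 800-token total, and with identical input and output rates the split does not affect the result).

```python
def conversation_cost_usd(input_tokens, output_tokens,
                          input_rate_per_m=0.10, output_rate_per_m=0.10):
    """Cost in USD for one conversation, given rates per 1M tokens."""
    total = input_tokens * input_rate_per_m + output_tokens * output_rate_per_m
    return total / 1_000_000


# Assumed split of the 800-token conversation: 600 input, 200 output.
cost = conversation_cost_usd(600, 200)
print(f"${cost:.5f} per conversation")
```

At these rates an 800-token conversation costs about $0.00008, which is consistent with the "<$0.0001" figure shown above.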

Pricing over time

[Chart: input and output price per 1M tokens over time; step-line marks price changes]

Input: $0.1000 / 1M (stable)
Output: $0.1000 / 1M (stable)
Date shown: 2026-06-14 · synced weekly
Section 04

Tokens per second

Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.

Throughput (tokens / s): 1538 / avg 1872

Estimated as an assumed 200 output tokens divided by P50 latency (for example, 200 / 0.130 s ≈ 1538 tokens/s). The absolute number depends on this assumption; the trend is what matters.
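A minimal sketch of the throughput estimate, assuming it is the 200-token assumption divided by the P50 latency in seconds. That reading reproduces the displayed 1538 tokens/s from the 130 ms P50 reported for the last automated test. The function name `estimated_tokens_per_second` is hypothetical.

```python
ASSUMED_OUTPUT_TOKENS = 200  # the page's stated assumption


def estimated_tokens_per_second(p50_latency_ms,
                                output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimate throughput as assumed output tokens / P50 latency in seconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000.0)


tps = estimated_tokens_per_second(130)
print(f"{tps:.0f} tokens/s")
```

Because the token count is fixed, the estimate is inversely proportional to P50 latency: halving the latency doubles the reported throughput, whatever the true output length was.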

Section 05

Capabilities

ownedBy: meta-llama
Section 06

Availability

No measurements yet

We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.

Section 07

Tokonomix benchmark verdicts

⚖️
Endorsed by 1 judge
Independent LLM judges evaluated this model on our weekly intelligence tests
claude-sonnet-4-5: 89/100 · 8 runs
6 correct · 2 partial · 0 wrong · 75% accuracy
2026-06-14

No performance data available in current benchmark window

The current benchmark window contains no test runs or performance data for Llama-3.1-8B-Instruct served by OVH AI Endpoints. The previous window, by contrast, recorded an overall quality score of 95.0 out of 100, a multilingual score of 95, and a p50 latency of 12823 milliseconds. Without current data, it is not possible to tell whether the model has held those levels or has changed in quality, latency, or reliability. The missing runs could indicate service availability issues, endpoint configuration changes, or gaps in benchmark coverage during this measurement period.

The previous benchmark established a baseline of capable performance, particularly in multilingual tasks, but there is no recent confirmation of the model's behavior. Organizations relying on this endpoint should verify availability and run their own tests before deploying production workloads, until new benchmark data becomes available.

Quality: n/a
Latency p50: n/a
Test runs: 0

No test runs recorded · No current performance data · Cannot verify model availability
Last automated test: Jun 15, 2026 · 08:00 UTC · Speed benchmark
P50 latency: 130 ms
P95 latency: 232 ms
Errors: 0 / 6 runs
Last reviewed by Tokonomix Team·June 15, 2026