Runs in:FranceMade in:United States

Archived

This model has been discontinued by the provider. Historical data is preserved.

No longer available since June 28, 2026.

OVH AI Endpoints (GRA)

Llama-3.1-8B-Instruct

Tokonomix Editorial Team·Reviewed by Mes Kalkan·Published May 27, 2026·Last reviewed June 28, 2026

Section 01

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

💰

API rates — Llama-3.1-8B-Instruct

$0.1000 per 1M input tokens

$0.1000 per 1M output tokens

≈ <$0.0001 per typical conversation (800 tokens)

Input vs output price (per 1M tokens)

per 1M input tokens$0.1000

per 1M output tokens$0.1000

Pricing over time

Input & output per 1M tokens · step-line = price changes

$0.1000

input / 1M

— stable

$0.1000

output / 1M

— stable

2026-06-142026-06-142026-06-21

Input

Output

Price change

⟳ synced weekly

Section 02

Capabilities

ownedBy: meta-llama

Section 03

Availability

No measurements yet

We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.

Section 04

Tokonomix benchmark verdicts

⚖️

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-586/100 · 23 runs

15 correct7 partial1 wrong65% accuracy

● 2026-06-21

Quality drops 29 points as performance degrades across all categories

Llama-3.1-8B-Instruct by OVH AI Endpoints has experienced a significant decline in performance this benchmark window. The overall quality score plummeted from 99.0 to 70.3, representing a 28.7-point drop that affects the model's competitive standing. The degradation is evident across all measured categories, with factual accuracy scoring just 57, reasoning at 74, and multilingual capabilities at 80. This contrasts sharply with the previous window where coding achieved 100, multilingual scored 97, and reasoning reached 100. The current window shows a different category composition, making direct comparisons complex, but the overall trend is unmistakably negative. On a positive note, latency has improved slightly from 9119ms to 7942ms at the median, offering users marginally faster response times. However, this speed gain is overshadowed by the substantial quality regression. Testing consistency remains stable with five runs in both windows. Users relying on this endpoint should be aware of the current performance limitations, particularly for fact-dependent tasks where the model now scores below 60. The cause of this regression warrants investigation to determine whether it stems from infrastructure changes, model configuration, or other factors.

Quality

70.3

Latency p50

7,942 ms

Test runs

✗ Quality dropped 29 points✗ Factual accuracy now only 57✓ Latency improved to 7942ms✗ Reasoning declined significantly

Last automated test

Jun 28, 2026 · 05:12 UTC · Benchmark

P50 latency

—

P95 latency

—

Errors

1 / 6 runs

Last reviewed by Tokonomix Team·June 28, 2026