Runs in: France · Made in: United States

Archived

This model has been retired by the provider. Historical data is retained.

No longer available since 28 June 2026.

OVH AI Endpoints (GRA)

Llama-3.1-8B-Instruct

Tokonomix editorial team · Reviewed by Mes Kalkan · Published 27 May 2026 · Last reviewed 28 June 2026

Section 01

Price history

Direct provider rates per million tokens, plus a cost estimate for a typical conversation.

💰

API rates: Llama-3.1-8B-Instruct

$0.1000 per 1M input tokens

$0.1000 per 1M output tokens

≈ <$0.0001 per typical conversation (800 tokens)
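The conversation estimate follows from the per-token rates. Here is a minimal sketch of that arithmetic, assuming the 800 tokens split evenly between input and output. The page does not state the split, and because both rates are equal here, any split gives the same total. The function name `conversation_cost` is illustrative, not part of any provider API.

```python
# Rates taken from the listing, in USD per 1M tokens.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.10


def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one conversation at the listed rates."""
    return (input_tokens * INPUT_RATE_PER_M
            + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000


# Assumed split: 400 input + 400 output tokens (800 total).
cost = conversation_cost(400, 400)
print(f"${cost:.5f}")  # $0.00008
```

At $0.00008 the result sits just under the $0.0001 threshold shown on the page.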

Input vs output price (per 1M tokens): input $0.1000, output $0.1000

Pricing over time

Input and output price per 1M tokens; step lines mark price changes.

Input: $0.1000 per 1M tokens (stable)

Output: $0.1000 per 1M tokens (stable)

Chart range: 2026-06-14 to 2026-06-21 · synced weekly

Section 02

Capabilities

ownedBy: meta-llama

Section 03

Availability

No measurement data yet

Not enough API calls have been recorded yet to show availability statistics for this model. Data will appear once the model receives live traffic.

Section 04

Tokonomix benchmark verdicts

⚖️

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5 · 86/100 · 23 runs

15 correct · 7 partial · 1 wrong · 65% accuracy
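The 65% figure is consistent with counting only fully correct runs, 15 out of 23. Here is a small sketch of that tally, assuming partial answers earn no credit toward accuracy. The page does not say how the judge's overall score weights partial answers, so that score is not reproduced here.

```python
correct, partial, wrong = 15, 7, 1
runs = correct + partial + wrong  # 23 runs in total

# Assumption: accuracy counts only fully correct runs.
accuracy = correct / runs
print(f"{runs} runs, {accuracy:.0%} accuracy")  # 23 runs, 65% accuracy
```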

● 2026-06-21

Quality drops 29 points as performance degrades across all categories

Llama-3.1-8B-Instruct on OVH AI Endpoints declined significantly in this benchmark window. The overall quality score fell from 99.0 to 70.3, a 28.7-point drop that affects the model's competitive standing. The degradation shows in every category measured: factual accuracy scores just 57, reasoning 74, and multilingual 80. In the previous window, coding scored 100, multilingual 97, and reasoning 100. The two windows cover a different mix of categories, which complicates direct comparison, but the overall trend is clearly negative.

Median latency improved slightly, from 9119 ms to 7942 ms, but this speed gain is outweighed by the quality regression. Test volume was consistent, with five runs in both windows.

Users relying on this endpoint should be aware of the current limitations, particularly for fact-dependent tasks, where the model now scores below 60. The cause of the regression warrants investigation to determine whether it stems from infrastructure changes, model configuration, or other factors.
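The headline deltas can be rechecked from the figures quoted in the summary. Here is a short sketch using only those numbers. The variable names are illustrative, and the relative latency gain is a derived figure that the page itself does not report.

```python
prev_quality, curr_quality = 99.0, 70.3
prev_p50_ms, curr_p50_ms = 9119, 7942

# Quality change between the two benchmark windows, in points.
quality_drop = round(prev_quality - curr_quality, 1)

# Median latency change, absolute and relative to the previous window.
latency_gain_ms = prev_p50_ms - curr_p50_ms
latency_gain_pct = latency_gain_ms / prev_p50_ms

print(quality_drop)               # 28.7
print(latency_gain_ms)            # 1177
print(f"{latency_gain_pct:.1%}")  # 12.9%
```

The 28.7-point drop rounds to the 29 points in the headline, while the median latency gain is roughly 13%.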

Quality

70.3

Latency p50

7,942 ms

Test runs

✗ Quality dropped 29 points

✗ Factual accuracy now only 57

✓ Latency improved to 7942 ms

✗ Reasoning declined significantly

Latest automated test

28 Jun 2026 · 05:12 UTC · Benchmark

P50 latency

—

P95 latency

—

Errors

1 / 6 runs

Last reviewed by the Tokonomix team · 28 June 2026