Runs in: France · Created in: United States

Archived

This model has been withdrawn by the provider. Historical data is retained.

No longer available as of June 28, 2026.

OVH AI Endpoints (GRA)

Llama-3.1-8B-Instruct

Tokonomix editorial team · Reviewed by Mes Kalkan · Published May 27, 2026 · Last reviewed June 28, 2026

Section 01

Pricing history

Direct provider prices per million tokens, plus an estimated cost for a typical conversation.

💰

API pricing: Llama-3.1-8B-Instruct

$0.1000 per 1M input tokens

$0.1000 per 1M output tokens

≈ <$0.0001 per typical conversation (800 tokens)
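The per-conversation estimate follows from the per-million prices above. A minimal sketch of the arithmetic, assuming a hypothetical split of the 800 tokens into 600 input and 200 output (the page does not state the split; because both prices are equal, the split does not change the result):

```python
# Prices as listed on this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.10


def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one conversation."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical 600/200 split of the 800-token "typical conversation".
cost = conversation_cost(600, 200)
print(f"${cost:.5f}")  # about $0.00008, which is below $0.0001
```

At these prices, roughly 12,500 such conversations would cost about one dollar.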

[Chart: input vs. output price per 1M tokens. Input: $0.1000. Output: $0.1000.]

[Chart: pricing over time, input and output price per 1M tokens, 2026-06-14 to 2026-06-21; step lines mark price changes. Input: $0.1000 (stable). Output: $0.1000 (stable). Synced weekly.]

Section 02

Capabilities

ownedBy: meta-llama

Section 03

Availability

No data yet

We have not yet recorded enough API calls to display availability statistics for this model. Data will appear as soon as the model receives live traffic.

Section 04

Tokonomix benchmark verdicts

⚖️

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5 · 86/100 · 23 runs

15 correct · 7 partial · 1 wrong · 65% accuracy
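The 65% figure is consistent with counting only fully correct runs. A short sketch of that calculation, assuming partial answers earn no credit (the page does not describe its scoring rule, so this is an inference from the numbers):

```python
# Run outcomes reported by the judge on this page.
correct, partial, wrong = 15, 7, 1

total_runs = correct + partial + wrong

# Assumed rule: only fully correct runs count toward accuracy.
strict_accuracy = correct / total_runs

print(total_runs)                # 23
print(f"{strict_accuracy:.0%}")  # 65%
```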

● 2026-06-21

Quality drops 29 points as performance degrades across all categories

Llama-3.1-8B-Instruct on OVH AI Endpoints declined significantly in this benchmark window. The overall quality score fell from 99.0 to 70.3, a 28.7-point drop that affects the model's competitive standing.

In the current window, factual accuracy scored 57, reasoning 74, and multilingual 80. In the previous window, coding scored 100, multilingual 97, and reasoning 100. The two windows cover different category sets, so only reasoning (100 to 74) and multilingual (97 to 80) can be compared directly. Both declined, and the overall trend is clearly negative.

Median latency improved from 9119 ms to 7942 ms, a modest gain that is outweighed by the quality regression. Test volume was the same in both windows, at five runs each.

Users relying on this endpoint should be aware of the current limitations, particularly for fact-dependent tasks, where the model now scores below 60. The cause of the regression warrants investigation to determine whether it stems from infrastructure changes, model configuration, or other factors.
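The window-over-window changes quoted above can be reproduced from the reported figures. A minimal sketch; the helper name `window_delta` is hypothetical and not part of any Tokonomix tooling:

```python
def window_delta(previous: float, current: float) -> tuple[float, float]:
    """Return (absolute change, percent change) from previous to current."""
    change = current - previous
    return change, change / previous * 100


# Figures reported in the verdict above.
quality_change, quality_pct = window_delta(99.0, 70.3)
latency_change, latency_pct = window_delta(9119, 7942)

print(f"quality: {quality_change:+.1f} points ({quality_pct:+.1f}%)")
print(f"latency p50: {latency_change:+d} ms ({latency_pct:+.1f}%)")
```

Negative values mean a decrease: lower is worse for quality and better for latency.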

Quality

70.3

Latency p50

7,942 ms

Test runs

✗ Quality dropped 29 points
✗ Factual accuracy now only 57
✓ Latency improved to 7,942 ms
✗ Reasoning declined significantly

Last automated test

June 28, 2026 · 05:12 UTC · Benchmark

Latency P50

—

Latency P95

—

Errors

1 / 6 runs

Last reviewed by the Tokonomix team · June 28, 2026