Runs in: France · Created in: United States

Archived

This model has been discontinued by the provider. Historical data is retained.

No longer available since June 28, 2026.

OVH AI Endpoints (GRA)

Llama-3.1-8B-Instruct

Tokonomix editorial team · Reviewed by Mes Kalkan · Published May 27, 2026 · Last reviewed June 28, 2026

Section 01

Price history

Direct provider rates per million tokens, plus a cost estimate for a typical conversation.

💰

API rates: Llama-3.1-8B-Instruct

$0.1000 per 1M input tokens

$0.1000 per 1M output tokens

≈ <$0.0001 per typical conversation (800 tokens)
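The per-conversation estimate follows directly from the per-million rates. The sketch below is one way to reproduce it, not the page's own calculation. The function name `conversation_cost` and the 600/200 split between input and output tokens are assumptions made for illustration; because both rates are equal, any split of the 800 tokens gives the same total.

```python
from decimal import Decimal

# Rates from the listing above, in USD per 1M tokens.
INPUT_RATE_PER_M = Decimal("0.1000")
OUTPUT_RATE_PER_M = Decimal("0.1000")
TOKENS_PER_MILLION = Decimal(1_000_000)


def conversation_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one conversation at the listed rates."""
    input_cost = Decimal(input_tokens) * INPUT_RATE_PER_M / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_RATE_PER_M / TOKENS_PER_MILLION
    return input_cost + output_cost


# Assumed split of the 800-token conversation (hypothetical, not from the page).
cost = conversation_cost(input_tokens=600, output_tokens=200)
print(f"${cost:.5f}")  # $0.00008
```

At $0.00008 the result sits just under the displayed threshold of $0.0001.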

Input vs. output price (per 1M tokens): input $0.1000, output $0.1000

Pricing over time (chart): input and output price per 1M tokens, drawn as a step line that marks price changes. Both prices held stable at $0.1000 between 2026-06-14 and 2026-06-21. Synced weekly.

Section 02

Capabilities

ownedBy: meta-llama

Section 03

Availability

No measurement data yet

Not enough API calls have been recorded yet to show availability statistics for this model. Data will appear once the model receives live traffic.

Section 04

Tokonomix benchmark verdicts

⚖️

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5 · 86/100 · 23 runs

15 correct · 7 partial · 1 wrong · 65% accuracy
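The page does not state how the accuracy figure is derived. As a sanity check, the sketch below assumes that accuracy counts only fully correct runs over all runs, an assumption that does reproduce the displayed 65%.

```python
# Verdict counts from the judge summary above.
correct, partial, wrong = 15, 7, 1

total_runs = correct + partial + wrong

# Assumption: partial answers earn no credit toward accuracy.
accuracy = correct / total_runs

print(total_runs)          # 23
print(f"{accuracy:.0%}")   # 65%
```

The counts also add up to the 23 runs reported next to the judge's score.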

● 2026-06-21

Quality drops 29 points as performance degrades across all categories

Llama-3.1-8B-Instruct, as served by OVH AI Endpoints, declined sharply in this benchmark window. The overall quality score fell from 99.0 to 70.3, a 28.7-point drop. In the current window, factual accuracy scored 57, reasoning 74, and multilingual 80; in the previous window, coding scored 100, multilingual 97, and reasoning 100. The two windows cover different category sets, so only reasoning (100 to 74) and multilingual (97 to 80) can be compared directly, and both fell.

Median latency improved from 9,119 ms to 7,942 ms, but that gain does not offset the quality regression. Both windows contain five runs, so the sample size is unchanged.

Users of this endpoint should expect weaker results, particularly on fact-dependent tasks, where the model now scores below 60. The cause of the regression is not identified; it could stem from infrastructure changes, model configuration, or other factors.
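The window-over-window changes quoted in the report can be recomputed from the figures it gives. The following is a minimal sketch; the dictionary keys and the helper names `delta` and `percent_change` are illustrative choices, and only the metrics reported for both windows are included.

```python
# Figures quoted in the report; only metrics present in both windows.
previous = {"quality": 99.0, "latency_p50_ms": 9119, "reasoning": 100, "multilingual": 97}
current = {"quality": 70.3, "latency_p50_ms": 7942, "reasoning": 74, "multilingual": 80}


def delta(metric: str) -> float:
    """Absolute change from the previous window to the current one."""
    return round(current[metric] - previous[metric], 1)


def percent_change(metric: str) -> float:
    """Relative change in percent, measured against the previous window."""
    return round(100 * (current[metric] - previous[metric]) / previous[metric], 1)


print(delta("quality"))                  # -28.7
print(delta("reasoning"))                # -26
print(delta("multilingual"))             # -17
print(delta("latency_p50_ms"))           # -1177
print(percent_change("latency_p50_ms"))  # -12.9
```

A negative delta is a regression for the quality scores but an improvement for latency, which is why the report lists the latency change as the one positive result.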

Quality

70.3

Latency p50

7,942 ms

Test runs

✗ Quality dropped 29 points
✗ Factual accuracy now only 57
✓ Latency improved to 7,942 ms
✗ Reasoning declined significantly

Last automated test

June 28, 2026 · 05:12 UTC · Benchmark

P50 latency

—

P95 latency

—

Errors

1 / 6 runs

Last reviewed by the Tokonomix team · June 28, 2026