Llama-3.1-8B-Instruct
Speed analysis
Latency measured across all benchmark runs. P50 (median) and P95 (95th percentile) give a realistic picture of response speed under normal and peak load.
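As a rough illustration of how P50 and P95 are read off a set of latency measurements, here is a minimal sketch using the nearest-rank method. The page does not say which percentile method it uses, so the method, the `percentile` helper, and the sample values are all assumptions made for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still returns the minimum.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up latencies in milliseconds, not real benchmark data.
latencies_ms = [900, 1100, 1200, 1250, 1300, 1400, 1500, 1800, 2500, 9000]

p50 = percentile(latencies_ms, 50)  # typical request
p95 = percentile(latencies_ms, 95)  # slow tail, dominated by the outlier
print(f"P50 = {p50} ms, P95 = {p95} ms")
```

With these samples a single slow request barely moves the median but sets the P95, which is why the two figures are reported together.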
Quality scores
Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.
Pricing history
Direct provider rates per million tokens, plus a typical-conversation cost estimate.
Pricing over time
Input & output per 1M tokens · step-line = price changes
Input: $0.1000 / 1M tokens (stable)
Output: $0.1000 / 1M tokens (stable)
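The typical-conversation cost estimate can be approximated from the per-million-token rates. The sketch below assumes a conversation of 1,000 input tokens and 200 output tokens; those sizes and the `conversation_cost_usd` helper are hypothetical, and the page does not state what it counts as a typical conversation.

```python
def conversation_cost_usd(input_tokens, output_tokens,
                          input_price_per_m, output_price_per_m):
    """Cost in USD given token counts and prices quoted per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Rates from the listing: $0.1000 per 1M tokens for both input and output.
# Token counts are an assumed conversation size, not a published figure.
cost = conversation_cost_usd(
    input_tokens=1_000,
    output_tokens=200,
    input_price_per_m=0.10,
    output_price_per_m=0.10,
)
print(f"Estimated cost: ${cost:.6f}")
```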
Tokens per second
Throughput in tokens per second, derived from measured P50 latency. Higher is better; fluctuations track provider-side load.
Estimated as 200 output tokens divided by P50 latency. The absolute number depends on the 200-token assumption; the trend is what matters.
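The throughput estimate described above can be sketched as follows, assuming it is simply the assumed output length divided by the P50 latency in seconds. The function name is hypothetical; the 12823 ms input is the P50 latency quoted for the previous benchmark window.

```python
ASSUMED_OUTPUT_TOKENS = 200  # fixed response length assumed by the estimate


def tokens_per_second(p50_latency_ms, output_tokens=ASSUMED_OUTPUT_TOKENS):
    """Estimate throughput as output tokens divided by P50 latency in seconds."""
    if p50_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return output_tokens / (p50_latency_ms / 1000.0)


# P50 latency reported for the previous benchmark window.
tps = tokens_per_second(12823)
print(f"Estimated throughput: {tps:.1f} tokens/s")
```

Because the 200-token figure is a fixed assumption, doubling it would double the estimate without any change in the underlying service, which is why only the trend is meaningful.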
Capabilities
Availability
No measurements yet
We haven't recorded enough API calls to show availability stats for this model. Data appears once the model starts receiving live traffic.
Tokonomix benchmark verdicts
No performance data available in current benchmark window
The current benchmark window contains no test runs or performance data for Llama-3.1-8B-Instruct by OVH AI Endpoints. The previous window, by contrast, recorded an overall quality score of 95.0 out of 100, a multilingual score of 95, and a p50 latency of 12823 milliseconds. Without current data, it is impossible to tell whether the model still performs at that level or whether its quality, latency, or reliability has changed. The missing runs could point to service availability issues, endpoint configuration changes, or gaps in benchmark coverage during this period.
The previous benchmark remains a useful baseline, particularly for multilingual tasks, but nothing recent confirms it. Until new benchmark data is available, organizations relying on this endpoint should verify availability and run their own tests before deploying production workloads.
Quality
—
Latency p50
—
Test runs
0
Llama-3.1-8B-Instruct
by OVH AI Endpoints (GRA)
- Context window: not listed
- Input price: $0.1000 / 1M tokens
- Output price: $0.1000 / 1M tokens
- Tier: not listed
- Modality: Text
- API type: REST · streaming
- Benchmark runs: 91