Tier B — Production

Runs in: US · Made in: United States

Gemini 3.1 Flash Lite

Tier B — Production · 1,048,576 tokens

Tokonomix Editorial Team · Reviewed by Mes Kalkan · Published May 27, 2026 · Last reviewed July 19, 2026

Section 01

Quality scores

Evaluation results from judge-model scoring across diverse task categories. Scores reflect coherence, accuracy and instruction-following.

100

Coding

100

Multilingual

Creative

Section 02

Pricing history

Direct provider rates per million tokens, plus a typical-conversation cost estimate.

API rates — Gemini 3.1 Flash Lite

$0.2500 per 1M input tokens

$1.50 per 1M output tokens

≈ $0.0004 per typical conversation (800 tokens)
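
The per-conversation estimate follows from the two rates once the 800 tokens are divided between input and output. The page does not state that division. The sketch below assumes 640 input tokens and 160 output tokens (an 80/20 split), which is one split that reproduces the listed ≈ $0.0004. The function name and the split are illustrative assumptions, not a published Tokonomix method.

```python
INPUT_RATE_PER_M = 0.25   # USD per 1M input tokens (rate listed above)
OUTPUT_RATE_PER_M = 1.50  # USD per 1M output tokens (rate listed above)


def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one conversation at the listed rates."""
    total_micro_usd = (
        input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    )
    return total_micro_usd / 1_000_000


# Assumed 80/20 split of the 800-token conversation (not stated on the page).
estimate = conversation_cost(input_tokens=640, output_tokens=160)
print(f"${estimate:.4f}")
```

For comparison, 800 tokens billed entirely as input would cost $0.0002 and entirely as output $0.0012, so the listed figure implies a conversation that is mostly input.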

Pricing over time

Input and output price per 1M tokens, 2026-06-07 to 2026-07-19 (step line marks price changes; synced weekly)

Input: $0.2500 per 1M tokens, down 44% since the first recorded price

Output: $1.50 per 1M tokens, down 44% since the first recorded price

Section 03

Capabilities

tools · source: litellm · vision · json mode · pdf input · reasoning · audio input · json schema · parallel tools · prompt caching · outputTokenLimit: 65536 · max output tokens: 65536

Section 04

Availability

How often this model answers when we call it — measured across real API requests and live tests over the last 30 days. This is separate from quality: these numbers only tell you whether the model responds, not how good the answer is.

Last 7 days

100.0%

n=4

Last 30 days

100.0%

n=165

Median response time

1,274 ms

n=165

Based on 185 measurements over the last 30 days.

Technical details

Only live API calls and live-test requests count — internal probes and benchmark runs are excluded.

Calls with a custom API key (BYOK) are excluded: those failures are key-specific, not a sign of model downtime.

Failed calls are NOT included in quality scores — quality is measured on successful responses only. Availability and quality are independent signals.

Median response time (p50) across successful calls with a recorded duration. Outliers (very slow or very fast calls) pull the median less than the average.
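
The counting rules above can be sketched as follows. This is a minimal illustration, assuming a hypothetical call log in which each record has `source`, `byok`, `ok` and `duration_ms` fields. Those field names and all sample values are invented for the example and do not come from Tokonomix.

```python
from statistics import mean, median

# Hypothetical call log; field names and values are illustrative only.
calls = [
    {"source": "api", "byok": False, "ok": True, "duration_ms": 1200},
    {"source": "live_test", "byok": False, "ok": True, "duration_ms": 1300},
    {"source": "api", "byok": False, "ok": True, "duration_ms": 9000},   # slow outlier
    {"source": "api", "byok": False, "ok": False, "duration_ms": None},  # failed call
    {"source": "benchmark", "byok": False, "ok": True, "duration_ms": 1100},  # excluded
    {"source": "api", "byok": True, "ok": False, "duration_ms": None},   # BYOK, excluded
]

# Only live API calls and live tests count, and BYOK calls are dropped.
COUNTED_SOURCES = {"api", "live_test"}
counted = [c for c in calls if c["source"] in COUNTED_SOURCES and not c["byok"]]

# Availability: share of counted calls that returned a response.
ok_calls = [c for c in counted if c["ok"]]
availability = 100 * len(ok_calls) / len(counted)

# Response time: successful calls with a recorded duration only.
durations = [c["duration_ms"] for c in ok_calls if c["duration_ms"] is not None]
p50 = median(durations)
avg = mean(durations)

print(f"availability: {availability:.1f}% (n={len(counted)})")
print(f"median: {p50} ms, mean: {avg:.0f} ms")
```

In this sample the single 9,000 ms call pulls the mean well above the median, which is why the median is the more stable summary of typical response time.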

Total calls (30d)

165

OK responses (30d)

165

Total calls (7d)

OK responses (7d)

Section 05

Tokonomix benchmark verdicts

Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5 · 97/100 · 42 runs

38 correct · 4 partial · 0 wrong · 90% accuracy
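
The 90% accuracy figure is consistent with counting only fully correct runs and giving partial runs no credit. That reading is an assumption, since the page does not say how partial results are weighted. A short check:

```python
# Verdict counts from the judge summary above.
correct, partial, wrong = 38, 4, 0
runs = correct + partial + wrong

# Assumption: partial answers earn no credit toward accuracy.
strict_accuracy = correct / runs

print(f"{runs} runs, strict accuracy {strict_accuracy:.0%}")
```

The judge's 97/100 score is reported separately and is not derived from these counts in this sketch.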

● 2026-07-19

Quality decline across categories with reasoning performance now unmeasured

Gemini 3.1 Flash Lite's overall quality score dropped 6 points in the current benchmark window, from 99.3 to 93.3 out of 100. Coding and multilingual tasks both hold perfect scores of 100, but creative performance came in at 80, which points to a possible regression in generative tasks. The current window contains no reasoning score at all, even though the model previously scored a perfect 100 in that category. Without that data point, it is unclear whether the model has actually lost reasoning capability or whether test coverage has simply changed. Median latency rose from 1,408 ms to 1,460 ms, a 52 ms increase that should be negligible for most use cases. The run count is consistent at 5 per window, which gives reasonable confidence in the comparison. Coding and multilingual processing remain strong, but the overall quality score has decreased. Users who rely on logical inference should exercise caution until the reasoning benchmark is re-established.
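
The headline numbers in this verdict can be checked with simple arithmetic. The sketch below assumes the overall quality score is an unweighted mean of the category scores measured in the window. The page does not state the aggregation method, so treat that as a hypothesis that happens to match the reported 93.3.

```python
from statistics import mean

# Category scores reported for the current window (reasoning was not measured).
current_scores = {"coding": 100, "multilingual": 100, "creative": 80}

# Assumption: overall quality is the unweighted mean of measured categories.
overall = round(mean(current_scores.values()), 1)

previous_overall = 99.3  # previous window, as reported
drop = round(previous_overall - overall, 1)

# Median latency change between windows, in milliseconds.
latency_delta_ms = 1460 - 1408

print(f"overall: {overall}, drop: {drop} points, latency: +{latency_delta_ms} ms")
```

Under that assumption, an unmeasured category simply leaves the average, so the lower overall score in this window is driven by the creative score of 80.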

Quality

93.3

Latency p50

1,460 ms

Test runs

✗ Quality dropped 6 points

✗ Reasoning category no longer tested

✗ Creative score fell to 80

✓ Coding and multilingual remain perfect

Last automated test

Jul 19, 2026 · 05:23 UTC · Benchmark

P50 latency

1,310 ms

P95 latency

—

Errors

0 / 6 runs

Last reviewed by Tokonomix Team · July 19, 2026