Tier B: Production

Runs in: US · Made in: United States

Gemini 3.1 Flash Lite

Tier B: Production · 1,048,576 tokens

Tokonomix editorial team · Reviewed by Mes Kalkan · Published 27 May 2026 · Last reviewed 19 July 2026

Section 01

Quality scores

Evaluation results from judge-model assessments across a range of task categories. Scores reflect coherence, accuracy, and instruction following.

Code generation: 100

Multilingual: 100

Creative

Section 02

Price history

Direct provider rates per million tokens, plus a cost estimate for a typical conversation.


API rates: Gemini 3.1 Flash Lite

$0.2500 per 1M input tokens

$1.50 per 1M output tokens

≈ $0.0004 per typical conversation (800 tokens)
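The per-conversation estimate follows from the two rates once a token split is fixed. The page does not state that split; the minimal sketch below assumes 640 input and 160 output tokens, an 80/20 split chosen only because it reproduces the quoted figure. The function name `conversation_cost` is illustrative.

```python
from decimal import Decimal

# Rates quoted above, in USD per 1M tokens.
INPUT_RATE = Decimal("0.25")
OUTPUT_RATE = Decimal("1.50")
PER_MILLION = Decimal(1_000_000)


def conversation_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one conversation at the quoted rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / PER_MILLION


# Assumption: 800 tokens split as 640 input / 160 output (not stated on the page).
cost = conversation_cost(640, 160)
print(f"${cost:.4f}")  # prints $0.0004
```

Because output tokens cost six times as much as input tokens, the estimate is sensitive to the split: the same 800 tokens cost $0.0002 if all are input and $0.0012 if all are output.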

Input vs. output price (per 1M tokens)

Input: $0.2500

Output: $1.50

Pricing over time

Input and output price per 1M tokens; the step line marks price changes. Time axis: 2026-06-07 to 2026-07-19. Synced weekly.

Input: $0.2500 per 1M tokens (down 44% since the first tracked price)

Output: $1.50 per 1M tokens (down 44% since the first tracked price)

Section 03

Capabilities

tools · source: litellm · vision · json mode · pdf input · reasoning · audio input · json schema · parallel tools · prompt caching · outputTokenLimit: 65536 · max output tokens: 65536
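A client could use the capability list above as a pre-flight check before sending a request. The sketch below is illustrative only: `check_request` and the capability strings are hypothetical names taken from the labels on this page, not from any provider SDK, and a capability that is missing here is merely not listed on this page.

```python
# Capability labels and output limit as listed on this page.
CAPABILITIES = frozenset({
    "tools", "vision", "json mode", "pdf input", "reasoning",
    "audio input", "json schema", "parallel tools", "prompt caching",
})
MAX_OUTPUT_TOKENS = 65536


def check_request(required: set, max_output_tokens: int) -> list:
    """Return a list of problems; an empty list means the request looks compatible."""
    problems = [
        f"capability not listed: {name}"
        for name in sorted(set(required) - CAPABILITIES)
    ]
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"max_output_tokens {max_output_tokens} exceeds limit {MAX_OUTPUT_TOKENS}"
        )
    return problems


print(check_request({"vision", "json schema"}, 4096))  # prints []
```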

Section 04

Availability

How often this model responds when we call it, measured over real API requests and live tests in the past 30 days. This is separate from quality: these figures show only whether the model responds, not how good the answer is.

Last 7 days: 100.0% (n=4)

Last 30 days: 100.0% (n=165)

Median response time: 1,274 ms (n=165)

Based on 185 measurements in the past 30 days.

Technical details

Only real API calls and live-test requests are counted; internal probes and benchmark runs are excluded.

Calls made with a customer's own API key (BYOK) are excluded: those errors are key-specific and do not indicate a model outage.

Failed calls are NOT counted in quality scores; quality is measured on successful responses. Availability and quality are independent signals.

Median response time (p50) is computed over successful calls with a recorded duration. Outliers pull the median less than they pull the mean.
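The counting rules above can be sketched as a small filter followed by two aggregates. This is a minimal illustration under assumptions: the `Call` record, its field names, and the `kind` labels are hypothetical, and the sample data is invented to show the effect of one slow outlier.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import List, Optional


@dataclass
class Call:
    kind: str                      # hypothetical labels: "api", "live_test", "probe", "benchmark"
    byok: bool                     # True if the caller supplied their own API key
    ok: bool                       # True if the model returned a response
    duration_ms: Optional[float]   # None when no duration was recorded


def counted(calls: List[Call]) -> List[Call]:
    """Keep real API calls and live tests; drop probes, benchmark runs, and BYOK calls."""
    return [c for c in calls if c.kind in {"api", "live_test"} and not c.byok]


def availability(calls: List[Call]) -> Optional[float]:
    """Percentage of counted calls that got a response, or None if nothing counts."""
    eligible = counted(calls)
    if not eligible:
        return None
    return 100.0 * sum(c.ok for c in eligible) / len(eligible)


def successful_durations(calls: List[Call]) -> List[float]:
    """Durations of counted, successful calls that have a recorded duration."""
    return [c.duration_ms for c in counted(calls) if c.ok and c.duration_ms is not None]


# Invented sample data, not measurements from this page.
sample = [
    Call("api", False, True, 1200.0),
    Call("api", False, True, 1300.0),
    Call("live_test", False, True, 9000.0),  # slow outlier
    Call("api", False, False, None),         # failed call, counts against availability
    Call("api", True, False, None),          # BYOK, excluded
    Call("probe", False, False, None),       # internal probe, excluded
]

durations = successful_durations(sample)
print(availability(sample))  # 75.0
print(median(durations))     # 1300.0
print(round(mean(durations), 1))  # 3833.3
```

In the sample, the single 9000 ms call moves the mean to about 3833 ms while the median stays at 1300 ms, which is why the page reports p50.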

Total calls (30d): 165

OK responses (30d): 165

Total calls (7d)

OK responses (7d)

Section 05

Tokonomix benchmark verdicts


Endorsed by 1 judge

Independent LLM judges evaluated this model on our weekly intelligence tests

claude-sonnet-4-5: 97/100 · 42 runs

38 correct · 4 partial · 0 wrong · 90% accuracy
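The 90% accuracy figure is consistent with counting only fully correct runs against all 42 runs. That reading is an assumption, since the page does not say how partial results are weighted; the function name below is illustrative.

```python
def accuracy_percent(correct: int, partial: int, wrong: int) -> float:
    """Share of runs judged fully correct, as a percentage.

    Assumption: partial results do not count as correct.
    """
    total = correct + partial + wrong
    if total == 0:
        raise ValueError("no runs to score")
    return 100.0 * correct / total


# Tallies from the judge summary above: 38 correct, 4 partial, 0 wrong.
print(round(accuracy_percent(38, 4, 0)))  # prints 90
```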

● 2026-07-19

Quality decline across categories with reasoning performance now unmeasured

Gemini 3.1 Flash Lite's overall quality score fell 6 points in the current benchmark window, from 99.3 to 93.3 out of 100. Coding and multilingual tasks both hold at 100, but creative performance registered at 80, which suggests a possible regression in generative tasks.

The current window contains no reasoning score at all, although the model previously scored 100 in that category. Without that data point it is unclear whether reasoning capability has actually declined or whether test coverage has simply changed.

Median latency rose from 1408 ms to 1460 ms, a 52 ms increase that should be negligible for most use cases. The run count is unchanged at 5 per window, which gives reasonable confidence in these measurements.

In short, coding and multilingual processing remain strong while the overall quality score has decreased. Users who rely on logical inference should exercise caution until the reasoning benchmark is re-established.

Quality: 93.3

Latency p50: 1,460 ms

Test runs

✗ Quality dropped 6 points

✗ Reasoning category no longer tested

✗ Creative score fell to 80

✓ Coding and multilingual remain perfect
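The figures in this verdict can be cross-checked with a few lines of arithmetic. The sketch below assumes the overall score is the unweighted mean of the category scores reported for the current window; the page does not state the formula, but that assumption reproduces the published 93.3. Function names are illustrative.

```python
from statistics import mean

# Category scores reported for the current window (reasoning has no score).
current_scores = {"coding": 100.0, "multilingual": 100.0, "creative": 80.0}


def overall_score(scores: dict) -> float:
    """Unweighted mean of category scores, rounded to one decimal (assumed formula)."""
    return round(mean(list(scores.values())), 1)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent, rounded to one decimal."""
    return round(100.0 * (new - old) / old, 1)


overall = overall_score(current_scores)
print(overall)                        # 93.3
print(round(overall - 99.3, 1))       # -6.0 points against the previous window
print(1460 - 1408)                    # 52 ms increase in median latency
print(percent_change(1408, 1460))     # 3.7 percent
```

Under this assumption the 6-point drop is explained entirely by the creative score, so the missing reasoning category changes which scores enter the average and not only what is known about reasoning.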

Latest automated test

19 Jul 2026 · 05:23 UTC · Benchmark

P50 latency: 1310 ms

P95 latency: n/a

Errors: 0 / 6 runs

Last reviewed by the Tokonomix team · 19 July 2026