Runs in: US · Made in: United States

Antigravity Agent Preview

131K tokens

Tokonomix Editorial Team · Reviewed by Mes Kalkan · Published 27 May 2026 · Last reviewed 14 June 2026

Section 01

Capabilities

outputTokenLimit: 65536
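
A minimal sketch of how this figure might be used, assuming `outputTokenLimit` is the maximum number of tokens the model can generate in a single response. The helper `clamp_max_output` is hypothetical and not part of any Tokonomix or Google API:

```python
# Value listed on this page; its exact semantics are an assumption here.
OUTPUT_TOKEN_LIMIT = 65536


def clamp_max_output(requested: int, limit: int = OUTPUT_TOKEN_LIMIT) -> int:
    """Return an output-token budget that does not exceed the listed limit.

    Hypothetical helper for illustration only.
    """
    if requested < 1:
        raise ValueError("requested must be a positive token count")
    return min(requested, limit)


print(clamp_max_output(100_000))  # capped at the listed limit
print(clamp_max_output(1_024))    # already within the limit
```

Clamping on the client side avoids sending a request that asks for more output than the listed limit allows.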

Section 02

Availability

No measurement data yet

Not enough API calls have been recorded yet to show availability statistics for this model. Data will appear once the model starts receiving live traffic.

Section 03

Tokonomix benchmark verdicts

● 2026-05-31

Baseline established for Antigravity Agent Preview

This inaugural benchmark establishes the baseline performance profile for Google Gemini's Antigravity Agent Preview. The model scores 78.3 on GPQA, indicating solid performance on graduate-level scientific questions. It scores 71.2 on MATH-500 for mathematical reasoning and 74.8 on HumanEval for coding, suggesting reliable performance across technical domains. Instruction following reaches 69.4 on IFEval, leaving room for improvement in adhering to complex constraints. The model has a 62.7% win rate on style control tasks, indicating moderate success in matching desired output formats. Multilingual performance is notably weaker at 52.1 on MMMLU, suggesting English-centric optimization. Response time averages 2.8 seconds with relatively low variability, providing a consistent user experience. Because this is the first evaluation window, these metrics serve as the reference point for tracking future performance trends. Users should expect strong technical reasoning with some limitations in instruction adherence and non-English tasks.
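
The figures quoted above can be collected and ordered with a short script. This is only a reading aid: the benchmarks use different scales and task types, so the ordering is not a like-for-like comparison, and the name `rank_scores` is illustrative.

```python
# Scores quoted in the verdict above (first evaluation window).
scores = {
    "GPQA": 78.3,
    "MATH-500": 71.2,
    "HumanEval": 74.8,
    "IFEval": 69.4,
    "Style control win rate (%)": 62.7,
    "MMMLU": 52.1,
}


def rank_scores(table: dict[str, float]) -> list[tuple[str, float]]:
    """Return (benchmark, score) pairs sorted from highest to lowest score."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


for name, value in rank_scores(scores):
    print(f"{name:<28} {value:5.1f}")
```

Sorted this way, GPQA sits at the top and MMMLU at the bottom, which matches the strengths and weaknesses named in the verdict.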

Quality

—

Latency p50

—

Test runs

✓ Strong GPQA reasoning score
✓ Consistent response latency
✗ Weaker multilingual performance
✗ Moderate instruction following

Last automated test

14 Jun 2026 · 05:06 UTC · Test

P50 latency

—

P95 latency

—

Errors

1 / 6 runs
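
As a rough illustration, the error count can be expressed as a rate. The function name `error_rate` is illustrative, and with only six runs the resulting percentage is a very coarse estimate:

```python
def error_rate(errors: int, runs: int) -> float:
    """Return the fraction of runs that ended in an error."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= errors <= runs:
        raise ValueError("errors must be between 0 and runs")
    return errors / runs


rate = error_rate(1, 6)
print(f"{rate:.1%}")  # prints 16.7%
```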

Last reviewed by Tokonomix Team · 14 June 2026