Google Ships Gemini 3.5 Flash, a Cheap-to-Run Agent Model That Costs 3x More Per Token

Google released Gemini 3.5 Flash into general availability at its I/O developer conference on May 19, 2026, making an unusual claim for a model in its lower-cost tier: that it beats the company's own flagship, Gemini 3.1 Pro, on most coding and agentic benchmarks while running several times faster. The model is the first in a new 3.5 family; a Gemini 3.5 Pro version is in internal use and slated for next month.

The release lands the same day across the Gemini API, Google AI Studio, Antigravity, Vertex AI/Gemini Enterprise, the Gemini app and AI Mode in Search, and is now the default model in the Gemini app and in AI Mode globally. Immediate availability on launch day is uncommon for a frontier-class model and lets developers deploy it in production without a preview period.

What Google is actually claiming

DeepMind chief technologist Koray Kavukcuoglu told reporters the model "outperforms our latest frontier model, 3.1 Pro, on nearly all the benchmarks," including coding, agentic tasks and multimodal reasoning, and runs four times faster than comparable frontier models, according to TechCrunch. He added that Google built an optimized variant that is 12 times faster at the same quality — a more striking figure than the throughput numbers circulating in early write-ups.

On the public benchmark slide Google released, 3.5 Flash scores 76.2% on Terminal-Bench 2.1 versus 70.3% for 3.1 Pro, 83.6% on MCP Atlas versus 78.2%, and 1,656 Elo on GDPval-AA versus 1,317, a roughly 340-point jump on a benchmark meant to track economically valuable work. Independent evaluator Artificial Analysis, given pre-release access, scored it 55 on its Intelligence Index — nine points above Gemini 3 Flash — and measured output of about 284 tokens per second, the highest MMMU-Pro multimodal score it had recorded at 84%. CEO Sundar Pichai cited 289 tokens per second on stage; claims of "300 tokens per second" overstate the figure slightly.

The model trails 3.1 Pro on long-context retrieval and pure-knowledge tests such as Humanity's Last Exam — an expected trade, since Google tuned the model for agentic execution rather than raw recall. Configurable "thinking levels" (minimal, low, medium, high) let developers trade latency for reasoning depth, as in earlier Flash releases.

The pricing claim is backwards

The most widely repeated framing — that the Flash tier saw only "a slight increase" in price — is incorrect. Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15. That is roughly a 3x increase over Gemini 3 Flash's $0.50/$3.00, and Artificial Analysis found it cost about 5.5x more to run its full benchmark suite than the previous Flash, driven by both higher token prices and more agentic turns consuming more input tokens.

Both things are true at once: the model is sharply more expensive than its own predecessor, while still landing below frontier rivals on a per-task basis. Pichai's on-stage pitch — that 3.5 Flash delivers "frontier level capabilities at less than half the price and in some cases a third of the price" — refers to competing frontier-tier models, not Google's prior Flash. The strategic backdrop, Constellation Research noted, is enterprise "sticker shock" as companies see the first real token bills for running AI agents at scale.

Antigravity and the agent pitch

Google paired the model with a relaunched Antigravity, its agent-first development platform, rebuilt as a standalone desktop application and co-developed with 3.5 Flash so that agents have, in Kavukcuoglu's words, a "native environment where they can live, work, and execute." On stage, Google engineer Varun Mohan demonstrated agents spawning sub-agents to build components in parallel and assemble a full operating system inside Antigravity, per TechCrunch. Constellation Research reported the desktop app also gained a CLI terminal and SDK.

Some demonstration anecdotes circulating with the launch — a complete plumbing-business website built error-free in under five minutes, or an entire Linux desktop emulated inside a single HTML file — are not described in Google's primary materials and should be treated as illustrative rather than verified benchmarks. What Google did show publicly is the autonomous OS-build demo and a series of agentic coding and design tasks on its model page.

Enterprise partners, with a caveat

Google's official model page confirms several named deployments: Salesforce is integrating 3.5 Flash into Agentforce to automate complex enterprise tasks using multiple context-retaining sub-agents, Shopify is running parallel sub-agents for global merchant-growth forecasting, Macquarie Bank is piloting it for onboarding over 100-plus-page financial documents, Ramp is using it for invoice OCR, and Xero is automating multi-week tax-form workflows. The Salesforce–Google Agentforce relationship is a longstanding, publicly documented partnership.

However, specific performance figures attached to these partners in some breakdowns — a Salesforce customer-service benchmark with "100% task completion, 94% action completion and 100% state completion," or a Shopify case in which the model traced a Canadian signup spike to a single French influencer campaign against an "8% accuracy threshold" — do not appear in Google's announcement, the partners' materials, or independent coverage reviewed for this article. The partnerships are real; those metrics are unverified and the "8%" threshold in particular reads as garbled. Box separately reported a 19.6% improvement over Gemini 3 Flash on its enterprise-work evaluation, one of the few partner figures Google itself published.

A redesigned app and an agent ecosystem

The consumer Gemini app received its largest overhaul yet, built around a design language Google calls Neural Expressive — fluid animations, vibrant color, new typography and haptic feedback, with responses that surface maps, photo cards and visual layouts rather than text walls, per Engadget's I/O coverage. Alongside it Google introduced Gemini Spark, a 24/7 personal agent that parses Gmail and Tasks and asks permission before high-stakes actions like spending money, rolling out to Google AI Ultra subscribers in the US; a passive Daily Brief digest; and the Gemini Omni video model. Claims of a native macOS Spark integration with filler-word-stripping dictation are not detailed in Google's primary I/O materials and remain unconfirmed.

The risk ledger

Pushing autonomous agents to mainstream consumers invites scrutiny. TechCrunch noted Google is defending a lawsuit after a man died by suicide following weeks of conversations with Gemini, and that Google says 3.5 has strengthened cyber and chemical-biological safeguards and is "better calibrated to engage with sensitive questions rather than refuse them outright." Informal community probes of the model's reasoning — modified trolley and river-crossing puzzles — circulated after launch but are not independently verified and are not reflected in Google's evaluation methodology.

The throughline is consistent with the rest of I/O 2026: Google is repositioning Gemini from a chatbot to an agent runtime, and is willing to charge more per token to do it — betting that speed and reliability at scale matter more to enterprises than the headline price.

Tags:Google Gemini

Join the Discussion

Google Ships Gemini 3.5 Flash, a Cheap-to-Run Agent Model That Costs 3x More Per Token

What Google is actually claiming

The pricing claim is backwards

Antigravity and the agent pitch

Enterprise partners, with a caveat

A redesigned app and an agent ecosystem

The risk ledger

The Mandalorian and Grogu Depicts Real Ion Propulsion: NASA's Psyche Used at Mars Two Days Before LA Premiere

Top Digital Skills Everyone Should Learn for Career Growth and Everyday Life

Anthropic's New Founder Playbook Argues AI Has "Rebooted" the Startup Lifecycle — Here's What Holds Up

Samsung Begins HBM4 Shipments as SK Hynix Lags and a 50,000-Worker Strike in Five Days Threatens the AI Chip Supply

Should You Buy a Portable SSD or External Hard Drive? Which Storage Option Is Better in 2026