Google Launches Gemini Omni Video Model, but Holds Back Its Riskiest Feature

At its I/O developer conference on May 19, 2026, Google introduced Gemini Omni, a multimodal model that generates and edits video from almost any combination of inputs. The launch moves the company's generative-video effort out of the standalone Veo line and into the core Gemini system. The first model in the family, Gemini Omni Flash, began rolling out the same day.

In a blog post published by Google DeepMind CTO Koray Kavukcuoglu, the company described Omni as a model that can "create anything from any input — starting with video." Users can combine images, audio, video and text in a single prompt; rather than stitching those inputs together, the model reasons across them to produce one output and then accepts further changes through conversation. Google says the system is "grounded in Gemini's real-world knowledge," and that characters, physics and prior edits persist across multiple turns of instructions.

The framing borrows directly from Nano Banana, Google's image-editing model. Just as Nano Banana brought conversational editing to still images last year, Omni is positioned as its video equivalent, part of a broader push to collapse Google's separate image, video and text pipelines into one Gemini-native surface.

Distribution and pricing

Gemini Omni Flash is rolling out to Google AI Plus, Pro and Ultra subscribers worldwide through the Gemini app and the Flow creative tool, according to Google's blog. It is also being made available at no cost to YouTube Shorts and YouTube Create App users starting this week. Google said developer and enterprise access through an API would follow "in the coming weeks."

Speaking to TechCrunch, Google DeepMind product management director Nicole Brichtova said Flash clips are capped at 10 seconds. She described the limit as a deployment decision rather than a model constraint: a way to widen access while compute demand is high, and a bet that most users do not yet want longer clips. Brichtova also confirmed that a higher-end Omni Pro model is planned. It has no release date; Google intends to ship it when it sees "a step change above Flash."

Several technical claims circulating alongside the launch are not confirmed by Google. A widely shared explainer of the model describes Omni Flash output as capped at 720p and quotes generation times of roughly 60 to 90 seconds per clip, but neither figure appears in Google's official materials or in Brichtova's on-record comments; pre-launch coverage treated resolution as unconfirmed. Similarly, descriptions of a fixed seven-image avatar setup and named template packs ("Metallic," "Meme Me," "Indie Pastel") do not appear in Google's announcement and should be treated as unverified until the company documents them.

The avatar feature — and a deliberate gap

Omni lets users build a digital avatar that "looks and sounds like" them. Brichtova told TechCrunch that onboarding requires users to record themselves speaking a series of numbers aloud, an anti-deepfake verification step loosely modeled on the Cameos feature in OpenAI's Sora app, which OpenAI shut down earlier this year. The avatar is then stored for reuse.

The more important detail is what Google is not shipping. The company's blog states plainly that, beyond the avatar feature, editing videos to change audio and speech is something it is "still working to test" so it can "better understand how we can bring this capability to users responsibly." That distinction matters: demonstrations that show Omni transforming a person into an animal while preserving their original voice, or swapping speech in existing footage, describe a capability Google has explicitly held back from this release rather than a feature available today. Reporting from Yahoo Tech corroborates that voice and speech editing remains in testing.

What Google did demonstrate publicly is more constrained: editing actions and objects in user-shot footage, style transfer between realistic and animated looks, multi-turn refinement, and explainer-style generation. Its blog post includes worked prompts for a claymation protein-folding explainer and a 26-item alphabet sequence. Both fit the educational use cases the post highlights, though Google presents them as illustrative examples rather than benchmarked results.

Competitive and safety context

Omni lands in the most contested corner of generative AI. ByteDance's Seedance 2.0 has led public quality benchmarks, and Kuaishou's Kling 3.0 remains dominant in the Chinese market, per pre-launch analysis; independent testers have suggested Flash's raw generation quality may trail those competitors even if its conversational editing is stronger. Google's strategic edge is distribution: Omni ships inside Search, the Gemini app, Flow and YouTube rather than as a standalone product. It was announced alongside Gemini 3.5 Flash and the Gemini Spark agent, part of a broad agentic push across Google's services.

On provenance, every Omni video carries Google's imperceptible SynthID watermark, verifiable through the Gemini app, Gemini in Chrome and Google Search. Google said at I/O that SynthID has now marked more than 100 billion AI-generated images and videos, and that OpenAI, ElevenLabs and Kakao are adopting the standard. Claims that Omni also embeds a visible corner watermark on every clip, or that the launch includes a specific C2PA metadata partnership, are not stated in Google's announcement and could not be independently confirmed.

The net picture: Google has shipped a capable, deeply distributed multimodal video model and paired it with provenance tooling — while consciously deferring the deepfake-adjacent voice-editing capability that would make it most powerful, and most dangerous.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.