DeepMind World Models Train Robots in Imagined Worlds: SIMA Practices Inside Genie 3 Model

SIMA 2 self-improves in Genie 3 simulations without new human data, a path to robot training at scale

Google DeepMind has paired two of its newest artificial intelligence systems into a single training engine: a world model called Genie 3 that turns a text prompt into a navigable 3D environment, and a generalist agent called SIMA 2 that is dropped into those environments to learn how to act. The reason this matters to anyone tracking robotics is concrete: in November 2025, DeepMind showed SIMA 2 not just operating inside Genie-generated worlds but getting better at tasks there without any new human-supplied examples — the clearest evidence yet that AI agents can be trained inside worlds another AI invents, rather than in the slow, costly physical world.

DeepMind CEO Demis Hassabis has a name for the arrangement: the "Infinite Training Loop." It is the company's answer to a problem that has stalled general-purpose robots for years, and it is now backed by a published self-improvement result rather than a slide deck.

Genie 3 Generates Worlds at 24 Frames per Second; SIMA 2 Learns Inside Them

The two models play opposite roles. Genie 3, which DeepMind announced on August 5, 2025, is a general-purpose world model: given a text prompt, it generates a world a user can navigate in real time at 24 frames per second, holding consistency "for a few minutes at a resolution of 720p," per DeepMind's technical write-up. A world model, in the lab's definition, is a system that simulates how an environment evolves and how an agent's actions change it. DeepMind calls world models "a key stepping stone on the path to AGI, since they make it possible to train AI agents in an unlimited curriculum of rich simulation environments."

SIMA — short for Scalable Instructable Multiworld Agent — is the learner that consumes that curriculum. DeepMind first introduced SIMA in March 2024 as an agent that followed basic language instructions across commercial video games, operating as a person would: reading the screen and using a virtual keyboard and mouse, with no access to a game's underlying code. SIMA 2, unveiled November 13, 2025, embeds a Gemini model as the agent's core so it can reason about a high-level goal, converse with a user, and carry out multi-step instructions.

How the Teacher-Student Loop Works: Genie Invents the Curriculum, Gemini Scores the Agent

The mechanism is what separates this from a conventional simulator. In a hand-built game engine, every environment, object, and rule must be authored in advance. Genie 3 instead generates each frame autoregressively from a world description and the agent's actions, which is why DeepMind can describe its spatial consistency as "an emergent capability" rather than an explicit 3D model of the kind used by techniques such as NeRFs or Gaussian Splatting. That distinction is the engineering core of the loop: because the worlds are generated rather than authored, there is no practical ceiling on how many distinct environments an agent can be handed.
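To make "autoregressive" concrete, here is a minimal Python sketch of the general idea: each new frame is computed from the world description, everything generated so far, and the agent's latest action. It is an illustration only. `ToyWorldModel`, `step`, and `rollout` are hypothetical names invented for this example, the "frame" is a small dictionary rather than an image, and nothing here reflects Genie 3's actual interface or internals.

```python
from dataclasses import dataclass, field

# Hypothetical action vocabulary for this toy example only.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


@dataclass
class ToyWorldModel:
    """Illustrative stand-in for a generative world model (not Genie 3's API)."""
    prompt: str
    history: list = field(default_factory=list)  # (action, frame) pairs so far

    def step(self, action: str) -> dict:
        # The next frame depends on the prompt, the prior frames, and the action.
        x, y = self.history[-1][1]["pos"] if self.history else (0, 0)
        dx, dy = MOVES.get(action, (0, 0))  # unknown actions leave the agent in place
        frame = {"t": len(self.history), "pos": (x + dx, y + dy), "prompt": self.prompt}
        self.history.append((action, frame))
        return frame


def rollout(model: ToyWorldModel, actions: list) -> list:
    """Generate one frame per action, feeding each result back into the model."""
    return [model.step(action) for action in actions]


if __name__ == "__main__":
    frames = rollout(ToyWorldModel("a mossy canyon at dusk"), ["right", "right", "up"])
    for frame in frames:
        print(frame["t"], frame["pos"])
```

Because nothing in the sketch is authored ahead of time, changing the prompt or the action sequence yields a different rollout with no extra environment code, which is the property the article attributes to generated worlds.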

SIMA 2's side of the loop is equally specific. After learning from human demonstration videos with language labels, the agent can switch to learning in a new game "exclusively through self-directed play," DeepMind says. A separate Gemini model proposes tasks and estimates a reward for the agent's behavior; those self-generated experiences are banked and used to train the next, more capable generation of the agent. In DeepMind's words, the process lets the agent "improve on previously failed tasks entirely independently of human-generated demonstrations and intervention." The company reported that it "was even able to leverage SIMA 2's capacity for self-improvement in newly created Genie environments — a major milestone toward training general agents across diverse, generated worlds."
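The cycle described above (propose tasks, attempt them, score the attempts, bank the experience, train the next generation) can be sketched in a few lines of Python. This is a toy under stated assumptions, not DeepMind's method: `propose_tasks` and `estimate_reward` are hypothetical placeholders for the roles the article assigns to a Gemini model, the agent's "skill" is a single number per task, and the update rule in `train_next_generation` is an arbitrary stand-in for real training.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Illustrative agent: competence per task is a number in [0, 1]."""
    generation: int = 0
    skill: dict = field(default_factory=dict)

    def attempt(self, task: str) -> float:
        return self.skill.get(task, 0.0)


def propose_tasks(world: str) -> list:
    # Hypothetical placeholder for a task-proposing model.
    return [f"{world}: find the door", f"{world}: pick up the axe"]


def estimate_reward(task: str, outcome: float) -> float:
    # Hypothetical placeholder for a reward-estimating model.
    return 1.0 if outcome >= 0.5 else 0.0


def train_next_generation(agent: ToyAgent, bank: list) -> ToyAgent:
    # Arbitrary toy update: failed tasks get a larger boost than solved ones.
    skill = dict(agent.skill)
    for task, _outcome, reward in bank:
        step = 0.25 if reward < 1.0 else 0.05
        skill[task] = min(1.0, skill.get(task, 0.0) + step)
    return ToyAgent(agent.generation + 1, skill)


def self_improve(agent: ToyAgent, worlds: list, generations: int):
    """Run the loop and record the mean reward of each generation."""
    mean_rewards = []
    for _ in range(generations):
        bank = []  # self-generated experience, with no human demonstrations added
        for world in worlds:
            for task in propose_tasks(world):
                outcome = agent.attempt(task)
                bank.append((task, outcome, estimate_reward(task, outcome)))
        mean_rewards.append(sum(r for _, _, r in bank) / len(bank))
        agent = train_next_generation(agent, bank)
    return agent, mean_rewards


if __name__ == "__main__":
    final_agent, curve = self_improve(ToyAgent(), ["canyon", "warehouse"], generations=3)
    print(final_agent.generation, curve)
```

In this toy run the agent fails every task for two generations and succeeds in the third, mirroring in miniature the claim that banked self-play experience lets a later generation complete tasks an earlier one failed.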

The robotics trade outlet Humanoids Daily, citing Hassabis, framed the architecture plainly: Genie 3 is the "Teacher" generating diverse interactive worlds on the fly, and SIMA is the "Student" practicing tasks such as navigating rooms inside them — a virtual boot camp before an agent ever touches physical hardware.

Why Generated Worlds Solve Robotics' Data Bottleneck

The loop exists to attack a single constraint. Large language models train on internet-scale text; robotics has no equivalent reservoir of physical-interaction data, and collecting it on real machines is slow, expensive, and occasionally hazardous. Generating practice environments rather than recording them removes that ceiling — the engineering tradeoff being that a generated world is only as good as the model that dreams it, which is precisely where the honest limits below come in.

DeepMind has already connected the two systems directly. In the Genie 3 announcement, the company said it generated worlds for "a recent version of our SIMA agent" and set the agent distinct goals, which it pursued by sending navigation actions to Genie 3. Because Genie 3 holds consistency over a longer horizon, DeepMind noted, "it is now possible to execute a longer sequence of actions, achieving more complex goals." The skills SIMA 2 acquires this way — navigation, tool use, and collaborative task execution — are, in DeepMind's framing, "some of the fundamental building blocks for the physical embodiment of intelligence needed for future AI assistants in the physical world."

What Does SIMA 2 Actually Score on Tasks It Has Never Seen?

DeepMind reports that SIMA 2 roughly doubled the original SIMA's task-completion rate, approaching human-level performance in trained environments and succeeding in games it had never encountered, including the Viking survival title ASKA and MineDojo, a research implementation of Minecraft. Reporting by MIT Technology Review put the earlier baseline in context: the original SIMA cleared complex tasks roughly 31% of the time against a human rate near 71%. A rough doubling of that baseline would put SIMA 2 in the low 60s, close to the human figure but still short of it. When the agent was challenged inside Genie 3 worlds it had never seen, DeepMind says it could orient itself, follow instructions, and take meaningful actions toward goals despite the unfamiliar environment.

The Limits DeepMind Names Out Loud

The case for the loop is strongest when its gaps are stated rather than hidden, and DeepMind states them. Genie 3's published limitations include a constrained action space — the agent can directly perform only a limited range of actions — plus difficulty modeling interactions among multiple independent agents, an inability to reproduce real-world locations with perfect geographic accuracy, unreliable text rendering, and an interaction window of "a few minutes" rather than hours. The publicly released Project Genie prototype is tighter still: generations are capped at 60 seconds, run at 720p, and, per Google's own blog, may not always follow prompts or real-world physics, with characters that can be "less controllable."

SIMA 2 carries matching caveats. DeepMind says the agent "still faces challenges with very long-horizon, complex tasks that require extensive, multi-step reasoning," has "a relatively short memory" because it relies on a limited context window for low-latency interaction, and continues to struggle with precise low-level keyboard-and-mouse actions and robust visual understanding of complex 3D scenes. SIMA 2 itself remains a limited research preview open only to a small cohort of academics and game developers.

From Research Preview to Public Prototype, and Toward Hardware

The loop is no longer confined to the lab. Project Genie moved from research preview to a public prototype on January 29, 2026, initially for Google AI Ultra subscribers in the United States, and expanded to AI Ultra subscribers worldwide with a Street View feature in May 2026, opening the underlying world model to a far larger group of testers. DeepMind has said Genie 3 can supply "a vast space to train agents like robots and autonomous systems" and to expose their weaknesses, and the company has used world models to evaluate robots in new scenarios.

What it has not done is close the gap between a few minutes of generated practice and a robot that can reliably work in a real kitchen. DeepMind's own framing is that bridging simulation and "the messy reality of the physical world" is the open question — and that the Infinite Training Loop is its bet on how to get there.


Frequently Asked Questions

What is the difference between Genie 3 and SIMA?

Genie 3 is a world model that generates interactive 3D environments from a text prompt, running at 24 frames per second and 720p for a few minutes at a time. SIMA is a separate generalist agent that is placed inside those environments to follow instructions and learn tasks. In DeepMind's setup, Genie 3 is the "Teacher" that builds worlds and SIMA is the "Student" that practices in them.

How does SIMA 2 learn without human data?

After an initial phase of learning from human demonstrations, SIMA 2 can switch to self-directed play in which a Gemini model proposes tasks and estimates rewards for the agent's attempts. Those self-generated experiences are stored and used to train the next, more capable version, so the agent improves on tasks it previously failed without new human examples.

Can Genie 3 and SIMA train real robots yet?

Not directly. The skills SIMA 2 learns, such as navigation and tool use, are described by DeepMind as building blocks for physical embodiment, and the company already uses world models to test robots in new scenarios. But Genie 3 currently sustains only a few minutes of consistent interaction, and SIMA 2 struggles with long-horizon tasks, so a reliable transfer to physical hardware remains unproven.

Where can I try Genie 3?

Genie 3 powers Project Genie, a prototype web app that became available to Google AI Ultra subscribers in the United States on January 29, 2026, and expanded to AI Ultra subscribers worldwide in May 2026. SIMA 2 itself is a limited research preview open only to a small group of academics and game developers.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
