Fei-Fei Li’s World Labs Splits World Model Into Three Types: Marble Targets Simulation Linchpin

Fei-Fei Li’s startup says simulators, not video renderers, are the path to spatial intelligence

World Labs, the spatial-intelligence company co-founded by Stanford computer scientist Fei-Fei Li, published an essay on June 3, 2026 that tries to impose order on one of the most overloaded terms in artificial intelligence. The piece, written by Li and the World Labs team, argues that the phrase "world model" now covers three fundamentally different kinds of system, and stakes out where the company's first product, Marble, fits among them.

The answer matters to anyone building with these tools. If you are a game studio, a visual-effects house, an architect, or a robotics team weighing which generative-3D system to adopt, the taxonomy is a buying guide in disguise: it tells you which jobs each class of "world model" can actually do, and which it only appears to do. World Labs' bet is that the hardest and most valuable category — the physically faithful simulator — is the one the rest of the industry is underbuilding.

The essay lands months after World Labs converted Li's research conviction into a heavily funded business, shipping Marble and closing a $1 billion round anchored by design-software maker Autodesk.

Three Functions: Renderers Make Pixels, Simulators Make State, Planners Make Actions

The taxonomy hangs on the agent loop drawn from decades of reinforcement-learning textbooks: an agent takes an action, the world's state changes, and the agent receives partial observations in return. World Labs argues that the systems now being called world models are each just one slice of that loop.

A renderer outputs observations — pixels meant for human eyes — and is judged on visual fidelity. The essay puts video-generation models and Google's interactive Genie 3 in this bucket, noting they carry "no explicit understanding of three-dimensional structure." A drone shot they produce may look flawless from above, but try to drive through the city below and it falls apart.

A simulator outputs state: a geometrically and physically faithful representation that programs, not just people, can compute on. Its contract is structural — geometry that holds under inspection and physics that respects Newton's laws. A planner outputs actions, answering what an agent should do next; the new wave of vision-language-action systems and "World Action Models" are attempts at planners.

World Labs calls the simulator "the linchpin," and the claim is the heart of the essay. The same underlying knowledge of geometry and physics can be projected into pixels for a renderer and into action predictions for a planner — so a model that masters simulation can serve both, while a model that only renders or only plans cannot. That is also where the data is scarcest: 3D assets with explicit geometry and physical annotations are, as the company puts it, "orders of magnitude scarcer" than the internet video that renderers train on.
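
The three roles can be made concrete with a toy version of the agent loop. The sketch below is an illustration only, not World Labs code: the one-dimensional world and the names State, simulate, render, plan, and run_episode are all hypothetical. It shows the essay's structural point that the renderer and the planner both read from the same state, while only the simulator advances it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Toy 1-D world state (hypothetical): an agent position and a goal position."""
    agent_x: int
    goal_x: int


def simulate(state: State, action: int) -> State:
    """Simulator role: map (state, action) to the next state."""
    return State(agent_x=state.agent_x + action, goal_x=state.goal_x)


def render(state: State, width: int = 10) -> str:
    """Renderer role: project state into an observation meant for human eyes."""
    cells = ["."] * width
    if 0 <= state.goal_x < width:
        cells[state.goal_x] = "G"
    if 0 <= state.agent_x < width:
        cells[state.agent_x] = "A"  # the agent overwrites the goal when they coincide
    return "".join(cells)


def plan(state: State) -> int:
    """Planner role: choose the next action (-1, 0, or +1) from the state."""
    delta = state.goal_x - state.agent_x
    return (delta > 0) - (delta < 0)


def run_episode(state: State, max_steps: int = 20) -> list[str]:
    """Run plan -> simulate -> render until the planner has nothing left to do."""
    frames = [render(state)]
    for _ in range(max_steps):
        action = plan(state)
        if action == 0:
            break
        state = simulate(state, action)
        frames.append(render(state))
    return frames
```

Calling run_episode(State(agent_x=2, goal_x=5)) returns four frames, from "..A..G...." to ".....A....". Deleting render leaves the plan and simulate steps working as before; deleting simulate leaves nothing for either of the others to read.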

How Marble Works: Gaussian Splats for Eyes, Collision Meshes for Physics

The technical reason World Labs can position Marble as more than a pretty-picture generator comes down to what it exports. Marble takes a multimodal prompt — text, a single image, multiple images, a short video, or a coarse 3D layout — and produces an explorable 3D environment in two distinct representations at once.

Its highest-fidelity visual output is 3D Gaussian splatting, or 3DGS, which models a scene as millions of semitransparent particles, each carrying a position, scale, color, and opacity. That is a sharp break from the polygon-mesh pipeline that has dominated 3D graphics for decades, where objects are assembled from tiny triangles. World Labs renders these splats in the browser through Spark, its open-source renderer built on the THREE.js library.
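
For readers who want a feel for the representation, here is a minimal sketch of how a handful of splats could be blended into one pixel. It is a teaching simplification and makes several assumptions that are not from World Labs or Spark: the camera looks straight down the z axis with no perspective, each splat has a single isotropic scale, and the Splat class and shade_pixel function are hypothetical names. Production splat renderers also track orientation and run on the GPU.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    position: tuple[float, float, float]  # (x, y, z); z is depth from the camera
    scale: float                          # isotropic radius (simplifying assumption)
    color: tuple[float, float, float]     # RGB, each channel in [0, 1]
    opacity: float                        # peak opacity in [0, 1]


def shade_pixel(splats: list[Splat], px: float, py: float) -> tuple[float, float, float]:
    """Blend splats front to back at pixel (px, py) under an orthographic view."""
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through to farther splats
    for s in sorted(splats, key=lambda s: s.position[2]):  # nearest splat first
        x, y, _ = s.position
        dist2 = (x - px) ** 2 + (y - py) ** 2
        # Gaussian falloff: full opacity at the splat's center, fading with distance.
        alpha = s.opacity * math.exp(-0.5 * dist2 / (s.scale ** 2))
        for i in range(3):
            rgb[i] += transmittance * alpha * s.color[i]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2])
```

A half-opaque red splat sitting directly in front of a fully opaque blue one yields (0.5, 0.0, 0.5) at their shared center: half the light comes from the red particle and the rest from the blue one behind it.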

Alongside the splats, Marble outputs collider meshes — low-fidelity geometry a physics engine can operate on — plus higher-quality triangle meshes for interoperability with standard tools. That dual output is the engineering decision that, in the company's words, "dissolves the boundary between the renderer and the simulator": one model produces both what a scene looks like and a structure a program can run physics against. The launch version also added Chisel, an experimental mode that lets advanced users block out coarse 3D structure with boxes and planes and have Marble fill in style and detail, decoupling a world's layout from its look.
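
Why a separate, cruder geometry is worth exporting becomes clearer in code. The sketch below is a toy under stated assumptions, not Marble's export format or any physics engine's API: axis-aligned boxes stand in for collider triangles, and Box and drop are hypothetical names. The point is that a physics step needs only an inside-or-outside test against simple shapes and never consults the splats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box standing in for one piece of collider geometry."""
    lo: tuple[float, float, float]
    hi: tuple[float, float, float]

    def contains(self, p: tuple[float, float, float]) -> bool:
        return all(l <= c <= h for l, c, h in zip(self.lo, p, self.hi))


def drop(point, colliders, dt=0.01, g=9.81, max_steps=10_000):
    """Let a point fall along -z under gravity until it enters a collider.

    Uses semi-implicit Euler steps; returns the resting position, or the last
    position reached if nothing was hit within max_steps.
    """
    x, y, z = point
    vz = 0.0
    for _ in range(max_steps):
        vz -= g * dt
        next_z = z + vz * dt
        hit = next((b for b in colliders if b.contains((x, y, next_z))), None)
        if hit is not None:
            return (x, y, hit.hi[2])  # come to rest on the box's top face
        z = next_z
    return (x, y, z)
```

With a floor defined as Box((-5, -5, -1), (5, 5, 0)), a point dropped from two meters above it comes to rest at z = 0, while a point dropped outside the floor's footprint keeps falling. A splat-only scene has no equivalent of contains, which is the gap the collider export fills.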

What This Enables: Robotics Environments Built in Hours, Not Weeks

The export formats are what carry Marble out of the creative tools and into robotics, where the payoff is concrete. NVIDIA has published a technical workflow in which a Marble scene, exported as Gaussian splats and a collider mesh, is converted and imported into NVIDIA Isaac Sim to build a photorealistic, simulation-ready training environment. By its account, the approach compresses setup that once took weeks into hours.

That speed addresses a structural bottleneck. Robots cannot train on internet-scale data the way language models can; demonstrations and 3D environments are expensive and scarce. Cheap, varied, physically usable worlds are exactly what a robot-learning pipeline is starved for, and a simulator that can mass-produce them is more valuable to that pipeline than a renderer that produces only video.

Why World Labs Raised $1.23 Billion to Chase This

The company behind the argument was founded in early 2024 by Li with Justin Johnson, Christoph Lassner, and Ben Mildenhall, a group rooted in computer vision and 3D graphics. It emerged from stealth in 2024 with $230 million at roughly a $1 billion valuation, from backers including NVIDIA's NVentures, AMD Ventures, Adobe Ventures, and Databricks Ventures.

In late January 2026, Bloomberg reported the startup was in talks to raise up to $500 million at a valuation near $5 billion, while cautioning the terms were not final. The round that closed weeks later was twice that size: on February 18, 2026, World Labs said it had raised $1 billion from AMD, Autodesk, Emerson Collective, Fidelity Management & Research Company, NVIDIA, and Sea. Autodesk anchored the round with $200 million, its largest startup investment ever, and took a strategic advisor role. World Labs declined to confirm the round's valuation. Total funding now stands at about $1.23 billion, the sum of the $230 million disclosed in 2024 and the $1 billion round.

The intellectual engine is Li's November 2025 argument that today's large language models are "wordsmiths in the dark," eloquent but ungrounded, and that spatial intelligence is "the frontier beyond language — the capability that links imagination, perception and action."

The Honest Limits World Labs Names Itself

The essay is unusually candid about what does not yet work. Robotics demos to date, it says, have been "confined to heavily constrained laboratory setups," and the gap between "a compelling demo reel and a robot that reliably works in a kitchen, a warehouse, or an operating room remains vast." Generative simulators add a failure mode of their own: AI-generated geometry "can look correct while containing self-intersections or wrong scale that produce nonsensical physics." Reconciling visual beauty with the precision a robot needs is, the company concedes, "the defining open problem in world model research today."
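
The "looks correct, simulates wrong" failure can be partly screened for before a mesh ever reaches a physics engine. The sketch below is one possible pre-flight audit, not a World Labs tool: audit_mesh and its thresholds are hypothetical, and it covers only two cheap checks, implausible overall scale and zero-area triangles. Detecting the self-intersections the essay mentions requires pairwise triangle tests and is deliberately left out.

```python
import math


def triangle_area(a, b, c):
    """Area of triangle abc via half the magnitude of the cross product."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def bounding_extent(vertices):
    """Size of the axis-aligned bounding box along x, y, and z."""
    return tuple(
        max(v[i] for v in vertices) - min(v[i] for v in vertices) for i in range(3)
    )


def audit_mesh(vertices, faces, min_extent_m=0.5, max_extent_m=100.0, min_area=1e-10):
    """Return a list of human-readable problems; an empty list means none found.

    The extent limits assume a room- or building-sized scene measured in meters
    (an assumption for this sketch, to be tuned per use case).
    """
    problems = []
    largest = max(bounding_extent(vertices))
    if not (min_extent_m <= largest <= max_extent_m):
        problems.append(f"implausible scale: largest extent {largest:.3g} m")
    for idx, (i, j, k) in enumerate(faces):
        if triangle_area(vertices[i], vertices[j], vertices[k]) < min_area:
            problems.append(f"degenerate triangle at face {idx}")
    return problems
```

A four-by-three-meter floor built from two triangles passes cleanly. The same floor with coordinates multiplied by 1,000, as if millimeters had been read as meters, is flagged for scale even though it would render identically once the camera is moved back.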

World Labs is also not alone. NVIDIA runs a parallel simulation stack around Omniverse and its Cosmos world-foundation models, Google DeepMind has shown the interactive renderer Genie 3, and a field of well-funded startups is racing to solve the planner problem. World Labs frames Marble as "the first chapter" toward a unified world model that can render, simulate, and plan from one system, a destination it admits is years off.

Frequently Asked Questions

What is a world model in AI?

A world model is a generative AI system that learns the structure of physical space and time rather than the statistics of text. World Labs argues the term now spans three functions: renderers that output pixels, simulators that output physically faithful 3D state, and planners that output an agent's next action.

What is World Labs' Marble?

Marble is World Labs' first commercial product, a generative model that turns text, images, video, or a coarse 3D layout into an explorable 3D world. It exports Gaussian splats for high-fidelity visuals and collision meshes that a physics engine can use, so a single model serves both viewing and simulation.

How much has World Labs raised?

World Labs has raised about $1.23 billion in total. It emerged from stealth in 2024 with $230 million and closed a $1 billion round on February 18, 2026, anchored by a $200 million commitment from Autodesk. The company declined to confirm the round's valuation, though earlier reports cited a target near $5 billion.

What is 3D Gaussian splatting?

3D Gaussian splatting represents a scene as millions of semitransparent particles, each with a position, scale, color, and opacity, instead of the triangles used in traditional polygon-mesh graphics. It can deliver higher visual fidelity, and Marble uses it as its highest-quality output format.

ⓒ 2026 TECHTIMES.com All rights reserved. Do not reproduce without permission.
