Google DeepMind Presents SIMA, Generalist AI Agent for 3D Virtual Environments

Google DeepMind has introduced SIMA, an AI agent designed to navigate various 3D virtual environments based on natural-language instructions. The aim is to develop a versatile AI capable of assisting users with tasks across different gaming platforms.

SIMA, The "Scalable Instructable Multiworld Agent"

SIMA stands for "Scalable Instructable Multiworld Agent." Google DeepMind describes it as an agent that can interpret and carry out commands across different video game settings.

This marks a significant shift towards developing a generalized AI agent capable of understanding and responding to instructions across diverse gaming worlds.

The concept of SIMA emerged from the recognition that video games, due to their dynamic, interactive nature, are ideal environments for AI system development.

Google DeepMind, known for its expertise in AI and gaming, has previously explored AI's capabilities in games, from Atari classics to the complex strategies of StarCraft II.

The research on SIMA represents a departure from focusing solely on individual games towards creating an AI agent adaptable to multiple gaming environments.

By partnering with game developers, Google DeepMind trained SIMA on various games, including No Man's Sky and Teardown, exposing it to different interactive scenarios.

The training of SIMA involved recording human players' actions and instructions across various games to capture the relationship between language and gameplay behavior.

The AI agent was then developed to interpret natural-language instructions and execute corresponding actions within the game environment.
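To make the idea of pairing language with gameplay concrete, here is a minimal sketch of how such recordings could be flattened into training examples. The `Step` and `Recording` types, the field names, and the string-encoded actions are all hypothetical placeholders for illustration; Google DeepMind has not described its data format at this level of detail in the material covered here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    frame: bytes   # placeholder for one captured screen image
    action: str    # placeholder for the keyboard/mouse action the human took


@dataclass(frozen=True)
class Recording:
    game: str          # which game the session was recorded in
    instruction: str   # natural-language instruction given to the player
    steps: List[Step]  # what the player saw and did, in order


def to_training_pairs(
    recordings: List[Recording],
) -> List[Tuple[Tuple[bytes, str], str]]:
    """Flatten recordings into ((frame, instruction), action) examples.

    Each example asks: given this screen and this instruction,
    which action did the human player take?
    """
    pairs = []
    for rec in recordings:
        for step in rec.steps:
            pairs.append(((step.frame, rec.instruction), step.action))
    return pairs
```

A model trained on pairs like these learns to imitate the recorded players, which is one common way to connect instructions to behavior; whether SIMA's training matches this sketch in detail is an assumption.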

Unlike traditional AI approaches that require access to a game's source code or specialized APIs, SIMA needs only two inputs: the images on screen and natural-language instructions. Because it controls the game through ordinary keyboard and mouse outputs, SIMA could potentially interact with any virtual environment a human player can.
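The interface described above, screen images and text in, keyboard and mouse actions out, can be sketched as a simple control loop. Everything named here (`KeyMouseAction`, `Agent`, `KeywordAgent`, `run_episode`, and the `capture_screen` and `send_input` callbacks) is a hypothetical illustration, not SIMA's actual code, and the keyword-matching agent is only a toy stand-in for a learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple


@dataclass(frozen=True)
class KeyMouseAction:
    keys: Tuple[str, ...] = ()  # keys held down this step
    mouse_dx: int = 0           # horizontal mouse movement
    mouse_dy: int = 0           # vertical mouse movement
    click: bool = False         # whether the mouse button is pressed


class Agent(Protocol):
    def act(self, frame: bytes, instruction: str) -> KeyMouseAction: ...


class KeywordAgent:
    """Toy agent: ignores the frame and reacts to words in the instruction."""

    def act(self, frame: bytes, instruction: str) -> KeyMouseAction:
        text = instruction.lower()
        if "left" in text:
            return KeyMouseAction(keys=("a",))
        if "right" in text:
            return KeyMouseAction(keys=("d",))
        return KeyMouseAction()  # no-op when nothing matches


def run_episode(
    agent: Agent,
    capture_screen: Callable[[], bytes],
    send_input: Callable[[KeyMouseAction], None],
    instruction: str,
    max_steps: int,
) -> List[KeyMouseAction]:
    """Repeatedly look at the screen, pick an action, and send it to the game."""
    actions = []
    for _ in range(max_steps):
        frame = capture_screen()                 # only pixels, no game internals
        action = agent.act(frame, instruction)
        send_input(action)                       # same channel a human would use
        actions.append(action)
    return actions
```

The point of the design is that nothing in the loop is specific to one game: swapping in a different title only changes what `capture_screen` returns and where `send_input` delivers its keystrokes.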

SIMA's evaluation focused on its ability to perform basic tasks across different games, such as navigation, object interaction, and menu use. The current version handles simple tasks that can be completed in a short time (about 10 seconds, according to DeepMind), laying the groundwork for more complex assignments in the future.
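An evaluation of this kind is typically summarized as a success rate per skill category. The sketch below shows that bookkeeping only; the function name and the trial data are made up for illustration and are not SIMA's reported results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rate_by_category(
    results: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Compute the fraction of successful trials for each skill category."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for category, succeeded in results:
        totals[category] += 1
        if succeeded:
            wins[category] += 1
    return {category: wins[category] / totals[category] for category in totals}


# Invented trial outcomes, purely to show the shape of the input.
trials = [
    ("navigation", True),
    ("navigation", False),
    ("object interaction", True),
    ("menu use", True),
]
rates = success_rate_by_category(trials)
```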

"New Wave of Generalist, Language-Driven AI Agents"

Google DeepMind envisions SIMA evolving to tackle tasks requiring strategic planning and multiple sub-tasks, aiming for a level of AI assistance beyond basic actions.

The ultimate goal is to develop AI agents capable of understanding and executing higher-level language instructions to accomplish more sophisticated goals.

Initial evaluations of SIMA's performance highlight its ability to generalize across different gaming environments, outperforming specialized agents trained on individual games. However, further research is needed to enhance SIMA's performance to human levels across both familiar and unfamiliar games.

"SIMA's results show the potential to develop a new wave of generalist, language-driven AI agents. This is early-stage research and we look forward to further building on SIMA across more training environments and incorporating more capable models," Google DeepMind said in a blog post.

"As we expose SIMA to more training worlds, the more generalizable and versatile we expect it to become. And with more advanced models, we hope to improve SIMA's understanding and ability to act on higher-level language instructions to achieve more complex goals."