Researchers Use Grand Theft Auto and AI to Turn Dog Images Into 3D Models

Researchers at the University of Surrey have used the popular video game Grand Theft Auto to train an artificial intelligence (AI) system that turns images of dogs into detailed 3D models.

"Our model was trained on CGI dogs, but we were able to use it to make 3D skeletal models from photographs of real animals. That could let conservationists spot injured wildlife, or help artists create more realistic animals in the metaverse," said Moira Shooter, Postgraduate Research Student.

Creating 3D Models of Dogs With Grand Theft Auto V

The researchers developed an AI system that predicts a dog's 3D pose from a single 2D image. The system was trained on images generated in the virtual world of Grand Theft Auto V.

Although it was trained only on computer-generated imagery (CGI) of dogs, the model produced 3D skeletal models from photographs of real dogs. The researchers say this could support wildlife conservation and the creation of lifelike animals in virtual environments.

Teaching an AI to infer 3D information from 2D images usually requires "ground truth" data recording where objects actually are in 3D space, typically obtained with motion capture. Collecting such data from real dogs is difficult, so the researchers turned to a virtual alternative.

By modifying the code of Grand Theft Auto, the researchers replaced the game's main character with dogs of various breeds and recorded videos of them performing different activities under varied environmental conditions, producing a diverse dataset.

DigiDogs' 27,900 Frames

The resulting dataset, named DigiDogs, comprises 27,900 frames and provides training data for AI models. The team plans to refine its model further with Meta's DINOv2 model so that it predicts 3D poses accurately even from images of real dogs.

The researchers note that information extracted from 3D poses could be useful in fields ranging from ecology to animation, and that being able to recover it from a single-view RGB image makes the approach broadly applicable.

In their approach, the researchers target the depth ambiguities that limited previous methods, which arise because a single 2D image does not uniquely determine how far each body part is from the camera. Their answer is synthetic training data: by building the DigiDogs 3D pose dataset through modifications to Grand Theft Auto, they aim to bridge the gap between artificial and real-world data.
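Why a single image leaves depth ambiguous can be seen with a standard pinhole camera model. The sketch below is illustrative only and is not taken from the DigiDogs work; the `project` helper, the focal length, and the joint coordinates are hypothetical. It shows that a joint and a copy of it pushed twice as far along the same viewing ray land on the same pixel, so the 2D image alone cannot tell the two depths apart.

```python
# Illustrative pinhole-camera sketch (hypothetical values, not from the study):
# every point along one viewing ray projects to the same 2D image location.

def project(point, focal_length=500.0):
    """Project a 3D camera-space point (x, y, z) onto the 2D image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# A hypothetical paw joint 2 units in front of the camera...
near_paw = (0.25, -0.125, 2.0)
# ...and the same joint pushed twice as far along the viewing ray.
far_paw = tuple(2 * c for c in near_paw)

print(project(near_paw))  # (62.5, -31.25)
print(project(far_paw))   # (62.5, -31.25)
```

Because both points produce identical pixel coordinates, a model trained only on 2D images has no direct signal for depth; synthetic data such as DigiDogs supplies the true 3D positions that the image cannot.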

The researchers also exploit the generalization ability of the DINOv2 foundation model, fine-tuning it for 3D pose estimation. Their analyses show that realistic 3D poses of dogs can be estimated from real-world images, illustrating the value of combining foundation models with synthetic training data.

In qualitative and quantitative evaluations, the researchers report that their approach outperforms existing techniques on both 2D and 3D metrics. The study suggests that synthetic training data combined with advanced AI models can improve the accuracy of 3D pose estimation for dogs.
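The article does not say which 2D and 3D metrics were used. As an assumption, the sketch below shows two metrics commonly reported in pose estimation: mean per-joint position error (MPJPE) for 3D poses, where lower is better, and the percentage of correct keypoints (PCK) for 2D poses, where higher is better. The joint coordinates and the PCK threshold are hypothetical example values.

```python
import math

def mpjpe(pred, truth):
    """Mean per-joint position error: the average Euclidean distance between
    predicted and ground-truth 3D joints, in the same units as the input."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)

def pck(pred, truth, threshold):
    """Percentage of correct keypoints: the fraction of 2D joints whose
    prediction lies within `threshold` pixels of the ground truth."""
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and of equal length")
    correct = sum(math.dist(p, t) <= threshold for p, t in zip(pred, truth))
    return correct / len(pred)

# Hypothetical 3D joints (e.g. millimetres): per-joint errors of 30, 40 and 50.
truth_3d = [(0, 0, 0), (100, 0, 0), (100, 100, 0)]
pred_3d = [(0, 0, 30), (100, 40, 0), (130, 140, 0)]

# Hypothetical 2D keypoints (pixels): per-joint errors of 5, 15, 0 and 6.
truth_2d = [(10, 10), (50, 20), (80, 60), (20, 90)]
pred_2d = [(13, 14), (50, 35), (80, 60), (20, 96)]

print(mpjpe(pred_3d, truth_3d))                # 40.0
print(pck(pred_2d, truth_2d, threshold=10.0))  # 0.75
```

In practice the PCK threshold is often set relative to the animal's size (for example, a fraction of its bounding box) rather than as a fixed pixel count; the fixed value here simply keeps the sketch short.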
