Google's AI Tool Imagen Video Lets You Generate High-Res Videos from a Text Prompt

AI-generated artwork has been on the rise lately. Tools such as DALL-E, Midjourney, and Stable Diffusion are already changing the art landscape, since more people can now generate digital artwork from a simple text prompt.

But what happens when this text-to-image trend levels up to video? What if you could type the prompt "A cow jumps over the moon" and get a video clip of exactly that?

Or, more ambitiously: "Flying through an intense battle between pirate ships in a stormy ocean."

Thanks to Google's video-generating AI tool, prompts like these can now be turned into moving pictures.

Very happy to release #ImagenVideo today! Amazing work with an amazing team! https://t.co/Cdv8hKCGGk

High fidelity text to video with diffusion models: "Flying through an intense battle between pirate ships in a stormy ocean." https://t.co/0uxNTIoiFY pic.twitter.com/M3lAQPJG1K
— Tim Salimans (@TimSalimans) October 5, 2022

Imagen Video

Google's Imagen Video, a text-to-video generative AI model that creates high-definition videos from text input, was announced on Oct. 5, 2022.

The text-conditioned video diffusion model can produce videos at a maximum resolution of 1280x768 at 24 frames per second, as first reported by VentureBeat.

In its recently released paper, "Imagen Video: High Definition Video Generation with Diffusion Models," Google says that Imagen Video offers a high degree of controllability and world knowledge and can produce high-fidelity videos.

The generative model can produce a variety of videos and text animations in different artistic styles, shows an understanding of 3D objects, and can render and animate text. The model is still in the research phase, and its debut comes just five months after Google unveiled Imagen, its text-to-image model, underscoring how quickly synthesis-based models are advancing.

Imagen Video consists of a frozen T5-XXL text encoder, a base video diffusion model, and interleaved spatial and temporal super-resolution diffusion models. According to Google, this design builds on lessons learned from its earlier research on diffusion-based image generation.

The research team also applied progressive distillation, together with classifier-free guidance, to the video models to enable fast, high-quality sampling.
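
Classifier-free guidance and progressive distillation are easier to picture with a small sketch. The Python below is a minimal illustration, not Google's code: `classifier_free_guidance` blends a conditional and an unconditional noise prediction using the standard rule (1 + w) * eps_cond - w * eps_uncond, and `distillation_schedule` shows how each distillation round halves the number of sampling steps. The function names, the guidance weight, and the step counts are hypothetical.

```python
from typing import List


def classifier_free_guidance(
    eps_cond: List[float], eps_uncond: List[float], w: float
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Applies eps = (1 + w) * eps_cond - w * eps_uncond elementwise, where w >= 0
    is the guidance weight; w = 0 returns the conditional prediction unchanged.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


def distillation_schedule(teacher_steps: int, rounds: int) -> List[int]:
    """Sampling-step counts after each round of progressive distillation.

    In each round a student model is trained to reproduce two of its teacher's
    sampling steps in a single step, so the step count halves every round.
    """
    schedule = [teacher_steps]
    for _ in range(rounds):
        steps = schedule[-1]
        if steps < 2 or steps % 2 != 0:
            raise ValueError(f"cannot halve {steps} sampling steps evenly")
        schedule.append(steps // 2)
    return schedule


if __name__ == "__main__":
    # Hypothetical numbers: guidance weight 2.0, a 256-step teacher, 5 rounds.
    print(classifier_free_guidance([1.0, 2.0], [0.5, 1.0], w=2.0))  # [2.0, 4.0]
    print(distillation_schedule(256, 5))  # [256, 128, 64, 32, 16, 8]
```

Under these assumed numbers, five rounds of distillation would turn a 256-step sampler into an 8-step one, which is where the speedup comes from.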

The video generation framework carries out text-conditional video generation, spatial super-resolution, and temporal super-resolution through a cascade of seven video diffusion sub-models.
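
To make the cascade idea concrete, here is a minimal Python sketch that tracks how a video's shape (frames, height, width) grows as it passes through one base model and six super-resolution stages. The base shape of 16 frames at 24x40 pixels and the per-stage upsampling factors are illustrative assumptions, chosen only so that the seven stages end at 128 frames of 1280x768; they are not figures stated in this article.

```python
from typing import List, NamedTuple, Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)


class Stage(NamedTuple):
    name: str
    kind: str    # "temporal" multiplies frames; "spatial" multiplies height and width
    factor: int


def run_cascade(base: Shape, stages: List[Stage]) -> List[Tuple[str, Shape]]:
    """Return the output shape after the base model and after every stage."""
    frames, height, width = base
    trace = [("base", base)]
    for stage in stages:
        if stage.kind == "temporal":
            frames *= stage.factor
        elif stage.kind == "spatial":
            height *= stage.factor
            width *= stage.factor
        else:
            raise ValueError(f"unknown stage kind: {stage.kind}")
        trace.append((stage.name, (frames, height, width)))
    return trace


# Hypothetical stage list: three temporal (TSR) and three spatial (SSR)
# super-resolution stages, which together with the base model make seven.
STAGES = [
    Stage("TSR-1", "temporal", 2),
    Stage("SSR-1", "spatial", 2),
    Stage("SSR-2", "spatial", 4),
    Stage("TSR-2", "temporal", 2),
    Stage("TSR-3", "temporal", 2),
    Stage("SSR-3", "spatial", 4),
]

if __name__ == "__main__":
    for name, shape in run_cascade((16, 24, 40), STAGES):
        print(name, shape)  # final line: SSR-3 (128, 768, 1280)
```

Splitting the work this way lets the base model operate on a small, cheap video while later stages add frames and detail.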

The entire cascade produces high-definition 1280x768 videos at 24 frames per second, 128 frames long (about 5.3 seconds), or roughly 126 million pixels in total.
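
These figures can be checked with simple arithmetic. The short Python sketch below uses a hypothetical `clip_stats` helper that multiplies width, height, and frame count to get the total pixel count, and divides the frame count by the frame rate to get the clip length.

```python
from typing import Tuple


def clip_stats(width: int, height: int, frames: int, fps: int) -> Tuple[int, int, float]:
    """Return (pixels per frame, total pixels, duration in seconds)."""
    pixels_per_frame = width * height
    total_pixels = pixels_per_frame * frames
    duration_s = frames / fps
    return pixels_per_frame, total_pixels, duration_s


per_frame, total, seconds = clip_stats(1280, 768, 128, 24)
print(per_frame)           # 983040
print(total)               # 125829120, i.e. roughly 126 million
print(round(seconds, 2))   # 5.33
```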

Among the model's many impressive creative abilities are generating videos in the style of well-known painters such as Vincent van Gogh, rotating objects in 3D while preserving their structure, and rendering text in a variety of animation styles.

Excited to announce Imagen Video, our new text-conditioned video diffusion model that generates 1280x768 24fps HD videos! #ImagenVideo https://t.co/JWj3L7MpBU
Work w/ @wchan212 @Chitwan_Saharia @jaywhang_ @RuiqiGao @agritsenko @dpkingma @poolio @mo_norouzi @fleet_dj @TimSalimans pic.twitter.com/eN81LqZW7I
— Jonathan Ho (@hojonathanho) October 5, 2022

Is Imagen Video Available to the Public?

Because generative models can be misused to create harmful content, Google said it has taken several steps to address these concerns. The company said it filters both input text prompts and output video content, and has evaluated these filters in internal testing.

However, Google warned that several significant ethical and safety issues remain unresolved.

As a result, the company has not yet released the model publicly and says it will continue working to address these concerns and reduce potential risks.

This article is owned by Tech Times

Written by Joaquin Victor Tacla
