Sounds of AI: ElevenLabs Adds AI-Generated Sounds to OpenAI's Sora

Voice technology company ElevenLabs has added an AI twist to Sora, OpenAI's newly announced model that generates videos from text prompts.

ElevenLabs overlaid AI-generated sounds onto clips from the Sora announcement, giving them an audio track. The demonstration points to a way of bridging the gap between visual and auditory elements in Sora-generated videos.

We were blown away by the Sora announcement but felt it needed something...

What if you could describe a sound and generate it with AI? pic.twitter.com/HcUxQ7Wndg
— ElevenLabs (@elevenlabsio) February 18, 2024

The Sounds of AI: ElevenLabs

Founded in 2022 by former Google machine-learning engineer Piotr Dabkowski and ex-Palantir deployment strategist Mati Staniszewski, ElevenLabs has been at the forefront of voice AI innovation.

The company's expertise lies in natural speech synthesis and voice cloning technology, enabling users to create realistic AI voices across multiple languages and accents.

Now, ElevenLabs is expanding its repertoire by adding AI-generated sounds to Sora's high-resolution video clips. The added background audio is meant to complement the visual content and enhance the viewing experience.

"We used text prompts like 'waves crashing,' 'metal clanging,' 'birds chirping,' and 'racing car engine' to generate audio that we overlaid onto some of our favorite clips from the OpenAI Sora announcement," ElevenLabs wrote in a recent blog post.

"We're thrilled by the excitement and support from the community and can't wait to get it into your hands," it added.

ElevenLabs AI Voice Technologies

Beyond AI-generated sounds, ElevenLabs has been expanding its voice technology business. The company recently announced an $80 million Series B funding round, attracting investments from prominent venture capital firms.

Since its inception, ElevenLabs has been committed to advancing voice AI research and product deployment. The company claims its technology has been adopted across various sectors, including publishing, gaming, media, and conversational AI.

Among its recent product developments is the Dubbing Studio, a workflow that enables users to dub entire movies and generate transcripts, translations, and timecodes. Additionally, ElevenLabs has launched a Voice Library marketplace, allowing users to share and monetize AI versions of their voices.

With a focus on safety and responsible AI development, ElevenLabs has implemented measures to ensure the authenticity and integrity of AI-generated content.

These measures include an AI Speech Classifier for verifying the origin of audio samples, along with other safeguards aimed at public safety.

OpenAI Unveils Sora

OpenAI introduced the Sora model last week, showcasing significant advancements in text-to-video technology. While this innovation promises to revolutionize content creation, OpenAI has taken cautious steps, limiting public access until thorough evaluations of its potential implications have been conducted.

Access is currently restricted to a select group of academics and researchers tasked with scrutinizing the model's capabilities and assessing potential risks.

From textual descriptions, Sora can generate intricate scenes featuring multiple characters, dynamic movements, and detailed environments. Its understanding of language extends beyond interpreting the text prompt to a grasp of the spatial relationships within the scenes it depicts.