Google Gemini 1.5 Pro Can Now Listen to Audio Files Thanks to Latest Update

Google has unveiled a groundbreaking update to its Gemini 1.5 Pro model, equipping it with the ability to listen to uploaded audio files.

This feature lets the model extract insights directly from audio sources, such as earnings calls or the audio track of a video, without needing a written transcript.

Public Availability via Vertex AI

According to The Verge, Gemini 1.5 Pro is now available to the public for the first time through Vertex AI, Google's platform for building AI applications, as announced at the Google Next event.

Originally introduced in February, Gemini 1.5 Pro is billed as the middle-weight model within the Gemini family, yet it reportedly outperforms Gemini Ultra, the largest model in the lineup.

Enhanced Performance and Functionality

According to Google, Gemini 1.5 Pro can understand complex instructions and eliminates the need to fine-tune models. The company is promoting these improvements as reasons to choose the model for building AI applications.

Access Restrictions and Usage

Currently, Gemini 1.5 Pro is available only to users with access to Vertex AI and AI Studio. Most people encounter Gemini language models through the Gemini chatbot, and the Gemini Advanced chatbot is powered by Gemini Ultra.

Despite its power, Gemini Ultra lags behind Gemini 1.5 Pro in speed and responsiveness.

Updates to Imagen 2 Model

In addition to Gemini, Google has introduced updates to its Imagen 2 model, enhancing its text-to-image generation capabilities. The new features, including inpainting and outpainting, enable users to add or remove elements from images seamlessly.

Furthermore, Google has made its SynthID digital watermarking feature available for all images generated through Imagen models, which helps identify those images as AI-generated.

Integration with Google Search

Google aims to enhance the reliability of AI responses by previewing a feature that grounds AI responses with Google Search, ensuring they provide up-to-date information. This integration addresses concerns about the accuracy and timeliness of responses generated by large language models.

Google Gemini's Past Controversies

While Google continues to innovate with its Gemini and Imagen models, criticism persists. Gemini previously drew criticism for generating historically inaccurate images, highlighting the ongoing need for ethical considerations and quality control in AI development.

Together, the Gemini 1.5 Pro and Imagen 2 updates mark notable progress for Google's AI offerings, adding audio understanding, image editing features, and search-grounded responses aimed at making AI applications more capable and reliable.

In other news, Google has integrated Gemini into Android Studio, where the AI model will provide coding assistance. This is intended to help developers with documentation, debugging, and other parts of the development process.