ChatGPT Evolution: OpenAI Adds Voice and Image Features

ChatGPT, OpenAI's popular generative AI assistant, now has voice and image features. Users can hold voice conversations with the AI chatbot, which until now has been used mainly through text. The update is meant to make ChatGPT more user-friendly and adaptable.

Since its debut around nine months ago, ChatGPT has been very popular because it lets users create essays, poetry, and summaries from text prompts. By integrating speech and visual capabilities, OpenAI is starting a new era for ChatGPT.

Here's What to Know About the New ChatGPT Capabilities

Similar to well-known voice assistants like Alexa or Google Assistant, the voice chat function in ChatGPT lets users interact with the service by speaking their questions. OpenAI has introduced a text-to-speech model that can generate human-like audio from text and a brief voice sample, and it pairs that model with its Whisper model for speech-to-text conversion. ChatGPT users can choose from five distinct voices.

One noteworthy collaboration in the voice space is with Spotify, which intends to translate podcasts into several languages while preserving the original podcaster's voice using OpenAI's speech synthesis technology. OpenAI is restricting access to this technology to limit the risk of misuse and the negative attention that would follow. The initial launch is limited to podcasters such as Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett.

OpenAI acknowledges that this new ChatGPT feature could be dangerous, since malicious actors could impersonate well-known people or commit fraud. To address these concerns, OpenAI has put stringent safeguards in place and restricts access to specific use cases and partnerships.

In addition to speech, ChatGPT now has Google Lens-like image search capabilities. Users can take pictures of things and ask ChatGPT questions about them or request more information. Similar to Google's multimodal search, this feature speeds up the interaction and lets users refine their questions as they go.

ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023

To protect privacy and accuracy, OpenAI has added safeguards that limit ChatGPT's ability to analyze and draw conclusions about specific people.

Keeping Pace in the AI Race

With these new capabilities, ChatGPT has advanced significantly, and OpenAI now has the opportunity to license the technology to other businesses.

These improvements are part of OpenAI's ongoing efforts to remain competitive in the fast-moving field of generative AI, where tech giants like Amazon, Google, Meta, and Microsoft are also making major investments.

Only ChatGPT Plus subscribers, who pay $20 per month, will have access to the new voice and image features. The features will initially be available only in English, though OpenAI intends to extend them to other regions where ChatGPT operates.

OpenAI's decision to improve ChatGPT comes amid broad and fierce competition among tech giants in generative AI, as shown by Amazon's recent $4 billion investment in OpenAI competitor Anthropic and other developments in the tech industry.