In one of its most significant updates to date, OpenAI has added two major features to ChatGPT that change how people can interact with the chatbot.
Voice Integration: First, ChatGPT can now speak. Users can choose from five lifelike synthetic voices and talk with the chatbot much as they would on a phone call, with ChatGPT answering spoken questions in real time.
The voice feature relies on two separate models. Whisper, OpenAI’s existing speech-to-text model, transcribes the user’s speech into text, which is then passed to the chatbot. A new text-to-speech model then converts ChatGPT’s text reply into spoken audio.
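The speech-to-text, chatbot, and text-to-speech chain described above can be sketched as a simple composition of three functions. This is a minimal illustration under stated assumptions, not OpenAI’s implementation: the `VoicePipeline` class, the function names, and the `voice` identifier are hypothetical, and the stub “models” only manipulate text so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases for the three stages; real models would sit behind these.
SpeechToText = Callable[[bytes], str]        # audio in, transcript out (Whisper's role)
Chat = Callable[[str], str]                  # prompt in, reply text out (the chatbot)
TextToSpeech = Callable[[str, str], bytes]   # reply text + voice name in, audio out


@dataclass
class VoicePipeline:
    """Hypothetical sketch: chains the three stages for one spoken turn."""
    transcribe: SpeechToText
    respond: Chat
    synthesize: TextToSpeech
    voice: str = "voice-1"  # hypothetical name for one of the preset voices

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.transcribe(audio_in)           # 1. speech -> text
        text_out = self.respond(text_in)              # 2. text -> reply text
        return self.synthesize(text_out, self.voice)  # 3. reply text -> speech


# Offline stand-ins: "audio" here is just UTF-8 encoded text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


pipeline = VoicePipeline(fake_transcribe, fake_respond, fake_synthesize)
print(pipeline.handle_turn(b"hello").decode("utf-8"))  # prints "[voice-1] You said: hello"
```

The stages run one after another: the chatbot cannot respond until the transcript exists, and the text-to-speech model cannot speak until the reply exists. Keeping each stage behind its own function type reflects the article’s point that the feature combines separate models rather than a single one.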
The synthetic voices were created by training on recordings of voice actors hired by OpenAI, with an emphasis on making them pleasant and easy to listen to. OpenAI has also hinted that users may be able to create their own custom voices in the future.
Spotify is one of the first companies to use this text-to-speech model. The streaming giant is applying the same synthetic voice technology to translate celebrity podcasts into multiple languages while keeping a synthetic version of each podcaster’s own voice.
Image Recognition: The second major update gives ChatGPT image recognition capabilities. Users can now upload images to the app and ask questions about what they contain, a significant step forward for the chatbot’s visual understanding.
During a recent demo, a ChatGPT user uploaded a photo of a math problem, circled one puzzle on the screen, and asked for a solution. ChatGPT responded with the correct steps. The feature is not limited to math: users have also used it to troubleshoot technical problems, such as error messages on their computers.
This image recognition feature has been tested by Be My Eyes, an app designed to assist people with visual impairments. Its users can upload photos and ask volunteers or the chatbot to describe what the images show.
However, OpenAI acknowledges that these updates carry risks. Combining multiple models adds complexity and calls for strict safeguards. Some restrictions are already in place; for instance, users cannot ask questions about private individuals who appear in images.
The addition of voice recognition has also raised accessibility concerns, since accents and dialects may affect how well the system understands different users. In addition, synthetic voices carry social and cultural connotations that could shape users’ perceptions and expectations.
OpenAI says it has addressed many of these challenges and is confident that the updated ChatGPT is safe for public release. The updates reflect OpenAI’s continued effort to make its AI models more useful, capable, and accessible.
As the AI landscape evolves, the coming months will show how far these updates reshape the way people interact with AI systems and how much closer they bring AI to becoming part of everyday life.