OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to GPT-4o’s hyper-realistic audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in the fall of 2024.
Initial Reactions and Controversies
When OpenAI first showcased GPT-4o’s voice in May, the feature shocked audiences with its quick responses and an uncanny resemblance to a real human’s voice – one in particular. The voice, called Sky, resembled Scarlett Johansson, the actress behind the artificial assistant in the movie “Her.” Soon after OpenAI’s demo, Johansson said she had refused multiple requests from CEO Sam Altman to use her voice and, after seeing GPT-4o’s demo, hired legal counsel to defend her likeness. OpenAI denied using Johansson’s voice but later removed the Sky voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.
OpenAI Limited Alpha Release
One month later, the wait is over. OpenAI says the video and screen-sharing capabilities showcased during its Spring Update will not be part of this alpha and will launch “later.” For now, the GPT-4o demo that blew everyone away is still just a demo, but some premium users will now have access to ChatGPT’s voice feature shown there.
Differences from Previous Voice Mode
ChatGPT can now talk and listen. You may have already tried the Voice Mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s old approach to audio used three separate models: one to convert your voice to text, GPT-4 to process your prompt, and a third to convert ChatGPT’s text back into voice. GPT-4o, however, is multimodal, capable of handling these tasks without auxiliary models, resulting in significantly lower-latency conversations. OpenAI also claims GPT-4o can sense emotional intonation in your voice, including sadness, excitement, or singing.
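For illustration, the architectural difference described above can be sketched roughly as follows. This is a toy sketch, not OpenAI’s actual API: every function name is hypothetical, and stubs stand in for the real models.

```python
# Hypothetical sketch contrasting the old cascaded pipeline with a
# single multimodal model. All names and behaviors are illustrative only.

def speech_to_text(audio: bytes) -> str:
    """Stage 1 of the old pipeline: transcribe the user's audio (stub)."""
    return "transcribed prompt"

def text_model(prompt: str) -> str:
    """Stage 2: a text-only model (e.g. GPT-4) answers the prompt (stub)."""
    return f"response to: {prompt}"

def text_to_speech(text: str) -> bytes:
    """Stage 3: synthesize the text response as audio (stub)."""
    return text.encode()

def cascaded_voice_mode(audio: bytes) -> bytes:
    """Old Voice Mode: three models chained together. Each hop adds
    latency, and cues like tone of voice are lost at the transcription step."""
    return text_to_speech(text_model(speech_to_text(audio)))

def multimodal_voice_mode(audio: bytes) -> bytes:
    """GPT-4o style: one model consumes audio and emits audio directly,
    so emotional intonation in the input remains available end to end."""
    return b"audio response from a single model"
```

The latency gain comes from collapsing three model invocations (and two lossy format conversions) into one.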
OpenAI User Access and Monitoring
ChatGPT Plus users will see firsthand how hyper-realistic OpenAI’s Advanced Voice Mode is in this pilot. TOPCLAPS could not test the feature before publishing this article, but we will review it when we get access.
OpenAI says it’s gradually releasing ChatGPT’s new voice to monitor its usage closely. People in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use it.
OpenAI Safety and Testing
In the months since OpenAI’s demo, the company has tested GPT-4o’s voice capabilities with over 100 external red teamers who speak 45 languages. OpenAI says a report on these safety efforts will be available in early August.
Preset Voices and Restrictions
The company says Advanced Voice Mode will be limited to ChatGPT’s four preset voices – Juniper, Breeze, Cove, and Ember – made in collaboration with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says, “ChatGPT cannot impersonate other people’s voices, both individuals and public figures, and will block outputs that differ from one of these preset voices.”
Avoiding Deepfake Controversies
OpenAI is trying to avoid deepfake controversies. In January, voice cloning technology from AI startup ElevenLabs was used to impersonate President Biden, deceiving primary voters in New Hampshire.
OpenAI Copyright and Legal Measures
OpenAI also introduced new filters to block specific requests to generate music or other copyrighted audio. In the last year, AI companies have landed in legal trouble over copyright infringement, and audio models like GPT-4o open up a whole new category of potential complainants – particularly record labels, which have a history of litigation and have already sued AI song generators Suno and Udio.