ChatGPT, best known as a text-based chatbot, is gaining voice and image capabilities from OpenAI. The AI assistant, which has become hugely popular in recent months, lets users generate essays, poems, and summaries from text prompts. Now users will also be able to hold spoken conversations with the chatbot, a notable expansion of what it can do.
OpenAI’s latest announcement coincides with Amazon’s commitment to invest up to $4 billion in rival Anthropic. This move is just one part of a larger battle among tech giants in generative AI. Google is striving to catch up with its Bard chatbot, Meta is embracing an open-source ethos for a competitive edge, and Microsoft is forging a close alliance with OpenAI. These developments underscore the fierce competition in the world of AI technology.
OpenAI Expands ChatGPT with Voice and Image Capabilities
Today’s announcement brings together two technologies: the familiar voice-based assistant and OpenAI’s large language models (LLMs).
For example, users can now ask ChatGPT aloud to make up a bedtime story and steer the narrative with a few spoken cues. They can also simply ask questions, and ChatGPT will answer in speech.
Users can also query ChatGPT with images. For instance, they can upload a picture and ask ChatGPT to explain what it shows or to give instructions for a specific task.
The voice capability is driven by a new text-to-speech model that can produce lifelike voices from written text and a brief sample of speech. OpenAI worked with professional voice actors to create a selection of five distinct voices. In the other direction, its open-source Whisper speech recognition system converts spoken words into text.
Spotify was also revealed as a key partner in the launch, introducing a feature for podcasters. It lets podcasters sample their own voices and translate their shows from English into Spanish, French, or German while preserving their original voices. OpenAI is not making this technology widely available, a cautious approach that may head off criticism. Instead, it worked with a select group of podcasters, including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett, for the initial launch.
In a blog post, the company said that the new voice technology, which can create realistic synthetic voices from only brief snippets of real speech, opens up a wide range of creative and accessibility-focused applications. However, the company also acknowledged that these capabilities introduce new risks, including malicious actors impersonating public figures or committing fraud.
The new features are scheduled to roll out over the next two weeks, initially to Plus and Enterprise subscribers. To activate the voice features, users should go to the “settings” menu in the app, open “new features,” and opt in to voice conversations. They can then tap the headphone icon in the upper-right corner and choose a voice.
Initially, voice will be limited to the ChatGPT Android and iOS apps, offered as an opt-in beta feature. Image input, by contrast, will be available on all platforms by default.