OpenAI Updates GPT-4o: Now a Model for Voice, Video, and Text Interaction

Anurag Paul

20 May 2024

On Monday, OpenAI announced a new AI model and a desktop app for ChatGPT, bringing much faster performance and omnimodal support. CTO Mira Murati unveiled the new model at OpenAI headquarters. The improved model, GPT-4o, is now available to free users, giving them the faster, more accurate AI experience previously reserved for paying customers.


What Lies Behind the Technology of GPT-4o?

LLMs are the basis of AI chatbots. These models are trained on large volumes of data so they can learn on their own. Unlike previous versions, which relied on separate models for different tasks, GPT-4o is a single model trained end-to-end across text, vision, and audio. Murati explained that earlier voice mode required three separate components: transcription, intelligence, and text-to-speech. GPT-4o now handles all of these natively.

This integration enables GPT-4o to analyze and comprehend inputs more comprehensively. What's more, it can simultaneously understand the tone, background noise, and emotional context of audio inputs, something previous models struggled with.

In terms of features and capabilities, GPT-4o leads in speed and efficiency: it responds to queries about as fast as a human would, in roughly 232 to 320 milliseconds. This is a big step up from previous models, which took several seconds to respond. GPT-4o is also multilingual and has made great progress in handling non-English text.


Key Features of GPT-4o

The "o" in GPT-4o stands for "omni," and that is the theme: the new model now handles 50 languages well, said Murati. Developers can start using it through OpenAI's API. She also said that GPT-4o is twice as fast as GPT-4 Turbo at half the cost.
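For developers, a call to GPT-4o through OpenAI's API looks much like a call to earlier chat models, just with the new model identifier. The sketch below assumes the official `openai` Python SDK and an `OPENAI_API_KEY` environment variable; the prompt text is illustrative only.

```python
import os

# Request parameters for a simple GPT-4o chat completion.
# "gpt-4o" is the model identifier OpenAI announced for the API.
request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what 'omni' means in GPT-4o."},
    ],
}

# Only make the network call if an API key is configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**request)
    print(response.choices[0].message.content)
```

Because the endpoint and message format are unchanged from GPT-4 Turbo, switching an existing integration over is typically just a one-line model-name change.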

In the demonstration, the OpenAI team showed the model being used to calm someone down before a speech. Mark Chen, a researcher at OpenAI, showed how the model perceives and responds to the user's emotions. In its audio mode, ChatGPT is heard cheekily greeting users. OpenAI hopes to release a test of Voice Mode soon, with beta access for ChatGPT Plus subscribers. The model can respond to audio prompts in real time.

Chen showed just how this could be done by asking the model for bedtime stories, making its voice more dramatic or even robotic, and even having it sing. OpenAI also says GPT-4o can serve as a translator, as seen in the video dialogue between Chen and Murati conducted in different languages.


Market Competition and Partnerships

Team members also stressed that GPT-4o can help solve math problems and can be used for coding, which could make it a strong rival to Microsoft's GitHub Copilot.

OpenAI CEO Sam Altman said, "This new voice (and video) mode is the best computer interface I've ever used. It feels like AI from the movies, and it's still surprising to me that it's real. Achieving human-level response times and expressiveness is a significant change."

Murati also mentioned that OpenAI will, for the first time, launch a desktop app for ChatGPT to showcase GPT-4o, giving users a new way to interact with the technology. Free users will also gain access to custom chatbots from OpenAI's GPT Store for the first time, something until now restricted to paying subscribers.

The release of GPT-4o is set to have an impact on the tech space. Bloomberg reports that the model may become part of Apple's iPhone operating system, signaling a strategic collaboration that could redefine the kind of AI smartphones can offer. Such a partnership would possibly let Apple leap ahead of its peers with an AI assistant that works far better than Siri.

OpenAI's growth and pursuit of partnerships show its eagerness to establish its AI across platforms. However, the company is often embroiled in legal battles with media houses over copyright infringement. These lawsuits highlight the critical interplay between innovation and intellectual property rights. Several publishers, including the New York Times, are seeking payment arrangements. Also, get to know the best AI trends to watch in 2024 and see what's shaping the future of AI.


OpenAI GPT-4o Launch and Impact

The new model will be available starting 14th May for ChatGPT Plus and Team customers, with Enterprise clients to follow. Free users of ChatGPT will be able to access the model from Monday, but with limits on use. "Over the next few weeks, we will roll these capabilities out to everyone," explained Murati.

ChatGPT Plus users can send five times the number of messages compared to free users. ChatGPT Team and Enterprise customers will receive much more generous usage limits.

With 100 million weekly active users, ChatGPT is the fastest-growing consumer app in history. According to OpenAI, over 92% of Fortune 500 companies use the platform. It is also going to become one of the top AI productivity tools for streamlined workflows.


How to Use GPT-4o

Details aren't fully clear yet, but OpenAI has suggested that GPT-4o will have some kind of free tier, making it available to a wide range of people. There will also be paid plans with more capabilities and higher usage limits.

OpenAI is rolling out GPT-4o in incremental stages. The text and image capabilities can already be tried by users through ChatGPT, with a free tier available to all.

In other words, the Plus plan offers message limits up to five times higher than the free tier, for an enriched experience. OpenAI will soon launch an alpha of Voice Mode with GPT-4o on ChatGPT Plus, for an enhanced, more natural conversation experience.

Developers will have access to GPT-4o in the OpenAI API as a text and vision model. It is twice as fast, half the price, and has five times the rate limits compared to GPT-4 Turbo.
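Vision support means a single user message can mix text with image references. Below is a sketch of such a request using the `openai` Python SDK's content-part format; the image URL is a placeholder, not a real asset, and the call only executes if an `OPENAI_API_KEY` is configured.

```python
import os

# GPT-4o accepts mixed text-and-image content in one user message.
# Each content part is tagged by type: "text" or "image_url".
vision_request = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; substitute any publicly reachable image.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
    "max_tokens": 200,
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()
    response = client.chat.completions.create(**vision_request)
    print(response.choices[0].message.content)
```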

The GPT-4o release is an important step in making AI more accessible and usable. Its multimodal integration offers a far more natural and intuitive way to interact with a machine. So, stay tuned to learn how to use AI and ML tools like GPT-4o for startups.


Why Does It Matter?

GPT-4o arrives as the AI race gets even more interesting, with tech giants like Meta and Google working on ever more powerful LLMs for different products. The model could also be a great help to Microsoft, which has already invested billions in OpenAI, by being integrated into its current services.

The launch also comes just before the Google I/O developer conference, where Google is expected to announce updates to its Gemini AI model. Like GPT-4o, Gemini is expected to be multimodal. News on the integration of AI into iPhones or iOS updates is also anticipated at the Apple Worldwide Developers Conference in June.

With the help of a fractional executive and on-demand experts, you can learn how to use GPT-4o to its fullest. Whether you need assistance in marketing, HR, legal, branding, or any other domain, our global experts are here to support you at every stage.

Anurag has been writing content for over eight years and is dedicated to the craft, unable to see himself in any other industry. A passionate writer, he is interested in business and entrepreneurship. An accomplished technologist and financial expert, he strives to empower others through entrepreneurship, stepping out of his comfort zone to explore new ventures. Having worked full-time in the financial sector for more than five years, he also has a keen interest in Corporate Finance ...