Key Takeaways:
Artificial intelligence start-up OpenAI hosted its Spring Update live event on Monday, where it announced an updated version of its hugely popular large language model (LLM) GPT-4, called GPT-4o, or GPT-4 ‘Omni’.
OpenAI Launches New Multimodal AI, GPT-4o, That Can Understand Commands in Text, Voice, and Images
GPT-4o, which will be available to over 100 million paid and free users of ChatGPT in the next few weeks, will bring speech and video capabilities to the AI-powered chatbot.
In an X post, OpenAI CEO Sam Altman wrote that the model is “natively multimodal”, meaning it can generate content and understand commands across voice, text, and images.
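To make the “natively multimodal” claim concrete, the sketch below sends a single request mixing text and an image through OpenAI’s Chat Completions API in Python. It is an illustration only, assuming the public gpt-4o model name, an OPENAI_API_KEY environment variable, and a placeholder image URL; it is not code shown at the event.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One message that combines a text instruction with an image reference,
# which the model interprets in a single pass.
response = client.chat.completions.create(
    model="gpt-4o",  # assumed public model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is written on this page?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/handwritten-note.jpg"},  # placeholder URL
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)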
According to OpenAI, the changes brought to the LLM are aimed at “reducing the friction” between “humans and machines” and “bringing AI to everyone”.
The Microsoft-backed company’s chief technology officer Mira Murati, who was one of the presenters at the live-streamed event, demoed GPT-4o by holding real-time conversations with it.
ChatGPT Can Now Interpret Users’ Emotions and Hold Visual Conversations in Real Time
She asked the AI to tell her a bedtime story, and it did so without delay. OpenAI researcher Mark Chen prompted GPT-4o to make jokes and sing songs in different voices. The team also showcased the model’s video capabilities.
In video mode, ChatGPT can now hold real-time conversations with the user. During the demo, OpenAI engineers wrote mathematical equations on a piece of paper and placed them in front of an iPhone running the app with GPT-4o. The model viewed the problems through the phone’s camera and solved them while keeping up a witty conversation.
ChatGPT is also capable of reading emotions by looking at the user’s face through the camera. At the event, engineers showed it a smiling face, and the chatbot asked if they wanted to share the reason for their good vibes.
OpenAI promises that the visual and speech capabilities of GPT-4o will boost the quality and speed of ChatGPT in over 50 languages “to bring this experience to as many people as possible”.
There is also a desktop app, which will be released on the Mac today for paid ChatGPT subscribers.
GPT-4 Omni’s Average Response Time of 320 Milliseconds Is Similar to a Human’s
OpenAI claims that GPT-4 Omni can respond to audio inputs in as little as 232 milliseconds. Its average response time is 320 milliseconds, similar to human response times in a conversation.
Although the updated features are available to both free and paid users, the company said that paid ChatGPT Plus subscribers will get message limits up to five times higher than those of free users.
Viewers were impressed by the AI’s ability to hold a conversation with three presenters who were talking to it at the same time. The model successfully distinguished between the speakers and responded to each of them.
The presenters also showcased GPT-4o’s ability to translate between languages in real time, responding to an X user’s request to translate between English and Italian.
GPT-4o Users Can Create Custom GPTs with Voice and Visual Capabilities to Serve Specific Purposes
The changes also extend to OpenAI’s application programming interface (API), where GPT-4o is said to be twice as fast and 50% cheaper than GPT-4 Turbo.
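As a rough way to sanity-check the speed claim, the short Python sketch below times the same prompt against GPT-4 Turbo and GPT-4o through the same API. It assumes the public gpt-4-turbo and gpt-4o model names, an OPENAI_API_KEY environment variable, and an illustrative prompt; it says nothing about pricing.

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def time_completion(model: str, prompt: str) -> float:
    """Return the wall-clock seconds a single chat completion takes on the given model."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.perf_counter() - start

prompt = "Summarize the GPT-4o announcement in one sentence."
for model in ("gpt-4-turbo", "gpt-4o"):  # assumed public model names
    print(f"{model}: {time_completion(model, prompt):.2f} s")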
OpenAI also used the event to highlight the GPT Store, which launched earlier this year. Billionaire Sam Altman’s startup envisions a future where micro-communities can form around customized versions of GPTs.
They gave examples of a professor creating a Custom GPT for their students, or a podcaster creating one for their listeners.
OpenAI timed the launch of GPT-4 Omni just ahead of Google I/O, where the Silicon Valley tech behemoth is expected to announce a collection of AI products as part of its Gemini lineup.