OpenAI’s GPT-4o Model Is Everything We Wanted Voice Assistants to Be

What to know

  • OpenAI’s Spring Update introduced GPT-4o, the company’s new flagship model.
  • OpenAI also demonstrated an upgraded Voice Mode, its most emotive and lifelike assistant yet.
  • GPT-4o and its API will be available to all users, while the Voice Mode is only rolling out to Plus subscribers for now.
  • OpenAI will also soon release a ChatGPT desktop app for Mac; a Windows app will follow later this year.

OpenAI has raised the bar yet again. Although the news at the Spring Update event didn’t involve any excursions into search engine territory, OpenAI won the hearts and minds of many with its new GPT-4o model. It’s fast and snappy and, with an upgraded Voice Mode, frighteningly like the AI assistant from Spike Jonze’s 2013 movie Her.

More importantly, it’s a big step forward for voice assistants on smartphones, a space that ChatGPT wants to lay claim to and is now well suited for. Here’s everything you should know about GPT-4o, the Voice Mode upgrades on ChatGPT, and what they entail for the industry.

GPT-4o model makes ChatGPT snappier, more emotive than any AI chatbot or assistant

Say hello to GPT-4o

GPT-4o (‘o’ for omni) is the company’s new flagship model and also its first model that combines text, vision, and audio. It has GPT-4 level intelligence, but is faster and more efficient. The previous version of Voice Mode worked with a mix of three models with varying degrees of intelligence, so much of the main GPT-4 level intelligence was lost along the way. This is where GPT-4o is different.

GPT-4o is the first model trained end-to-end across text, vision, and audio, and it powers Voice Mode on its own. And it shows. In one of the demos, folks at OpenAI got ChatGPT on two phones to talk to each other and sing songs.

Two GPT-4os interacting and singing

ChatGPT’s responses are fast enough to arrive in real time. It can also observe tone, detect emotional state from voice and video, give advice, help you code, and translate live, all while making it seem like an intimate human conversation.

During the event, ChatGPT dramatized bedtime stories, switched voices on a dime, and ended with a song. 

Live demo of GPT-4o voice variation

These are only a few of the many things that ChatGPT can do with the GPT-4o omni-model, which is already breaking new ground. As it is the first of its kind, future omni-models could completely change the way we chat with ChatGPT, and our relationship with it.

GPT-4o is free for all!

GPT-4o is also not reserved for Plus members only. The fact that OpenAI will automatically upgrade free users to GPT-4o raises the bar for other chatbots across the board. This comes in particularly handy for users who want to replace Google Assistant (or Gemini these days), a role for which ChatGPT is well suited.

There are already ways for Android users to use ChatGPT as their digital assistant. Some manufacturers, like Nothing, also let you add ChatGPT to the Quick Settings tile for faster access to Voice Mode. But an official ChatGPT assistant could well be the replacement that users want.

With an official ChatGPT desktop app coming out, ChatGPT could well be your one assistant across devices. The Mac app will roll out in the coming weeks, while the Windows app will arrive later this year.

GPT-4o is a smaller, more efficient model

GPT-4o is OpenAI’s fastest, most affordable flagship model yet, dethroning GPT-4 Turbo on a number of fronts. With GPT-4 Turbo-level intelligence, GPT-4o is slated to be twice as fast, though real-world testing is yet to confirm this. There are various other upgrades as well. Compare the salient features below:

Image: OpenAI

OpenAI is also making GPT-4o available in the Chat Completions, Assistants, and Batch APIs. Developers are likely to jump on it quickly, especially since API tokens cost half as much as they do for GPT-4 Turbo.

The speed of ChatGPT’s new model is particularly helpful in live translation. In OpenAI’s demo, the voice feature worked as an interpreter between people speaking different languages.
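To get a feel for what half-price tokens could mean for a developer’s bill, here is a minimal sketch in Python. The per-million-token prices and the workload below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not taken from the announcement, so check OpenAI’s pricing page for actual rates. The function name `estimate_cost` is likewise made up for this example.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated cost, in the same currency as the prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical baseline prices per one million tokens (placeholders only).
BASELINE_INPUT_PRICE = 10.0
BASELINE_OUTPUT_PRICE = 30.0

# "Half the price" modeled as a flat 50% cut on both input and output tokens.
HALVED_INPUT_PRICE = BASELINE_INPUT_PRICE / 2
HALVED_OUTPUT_PRICE = BASELINE_OUTPUT_PRICE / 2

# Hypothetical monthly workload.
INPUT_TOKENS = 200_000
OUTPUT_TOKENS = 50_000

baseline_cost = estimate_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                              BASELINE_INPUT_PRICE, BASELINE_OUTPUT_PRICE)
halved_cost = estimate_cost(INPUT_TOKENS, OUTPUT_TOKENS,
                            HALVED_INPUT_PRICE, HALVED_OUTPUT_PRICE)

print(f"Baseline: {baseline_cost:.2f}, at half price: {halved_cost:.2f}")
```

Because both the input and output rates are halved, the total bill is halved too, whatever the mix of input and output tokens happens to be.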

Live demo of GPT-4o realtime translation

The new voice and video mode feels like AI from the movies

Thanks to the omni-model, ChatGPT has a much wider emotional range, in that it can detect emotional states just by listening to your voice or looking at your facial expressions. But apart from having eyes and ears, it’s the voice that adds that magical human element and gives the illusion of a ghost in the machine, an illusion that seems to have captured the imagination of people the world over.

Sam Altman aptly called it ‘AI from the movies’. Going by the lighthearted, slightly flirtatious tone in the demo, it won’t be surprising to see people more willingly adopt ChatGPT as their assistant over the traditional default assistants which, frankly, appear antiquated now. But before you make friends with ChatGPT, or get intimate with your digital companion, perhaps heed the message of the movie that Altman is alluding to, and avoid the pitfall of mistaking digital companionship for the real thing.

Will Apple partner with OpenAI, replace Siri? 

Only a few days prior to the event, Apple was reportedly nearing a deal with OpenAI to power new AI features on future iPhones. Now that GPT-4o is out with a better, more evocative Voice Mode, we may see ChatGPT’s capabilities carried over to a number of iOS features, including Siri. It may be a stretch to imagine Apple ditching Siri anytime soon, if ever. But Siri could well be upgraded with abilities not too different from what ChatGPT offers. More news on this is expected to break as we approach WWDC in June. So stay tuned for that.

When will GPT-4o roll out?

Having only recently been announced, GPT-4o may take a while to roll out to users globally. As for the sprightly Voice Mode, Plus subscribers will be the first to get access to it. So if you want to check out the new feature, chat longer, and get speedy responses from your AI assistant, a Plus membership is still worth it.

With a better, faster architecture and a wider range of emotive capabilities, ChatGPT is perfectly positioned to become your favorite voice assistant, perhaps even a friend. 

What are your thoughts? Are you excited to try ChatGPT as your digital assistant? The possibilities for creative conversations are endless, but so is the potential for misuse. So even as you take the new features for a spin, remember to use ChatGPT first and foremost as a tool. Until next time! Stay safe.
