How to Use Voice and Image Prompts in ChatGPT

What to Know

  • As of September 27, 2023, ChatGPT Plus and Enterprise users can interact with the chatbot using image and voice prompts and hear its responses in humanlike voices.
  • To enter images into prompts, tap on the camera or gallery icon to the left of the message field, and capture or choose an image. You can also draw on the image to specify where ChatGPT focuses. 
  • To begin using Voice Mode, opt in from ChatGPT Settings > New Features.
  • Start a Voice conversation by tapping on the headphone button in the top right corner and selecting a voice. 
  • ChatGPT lets you choose from five different human voices. 

Almost a year after launching ChatGPT, OpenAI continues to add features that enhance not just what the chatbot can do but also how you use it. A recent update lets you give voice commands and images as prompts and have the answers read aloud in human voices, essentially facilitating a back-and-forth conversation between you and the AI chatbot.

Here’s everything you need to know about how to access and use these new ChatGPT modes and how they advance a closer integration of AI in our lives.

ChatGPT gets voice mode and vision

The ChatGPT app can already transcribe recorded voice prompts to text. But support for direct voice conversations now lets you interact without typing or reading any text, making the platform that much more flexible.

The Voice feature works as one would expect – you tap on the screen and start speaking. Words are then turned to text and sent to the LLM. The response is turned back to speech, and finally, is read in a voice of your choosing.
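The flow described above (speech to text, text to the LLM, reply back to speech) can be sketched in a few lines of Python. This is only an illustration of the sequence, not OpenAI's implementation: the function names `run_voice_turn`, `fake_transcribe`, `fake_llm`, and `fake_synthesize`, and the voice name `voice-1`, are all hypothetical stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str     # what the speech-to-text step produced
    reply_text: str     # what the language model answered
    reply_audio: bytes  # the reply rendered as speech


def run_voice_turn(
    audio: bytes,
    voice: str,
    transcribe: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> VoiceTurn:
    """Run one spoken exchange: speech -> text -> LLM -> text -> speech."""
    transcript = transcribe(audio)               # 1. spoken words become text
    reply_text = ask_llm(transcript)             # 2. text is sent to the LLM
    reply_audio = synthesize(reply_text, voice)  # 3. reply is read in the chosen voice
    return VoiceTurn(transcript, reply_text, reply_audio)


# Hypothetical stand-ins for the real services, so the sketch is runnable.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


turn = run_voice_turn(
    b"What is the weather?", "voice-1",
    fake_transcribe, fake_llm, fake_synthesize,
)
print(turn.reply_audio.decode("utf-8"))
```

Passing the three steps in as functions keeps the order of operations visible and makes it clear that the chosen voice only matters at the final step.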

OpenAI has collaborated with professional actors to deliver five different voices, which add an authentic touch to the answers and make conversations feel natural.

Then there is Image Prompt which, as the name suggests, lets you add images from your camera or gallery and ask questions about them. It works in the same vein as Google Lens, albeit with more reliable responses thanks to the underlying GPT model.

How to prompt ChatGPT with voice commands

Voice Mode opens a new way to converse, but it’s not available to everyone just yet. OpenAI is rolling it out exclusively to ChatGPT Plus and Enterprise users for now. It is also available only on ChatGPT’s mobile apps for iOS and Android, not on the desktop version. You can opt in to Voice Mode from Settings > New Features.

To begin using voice mode, tap on the headphone icon at the top right corner of the home screen and select a voice from the five available options.

Once the conversation begins, start speaking into the microphone.

The voice prompt will be sent as soon as you stop speaking. 

You can also tap in the middle to send your prompt manually. 

Use the pause and stop buttons to further control the recording.

ChatGPT will now deliver its response in your chosen voice. To interrupt an answer, simply tap in the middle as it’s being spoken.

Once the response is complete, you can start speaking again and take the conversation forward.

End the chat by tapping on the X at the bottom.  

How to prompt ChatGPT with images

Considering that other AI chatbots already have this up and running, image prompting is an important feature to bring to the platform alongside voice mode. It, too, is available only to ChatGPT Plus and Enterprise users. Fortunately, it is rolling out to the desktop version as well.

Tap on the camera icon in the bottom left corner to start.

Capture the image. 

And tap ‘Confirm’. 

The image will be attached in the message field. Type the text to go along with it and hit Send.

ChatGPT will scan through the image and text prompts and respond accordingly. It may even prompt you for more visual references.

Draw on the image to ask ChatGPT to focus on an object

You can also draw on the image to focus ChatGPT’s attention. 

Besides the camera, you have the option to add images from the gallery or folders as well. Tap on the ‘+’ sign to reveal additional image prompt options.

Then choose another means of uploading images. 

Select a picture.

You can add multiple pictures to a prompt.

Continue your conversations with follow-up images and text queries. Or switch to voice and speak your questions to go along with the images.  

Far-reaching benefits of ChatGPT’s voice and image capabilities

The implementation of natural human voices – or a close reproduction of them – opens up a host of real-world possibilities and scenarios.

For instance, you can take pictures of your food and get ChatGPT to estimate your calorie intake, have it read you a bedtime story in your preferred voice, open up auditory learning, or plan DAN with it. Though it won’t exactly let you start a relationship with it like in the movies (Spike Jonze’s Her comes to mind), the feature is, in essence, uncannily close.

Having an AI with a humanlike voice doesn’t just open doors to novel use cases but also allows OpenAI to collaborate with services such as Spotify to develop new AI-based features for their own platforms.

FAQ

Let’s consider a few commonly asked questions about the new voice and image features on ChatGPT.

How to enable Voice Mode and Image Prompts in ChatGPT?

To start using the voice and image modes in ChatGPT, tap on the three horizontal lines, and select Settings > New Features. Make sure that you have a ChatGPT Plus or Enterprise plan and are using GPT-4.

Why can’t I find New Features in ChatGPT Settings?

If you don’t see the ‘New Features’ option, your device has yet to receive the update. Check for app updates on the App Store or the Play Store. Although the feature is live, OpenAI has said it will be rolled out to users over the next few weeks.

The ability to interact by voice and give image prompts brings the pioneer of generative AI back into the battle of the bots. Though both Bing AI and Bard have similar features, they haven’t been able to implement multimodality in an interconnected, comprehensive way. Bing AI is unable to read its responses aloud, and Bard is yet to receive a standalone app. With the giants lagging a little, ChatGPT will look to wrest the momentum for itself and its users.

We hope this guide proved useful in understanding how you can use the new voice and image modalities on ChatGPT. Until next time!