What to know

  • OpenAI has an improved Voice Engine model that can clone voices with only a 15-second voice sample.
  • OpenAI’s AI voice-cloning technology has far-reaching implications, and the company is delaying a wide release for now.
  • In the meantime, OpenAI is implementing various safety features and guardrails to identify and track the use of the technology. 

OpenAI’s Voice Engine model, which powers ChatGPT’s Voice and Read Aloud features, has gained a powerful new ability. With nothing more than some text input and a 15-second voice sample, it can now generate natural-sounding speech that closely resembles the original speaker. And the results are scarily good.

Along with the update, OpenAI shared on its website the results of a few voice cloning tests. Each of these includes an original ‘Reference audio’, followed by the cloned ‘Generated audio’. Here are a few samples of what the Voice Engine model is capable of:

Reference Audio 1

Generated Audio 1

Reference Audio 2

Generated Audio 2

Reference Audio 3

Generated Audio 3

With possible applications in education, healthcare, and translation, as well as in reaching communities across the globe, voice cloning appears to have many markets waiting to gobble it up. But the implications of such technology are not all rosy.

Fraudulent calls that use AI-cloned voices are already on the rise. Even though there’s a general consensus among nations to safeguard users in the age of AI, guardrails are not easily put in place, especially when the technology is racing ahead.

OpenAI, however, is working to implement its own set of safety measures, “including watermarking to trace the origin of any audio generated by Voice Engine”, prohibiting the impersonation of another individual, and requiring explicit and informed consent from the original speaker. 

“We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse,” the company stated in its blog post.