This Startup’s Artificial Voice Sounds Almost Indistinguishable From A Human’s

Voysis founder Peter Cahill

An Irish startup has claimed a breakthrough in text-to-speech synthesis that improves on public demonstrations by Google’s DeepMind and Facebook.

The result is an artificial voice that lacks many of the glitches in intonation heard from digital assistants like Siri or Amazon’s Alexa. It sounds eerily human, and shows that you no longer need a multibillion-dollar R&D budget or hundreds of engineers to produce an artificial voice that’s as good as Google’s.

Voysis has shared its audio sample, an automated reading of Anna Sewell’s novel “Black Beauty,” exclusively with Forbes, and you can listen to it here:

Voysis founder Peter Cahill insists that the sample above has not been pre-recorded by a human, but produced by an algorithm that was trained on a popular dataset for building text-to-speech software. The demo is also significantly longer than comparable ones released by Google, Facebook and Baidu, which you can listen to further down this story.

The technology’s secret sauce is nothing new. In fact, it’s not even unique to Voysis. It’s a method called wavenet, developed by researchers at Google’s DeepMind and published as a research paper in September 2016. The method uses a particular type of neural-network architecture to create sound, and is said to represent a significant leap forward in artificial-voice technology. It also raises difficult questions about how close to “human” we want our artificial voices to sound.

The development comes at a time when digital assistants are becoming more popular because they exist not only on smartphones but on smart speakers in the home, an environment where consumers feel more comfortable speaking out loud to devices. Recent stats from Apple suggest Siri has more than 41 million monthly active users; in other words, more than 1 in 10 Americans talk to Siri at least once per month.

But wavenet hasn’t received much public attention because most consumers haven’t been able to experience it yet, and if they have, it has shown up as an automatic update with little fanfare.

Google only this month started updating the artificial voice on its Google Assistant smartphone app and Google Home speakers in the U.S. and in Japan to use wavenet, a Google spokesperson confirmed to Forbes.

Here’s a demo of an artificial voice from Google using the wavenet method, released in October:

The method represents a 50% improvement in artificial voice generation, according to Google’s own research, after years of tiny, incremental improvements to popular products like Siri and Alexa. It means Google Home and products from other companies that use wavenet should very quickly start getting better at pronouncing people’s names and locations, and sound less glitchy overall.

“Previous developments were improved by 1% a year,” says Cahill.

Wavenet, he says, is the biggest breakthrough in artificial voice generation in more than two decades. “The new generation of speech technologies are going to emerge on the back of this.”

Who else is using wavenet to improve their own voice technology? Cahill says the answer is any company that wants to be able to interact with consumers through voice, which covers all the big names like Apple, Amazon, Facebook and Baidu.

Some have been open about tinkering with wavenet and released demos of their latest work.

Here’s the most recent wavenet demo from researchers at Baidu, released in February 2017:

Neither Apple nor Amazon has released any demos of its work on wavenet, but both companies are almost certainly working on the new technique.

Apple said in a blog post in August that it had used deep learning to make a significant upgrade to Siri’s voice, making it sound less robotic in iOS 11. You can compare the differences with Siri from iOS 9 here, and you’ll notice that the Siri of today already sounds much more natural.

Apple also dismissed wavenet in that same blog post, saying it wasn’t feasible to use it…
