And So It Begins: Google DeepMind AI Learns How To Talk Like Humans

Google has reached a milestone in its DeepMind artificial intelligence (A.I.) project with the successful development of technology that can mimic the sound of the human voice.

Dubbed WaveNet, the system is a deep neural network that generates raw audio waveforms to produce speech. It reportedly outperforms existing text-to-speech systems.

According to researchers at Britain-based DeepMind, WaveNet reduces the gap between existing systems and human performance, as judged by human listeners, by as much as 50 percent.

WaveNet is also notable because it can learn different voices and speech patterns, to the point that it can even simulate the sounds of mouth movements and breathing, in addition to emotions, inflections and accents.

"A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity," the researchers wrote in a paper.
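To illustrate what "conditioning on the speaker identity" means in general, here is a toy sketch, not DeepMind's implementation. One set of weights is shared by every speaker, and a per-speaker value `h` is fed into a gated activation (a tanh output scaled by a sigmoid gate), loosely modelled on the gated unit described in the WaveNet paper. All weights, speaker names and values below are made up for illustration.

```python
import math

# Hypothetical weights, shared by every speaker.
W_FILTER = [0.6, -0.2, 0.1]   # weights on the last three samples (filter path)
W_GATE = [0.3, 0.4, -0.1]     # weights on the last three samples (gate path)

# Hypothetical per-speaker conditioning values (stand-ins for learned embeddings).
SPEAKER_VECTORS = {"speaker_a": 0.5, "speaker_b": -0.5}


def sigmoid(x: float) -> float:
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def next_sample(history: list[float], speaker: str) -> float:
    """Predict one audio sample from the previous three, conditioned on a speaker.

    The waveform weights are the same for everyone; only the speaker value h
    changes, so one model can "switch voices" by swapping h.
    """
    recent = history[-3:]
    h = SPEAKER_VECTORS[speaker]
    filt = sum(w * x for w, x in zip(W_FILTER, recent)) + h
    gate = sum(w * x for w, x in zip(W_GATE, recent)) + h
    # Gated activation: tanh output in (-1, 1), scaled by a gate in (0, 1).
    return math.tanh(filt) * sigmoid(gate)


history = [0.0, 0.1, -0.1]
print(next_sample(history, "speaker_a"))
print(next_sample(history, "speaker_b"))
```

Given the same audio history, the two calls return different samples, because only the speaker value differs. In a real network the speaker information would be a learned vector feeding many layers rather than a single hand-picked number.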

WaveNet currently works with English and Mandarin Chinese. It can also generate music, such as classical piano pieces, on its own.

The breakthrough matters partly because of the sheer amount of data needed to reach this level of quality. For context, most computer-generated text-to-speech systems are built from huge collections of recorded human speech.

Google is using A.I. to address the challenge by modelling raw audio directly, an approach that builds on PixelRNN and PixelCNN, DeepMind's earlier two-dimensional networks for generating images. WaveNet is a one-dimensional counterpart that must produce at least 16,000 audio samples per second, which requires immense computing power, its creators said in a blog post. The system also had to be trained to produce utterances and to learn context, among other tasks. In total, training used 44 hours of speech recorded by more than a hundred speakers.
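The figures in the paragraph above can be combined in a short back-of-envelope sketch. The 16,000 samples per second and the 44 hours come from the article; the function names and the dummy predictor are hypothetical. The `generate` loop only illustrates sample-by-sample (autoregressive) generation in general, where each new sample depends on the ones before it. It is not DeepMind's code.

```python
SAMPLE_RATE_HZ = 16_000   # "at least 16,000 samples per second", per the article
TRAINING_HOURS = 44       # hours of recorded speech, per the article


def samples_in(hours: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of individual audio samples in `hours` of audio at `rate_hz`."""
    return round(hours * 3600 * rate_hz)


def generate(n_samples: int, predict) -> list[float]:
    """Autoregressive generation with a hypothetical `predict` function.

    Each new sample is predicted from everything generated so far, so the
    n_samples predictions must run one after another, not in parallel.
    """
    audio: list[float] = []
    for _ in range(n_samples):
        audio.append(predict(audio))
    return audio


print(samples_in(TRAINING_HOURS))   # individual samples in 44 hours of audio
one_second = generate(SAMPLE_RATE_HZ, lambda audio: 0.0)  # dummy predictor
print(len(one_second))              # sequential predictions for 1 s of speech
```

At 16,000 samples per second, 44 hours of recordings amount to roughly 2.5 billion individual samples, and producing even one second of speech means 16,000 model evaluations in sequence, which helps explain the heavy computing demands the creators describe.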

For now, observers see no immediate commercial use for WaveNet, unlike the DeepMind algorithm that can reduce energy consumption, as previously reported by Tech Times.

However, as people become increasingly dependent on technology, there is a need for sophisticated, natural mechanisms that allow effective and seamless interaction between humans and machines. That is why tech companies are watching WaveNet closely, according to Bloomberg.

Photo: A Health Blog | Flickr