Google is working to make Assistant less fragmented and more natural sounding. Here’s how
Google wants Assistant to behave the same way across all its devices and platforms. Its WaveNet neural network is also being extended to more languages so that Assistant sounds more realistic in each of them.
Google Assistant is great. It is currently one of the best digital assistants available and can give Apple’s Siri a run for its money. Its one frustrating shortcoming is a lack of uniformity across devices. Google Assistant has many skills, but they vary across Android smartphones, Android Wear devices, Google Home speakers, the Allo app and, soon, some Android TVs.
For instance, Google Assistant on Android and iOS smartphones could not control Chromecast devices by voice for a long time, even though the Google Home, launched in 2016, had that ability from the start. The company has just rolled out an update that brings the missing feature to Assistant on Android and iOS phones.
Similarly, Google Home speakers cannot cast to Android TV sets with built-in Chromecast support, although they can cast to a TV with an attached Chromecast device. Addressing a media briefing on Google Assistant, Gummi Hafsteinsson, Product Management Director for Google Assistant, said, “That’s just a bug and we need to fix it.”
He added, “In some cases, it just simply doesn't make any sense. There are things that are essentially just capabilities of the device. For example, if I ask my phone to turn on the flashlight, it will turn it on, but my Home device doesn’t have a flashlight, so it doesn’t make sense. There are these things that are device inherent differences. Those are probably going to stay like that.”
Previous updates show that Google is trying to close the gap in how Assistant works across devices. For instance, Google Assistant on Home devices could not make calls at first, but the ability was added later. As Hafsteinsson said, in some cases parity is simply not possible. The Google Assistant app on iPhones, for example, cannot carry out the same range of commands it can on Android devices because of API restrictions.
While there are many such examples of Assistant working differently across platforms and devices, Google says a fix is coming. “It’s just a matter of us being able to catch up and close the gap. For example you should be able to send messages through your Home device, directly or through the phone, it should be there but we just haven’t been able to connect all the dots,” said Hafsteinsson. He added, “The goal is for the Assistant to be the same thing across all those different devices. You shouldn't have to understand the technical intricacies behind the scenes, people who use these products. We will fix it.”
A more learned and natural sounding Assistant
The search giant has taken a big step by deploying its machine learning-based speech synthesis model, WaveNet, in the current version of Google Assistant to make it sound more human. WaveNet is a convolutional neural network trained on a large dataset of speech samples. The trained network synthesises a voice one audio sample at a time, with each new sample conditioned on the samples generated before it. This makes Assistant sound more natural than before, and Google says its listening tests bear this out.
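To make the "one sample at a time" idea concrete, here is a minimal sketch of an autoregressive generation loop. It is not WaveNet: `toy_model`, its fixed weights and the four-sample `RECEPTIVE_FIELD` are hypothetical stand-ins for a trained network, chosen only to show that every output sample needs its own model call, fed by the samples produced just before it.

```python
from typing import Callable, List

RECEPTIVE_FIELD = 4  # hypothetical: how many past samples the toy model can see


def toy_model(context: List[float]) -> float:
    """Hypothetical stand-in for a trained network (not WaveNet itself).

    Predicts the next audio sample as a fixed weighted sum of the most
    recent samples, clipped to the [-1.0, 1.0] range used for audio.
    """
    weights = [0.1, 0.2, 0.3, 0.35]  # oldest ... newest
    value = sum(w * x for w, x in zip(weights, context))
    return max(-1.0, min(1.0, value))


def generate(model: Callable[[List[float]], float],
             seed: List[float], num_samples: int) -> List[float]:
    """Autoregressive loop: each new sample is computed from the ones before it."""
    if len(seed) < RECEPTIVE_FIELD:
        raise ValueError("seed must fill the receptive field")
    audio = list(seed)
    for _ in range(num_samples):
        context = audio[-RECEPTIVE_FIELD:]  # only the most recent window is visible
        audio.append(model(context))        # one sequential model call per sample
    return audio[len(seed):]                # return only the newly generated samples


print(generate(toy_model, [0.0, 0.1, 0.2, 0.3], 5))
```

Because the loop is strictly sequential, a model running at an assumed 16,000 samples per second of audio would need 16,000 back-to-back calls for each second of speech, which is why the early versions of WaveNet were so slow.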
Hafsteinsson tells us, “Its ability to both create more natural sounding voices but also with less amount of data is tremendously powerful.” Because WaveNet is still so new, it is currently deployed only in the US English and Japanese versions of Google Assistant. “It’s just a matter of time [before we add new languages]. We are working on adding more voices. It went from a research project to actually being a production system in an amazingly short amount of time.”
In its early stages of development, WaveNet was extremely computationally expensive and took a full second to generate 0.02 seconds of sound, so a two-second clip like “turn left at Ashoka Road” would have taken about 100 seconds to generate. Now WaveNet generates sound 20 times faster than real time, producing the same two-second clip in a tenth of a second.
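The figures above can be checked with a little arithmetic. This sketch uses only the speeds quoted in the article; the helper name `generation_time` is made up for illustration.

```python
def generation_time(clip_seconds: float, speed_vs_real_time: float) -> float:
    """Seconds of compute needed to synthesise clip_seconds of audio.

    speed_vs_real_time is seconds of audio produced per second of compute:
    1.0 means real time, 20.0 means 20 times faster than real time.
    """
    return clip_seconds / speed_vs_real_time


EARLY_SPEED = 0.02    # early WaveNet: 0.02 s of audio per second of compute
CURRENT_SPEED = 20.0  # production WaveNet: 20x real time

clip = 2.0  # "turn left at Ashoka Road", roughly two seconds of speech
early = generation_time(clip, EARLY_SPEED)      # 100 seconds
current = generation_time(clip, CURRENT_SPEED)  # 0.1 seconds
print(f"early: {early:.1f} s, current: {current:.2f} s, "
      f"speedup: {early / current:.0f}x")
```

Dividing the two results shows that the production system is roughly 1,000 times faster than the original research version on the same clip.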
As WaveNet comes to more language versions of Google Assistant, the company is also looking to expand the number of regional Indian languages the AI helper supports. Of the regional Indian languages, Google Assistant can currently understand only Hindi, but Hafsteinsson says more will be added soon. “Using both machine translation and other machine learning technologies we are trying to find a way to truly scale the kind of languages we get and also understand that we can hopefully get to more language variants. It’s getting to hundreds of languages and with the natural language interface it is tremendously difficult project. You want to make sure that you do it in a way that’s thoughtful and matches expectations of people who speak the particular language in a locale,” he said.
We look forward to seeing how Google expands Assistant's capabilities in India and other parts of the world. While Google deploys WaveNet in Assistant, Apple is also trying to make Siri sound less robotic and more human with its own machine learning-powered speech model. Amazon's Alexa has more third-party integrations than any other digital assistant today, and Microsoft is rapidly expanding Cortana's skills. The AI assistant fight will be an interesting one to watch.