When the artificial intelligence AlphaGo beat Go champion Lee Sedol at his own game, it made history. AlphaGo didn't just beat Lee at Go; it won four games out of five, an unprecedented victory that few had expected. AlphaGo's neural networks learned from thousands of Go matches played by human players and built their own understanding on top of that, producing moves that even a legend like Lee Sedol didn't anticipate. But is AlphaGo the breakthrough in machine learning that the world has been waiting for? We spoke to David Silver, Research Scientist at Google DeepMind, to find out.
Silver leads the research on the AlphaGo algorithm, which makes him perhaps the best person to answer questions about its victory and what comes next. AlphaGo's win is an important step for machine learning, opening avenues for further advances. It is still only a first step towards true artificial intelligence, though, and DeepMind's work continues in that direction. The ancient Chinese game of Go has long been considered the game to beat for AI because it is enormously complex, as Silver explains below.
Firstly, could you explain the significance of AlphaGo's victory over Lee Sedol? Why does a victory in Go matter so much?
Go represents a major milestone in AI research. Go is enormously complex, and such an intuitive game that so-called “brute force” search techniques are not sufficient. Until recently, computers could only play Go as well as amateurs.
Go’s simple rules and profound complexity resemble the real world in many ways. That is why we think Go is an important stepping stone towards a more general AI.
How does AlphaGo's victory in Go differ from Deep Blue's earlier victory in chess? How do chess and Go differ when it comes to being played by an AI?
The number of possible moves in chess is much lower. The search space in Go is vast — more than a googol times larger than chess (a number greater than there are atoms in the universe!). So traditional “brute force” AI methods, which construct a search tree over all possible sequences of moves, don’t have a chance in Go. In addition, it’s much easier to evaluate who is ahead in chess – for example, by adding up piece values.
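To put that gap in perspective, here is a rough back-of-the-envelope comparison in Python. The branching factors and game lengths are approximate figures commonly cited for the two games, not values taken from the interview.

```python
# Rough game-tree size: (average legal moves per turn) ** (typical game length).
# The branching factors and game lengths below are approximate, commonly cited figures.
CHESS_BRANCHING, CHESS_LENGTH = 35, 80
GO_BRANCHING, GO_LENGTH = 250, 150

chess_tree = CHESS_BRANCHING ** CHESS_LENGTH   # roughly 10^123 move sequences
go_tree = GO_BRANCHING ** GO_LENGTH            # roughly 10^359 move sequences
googol = 10 ** 100

print(f"chess: about 10^{len(str(chess_tree)) - 1} sequences")
print(f"go:    about 10^{len(str(go_tree)) - 1} sequences")
print("Go exceeds chess by more than a googol:", go_tree // chess_tree > googol)
```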
Human grandmasters handcrafted the chess knowledge that Deep Blue used to evaluate positions. With AlphaGo, we did not tell it what strategy to use — instead it learnt for itself from hundreds of thousands of human expert games, and from millions more games of self-play.
Finally, Deep Blue used a search algorithm that was based more on brute force – it evaluated thousands of times more positions than AlphaGo does. AlphaGo, by contrast, searches much more smartly and selectively.
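One way to picture that selectivity is sketched below. This is an illustration, not AlphaGo's actual search (which used Monte Carlo tree search guided by its neural networks): rather than expanding every legal move the way a brute-force searcher would, keep only the handful of candidates a learned policy rates as promising. The `policy_prob` mapping here is hypothetical.

```python
def selective_candidates(legal_moves, policy_prob, top_k=5):
    """Keep only the top_k moves ranked by a learned policy's probability.

    `policy_prob` is a hypothetical mapping from move to probability; a
    brute-force searcher would expand every legal move instead.
    """
    ranked = sorted(legal_moves, key=policy_prob.get, reverse=True)
    return ranked[:top_k]


# Toy usage: of six legal moves, only the three the policy likes are searched further.
probs = {"a": 0.40, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.06, "f": 0.04}
print(selective_candidates(list(probs), probs, top_k=3))  # -> ['a', 'b', 'c']
```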
I understand AlphaGo actually has two neural networks working inside it. Could you explain these and how they work?
One neural network, the “policy network,” selects the next move to play. The other neural network, the “value network,” predicts the winner of the game. Each neural network takes the board position and passes it through many computational layers with millions of tunable weights, to come up with its answer.
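As a rough illustration of that two-network structure (not AlphaGo's actual architecture, which used deep convolutional networks over the 19x19 board), a minimal NumPy sketch might look like the following. The layer sizes and random weights are made up for illustration, standing in for the millions of weights AlphaGo tuned from expert games and self-play.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One fully connected layer with a ReLU non-linearity."""
    return np.maximum(0.0, x @ w + b)

# Toy stand-in for a 19x19 Go position encoded as a flat feature vector.
board_features = rng.random(19 * 19)

# Randomly initialised weights stand in for weights learned from data.
w1, b1 = rng.normal(size=(361, 128)), np.zeros(128)
w_policy, b_policy = rng.normal(size=(128, 361)), np.zeros(361)
w_value, b_value = rng.normal(size=(128, 1)), np.zeros(1)

hidden = layer(board_features, w1, b1)

# Policy head: a probability for each of the 361 board points (the next move).
policy_logits = hidden @ w_policy + b_policy
move_probs = np.exp(policy_logits - policy_logits.max())
move_probs /= move_probs.sum()

# Value head: a single estimate of the probability that the current player wins.
win_prob = 1.0 / (1.0 + np.exp(-(hidden @ w_value + b_value)))

print("most promising move:", int(move_probs.argmax()))
print("estimated win probability:", float(win_prob[0]))
```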
AlphaGo's gameplay in this tournament has been described as aggressive. How does it do that? Why is that significant? Is there no rule-based programming at play here?
AlphaGo always selects the move that maximises its chances of winning the game. This can sometimes result in aggressive moves, but it can also result in seemingly quiet and defensive moves, especially once AlphaGo already has a lead.
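Stated as code, that decision rule is simply an argmax over estimated win probability. The `estimate_win_probability` helper below is hypothetical, standing in for AlphaGo's combination of value network and search statistics.

```python
def choose_move(position, legal_moves, estimate_win_probability):
    """Pick the move with the highest estimated chance of eventually winning.

    `estimate_win_probability` is a hypothetical helper; the rule never prefers
    aggression for its own sake, so a quiet defensive move is chosen whenever
    it scores highest.
    """
    return max(legal_moves, key=lambda move: estimate_win_probability(position, move))


# Toy usage with made-up numbers: the solid, defensive move comes out on top.
scores = {"invade": 0.48, "defend": 0.61, "extend": 0.55}
print(choose_move("toy position", list(scores), lambda _pos, move: scores[move]))  # -> defend
```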
How easy or difficult will it be to adapt AlphaGo to other activities?
The methods we’ve used for AlphaGo are general-purpose, so our hope is that one day they could be extended to help us address other challenges, from making smartphone assistants more useful to helping scientists tackle some of society’s toughest and most pressing problems, such as climate modelling and complex disease analysis. That being said, it’s very early days. We are decades away from human-level AGI.
Is it true that games like Go can only serve as a testing ground for such algorithms?
Like many researchers before us, we've been developing and testing our algorithms through games, but we have much bigger goals than games. In the case of AlphaGo, we gave the AI a goal — to win at Go — and then it learnt for itself the best way to achieve that goal. That’s a much more general way of learning, and one that’s similar to how you and I, as humans, learn.
AI programmers have been known to no longer recognise their own algorithms after a while. Doesn't that make it difficult to tweak algorithms when needed? (We're not suggesting a lack of control over AI here, simply asking about the ways to tweak the algorithms.)
When training neural networks from scratch, the most important factor is designing the goals and objectives that you set for the algorithm, and the mechanism for learning those goals, so that it can learn for itself to achieve the desired behaviour.
If yes, does that make it difficult to scale or adapt AlphaGo to other applications?
This is the beginning, and there is a lot more work to be done before we can begin to apply these techniques to real world problems.
Do you plan for things like AlphaGo to be used in consumer applications, such as smartphone voice assistants?
Not necessarily AlphaGo, but the machine learning techniques behind it will be a tool that helps us do what we already do, but better — whether that’s instant translation, a smarter assistant on our phone that can plan a trip for us just by knowing what we like, or helping doctors to diagnose a disease much earlier.