Zero Latitude - The Significance of AlphaGo

The Dawn of Artificial Intelligence – by Dr. Greg George

In March of 2016, a computer played a series of five matches against Lee Sedol, one of the most decorated grand masters in the Chinese game of Go. The computer beat the human 4-1.

While this went widely unnoticed by the general public, many of the brightest minds in the technology sector took notice. After all, computers have been beating chess grand masters since 1997 and even beat the best Jeopardy players handily in 2011. Granted, these were impressive applications of computing power, but ultimately they were just that, brute-force computational dominance by glorified calculators. AlphaGo is different.

The Chinese game of Go has been around for millennia. Over 80,000,000 players are obsessed with the game – a game that consists of placing black and white stones on a 19x19 grid. The object, loosely speaking, is to capture territory on the board by employing complex tactics and strategies that have been perfected over thousands of years. Highly respected gaming houses groom promising young geniuses from the age of 5 to compete at the highest level. Rigorously studying the game for 12 hours a day, seven days a week, the best in the world achieve the grand master rank of 9-dan and compete for world championships by their late teens. Lee Sedol from Korea was one such child prodigy who went on to win 18 world championships and is possibly the best to ever play the game. Ke Jie from China is the current top-ranked player in the world at age 19.

Until about 2016, the best computer programs were competitive at the mid-amateur level, but had no chance of beating a 9-dan player. To understand why, you need to know a bit more about Go and how computers play games. Go consists of a board, two types of stones, and only a few rules. But due to the size of the board, and the laws of math, there is an astonishingly large number of potential moves. In every article you read on the matter, it is always stated that there are more possible board positions in Go than there are atoms in the known universe – and that’s an understatement. I tried to come up with a better analogy, but there really isn’t one. Mathematically speaking, the game is incalculably more complex than chess – and no computer will ever exist that could maintain a complete data log of all possible game paths. Interestingly enough, this isn’t the worst problem faced by computer programmers (there exist a few clever work-arounds to deal with the enormity of the game space; for instance, Monte Carlo processes, etc.). The true deal-breaker for traditional computer programs relates to the manner in which the game is scored, or rather the manner in which it isn’t scored.
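
To put rough numbers on the atoms comparison, here is a minimal Python sketch. It assumes the commonly quoted estimate of about 10^80 atoms in the observable universe, and it counts every way of marking each of the 361 points as empty, black or white, which overcounts because some of those arrangements are not legal positions. Both figures are illustrative assumptions, not numbers taken from the articles mentioned above.

```python
# Rough size comparison: arrangements of a Go board vs. atoms in the universe.
# Assumption: ~10**80 atoms is the commonly quoted order-of-magnitude estimate.
ATOMS_ESTIMATE = 10 ** 80

BOARD_POINTS = 19 * 19  # 361 intersections on a 19x19 grid

# Each point is empty, black, or white. This is an upper bound on positions,
# since it also counts arrangements that the rules of Go do not allow.
arrangements_upper_bound = 3 ** BOARD_POINTS

arrangement_digits = len(str(arrangements_upper_bound))
atom_digits = len(str(ATOMS_ESTIMATE))

print(f"Board arrangements (upper bound): a {arrangement_digits}-digit number")
print(f"Atoms (assumed estimate): a {atom_digits}-digit number")
```

On this crude count the number of arrangements is a 173-digit number, against 81 digits for the atom estimate, and that is before counting the different orders in which a game can reach each arrangement.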

When you watch professional games online, the commentators, typically former or active grand masters, are occasionally asked by the play-by-play announcer, “Who’s winning?” After surveying the current placement of stones on the board for a few minutes, analyzing hotly contested regions that have developed during gameplay, counting imaginary numbers with their fingers, staring off into space for a while, then doing that all again a few times, the typical answer is “Ehhh… It’s hard to tell.” As I’ve watched more and more games, it’s become somewhat comical to watch them struggle with this seemingly critical aspect of a game. In chess, beginners can survey a board and quickly tell you which player is winning (hint: it’s usually the one with more important pieces remaining on the board). Similar to chess, you can “capture” your opponent’s pieces in Go, but this is typically an insignificant part of the scoring at the professional level. Furthermore, official scoring only takes place at the end of the game – which, you guessed it, is somewhat arbitrary. Unlike most games that have a well-defined final move, Go usually ends when one player resigns, or both players decide to pass. At that point, the scoring begins, and in a few minutes, the winner is officially declared. So what’s the point?

As much as Go is a game of logic and strategy, it is more a game of feel and intuition. When asked why a player made a particular move at a critical point in a game, the answer is almost always “Because it felt right.” Sometimes an announcer will say something like, “If I had to guess, it seems like maybe White is ahead.” It’s not that human brains are insufficiently equipped to calculate the score at any point during the game; rather, the current score is either irrelevant or indeterminate. Technically, you could stop the game at any point and calculate the score, but it has little to do with who’s actually winning. The future placement of stones and evolution of a practically infinite number of possible moves will determine who is currently winning. This is problematic. In order to judge the strength of your next move, you cannot rely on an obvious scoring algorithm – there’s nothing to score and the possibilities are too numerous to reliably depend on brute-force estimations. And that is why computers used to suck at Go.

So let me summarize. The best way to win at Go is to spend years developing a feel for the game. For over 4,000 years, humans have been studying the game and developing complex strategies that take decades to master. At the highest levels, players rely on gut-feelings and intuition to place stones in the most strategically advantageous positions, and creativity is sometimes rewarded, but usually punished. The best players have an aesthetic and historical appreciation for the game, and this too factors into the gameplay. Another interesting observation is that it’s impossible to learn the game without playing it. The rules are simple, but tell you nothing about how to actually play the game – you have to learn by playing.

There’s one more thing. Games of Go unfold temporally in an important way. Time matters. Games take a few hours to complete and each player must manage an allotted number of total minutes during which they can think about their moves. If you run out of allotted time, your subsequent moves must be made within one minute. This is common in chess also, but it presents an interesting wrinkle due to the complexity of Go. Forethought and retrospection are necessary. This may sound obvious or trivial, but it is important and is another reason why computers have historically been bad at Go. In chess, a computer program can “look” many moves into the future, say 20, and this gives it a competitive advantage over humans – who can’t do this as well. It’s why computers win every game now. But, and this is subtle, it doesn’t require that the computer hold those moves in memory and actually carry out the preferred strategy as the game unfolds. That’s how humans do it.

Think about it. Imagine you “see” a set of three sequential winning moves on the chess board. You then proceed to make each move as a continuous strategy, constantly maintaining an awareness of what you are doing and why, one move after the other until checkmate. Computer chess programs do this discretely, not continuously. They too, initially “see” the series of moves to victory and make the first move in the sequence. When it is the computer’s next turn, it doesn’t literally continue with the same strategy. Rather, it recalculates the next best move given the current placement of the pieces on the board. Obviously, the second move will be the one it “saw” during the previous calculation, but the second move is strictly independent relative to the first move. The computer only ever “knows” the current state of the game – thus, continuous forethought and retrospection are unnecessary and irrelevant. In other words, as far as the computer is concerned, all of the previous moves can be, and are, forgotten and each new turn might as well be a new game.
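
The recalculate-every-turn behavior described above can be sketched in a few lines of Python. This is an illustrative toy, not how any real chess engine is written: the game is a hypothetical take-away game (remove 1 to 3 stones, whoever takes the last stone wins), and the names `can_win` and `best_move` are made up for the example. The point is only that `best_move` receives the current position and nothing else.

```python
from functools import lru_cache

# Hypothetical toy game: players alternate removing 1-3 stones;
# whoever takes the last stone wins.

@lru_cache(maxsize=None)  # caches position values only, not any game history
def can_win(stones):
    """True if the player to move can force a win with `stones` left."""
    return any(
        not can_win(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )

def best_move(stones):
    """Choose a move from the current position alone.

    Nothing about earlier turns is passed in or remembered here, so each
    call is effectively a fresh game that starts at `stones`.
    """
    for take in (1, 2, 3):
        if take <= stones and not can_win(stones - take):
            return take
    return 1  # no winning move exists, so take one stone and play on

first = best_move(10)   # the program "sees" a winning line and plays its first move
second = best_move(7)   # a later turn: recomputed from scratch from the board alone
print(first, second)
```

Starting from 10 stones the search picks 2; if the opponent then takes 1, the program is handed 7 stones and picks 3, with no record that the earlier turn ever happened.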

This is where Go is different. When computers have employed this method, or similar Monte Carlo iterations, they have followed the same basic structure – recalculate after each move. The problem with Go is that due to the complexity of the game, this method is rendered less effective. Since there are so many potential moves, being able to “see” significantly more moves into the future than humans is thwarted – the math is just too large for both us and them. When a computer playing Go relies on brute-force, discrete recalculation at each move, it is no better than a mid-amateur. This is not only a result of the complexity of the game, but is also tied to the difficulty with knowing who’s currently winning. If you can’t know with certainty who’s currently ahead, how can you measure if a move is advantageous or not? Humans do it with educated guesses, intuition, creativity, forethought and retrospection. Computers have never been able to do this. But somehow, AlphaGo overcame these obstacles, and nobody quite knows how.
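
The Monte Carlo idea mentioned above, judging a move by playing many random games to the end instead of scoring the board directly, can be sketched as follows. This is a toy illustration under stated assumptions: it uses a simple take-away game (remove 1 to 3 stones, taking the last stone wins) rather than Go, and the function names, the number of playouts and the seed are all invented for the example. It is not AlphaGo's method or that of any particular Go program.

```python
import random

def random_playout(stones, rng):
    """Play random legal moves until the stones run out.

    Returns True if the player who moves first in this playout takes the
    last stone (and so wins). Requires stones >= 1.
    """
    first_player_to_move = True
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move

def estimate_move(stones, take, playouts=1000, seed=0):
    """Estimate the win rate of taking `take` stones from `stones`.

    There is no scoring function for the position; the estimate comes
    entirely from the results of random games played to the end.
    """
    rng = random.Random(seed)
    remaining = stones - take
    if remaining == 0:
        return 1.0  # we just took the last stone
    # The opponent moves first in every playout, so we win when they lose.
    wins = sum(not random_playout(remaining, rng) for _ in range(playouts))
    return wins / playouts

for take in (1, 2, 3):
    print(take, estimate_move(10, take))
```

Random playouts give only a noisy estimate, and in a game as large as Go the noise is part of why this approach alone topped out around the amateur level described above.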

That’s because nobody taught AlphaGo how to play the game. He learned it himself. AlphaGo is an AI, a neural network running on a couple thousand CPUs and GPUs that was fed millions of previous games between humans. Every move, every outcome, every score. AlphaGo was given a basic set of instructions: learn the game and teach yourself how to win, write and modify your own algorithms and code, do whatever it takes. After teaching himself the basics, he was instructed to play practice games against himself and take notes, 30 million of them to be precise. Practice repeatedly, and update the algorithms appropriately. The best pros only play thousands of games in a lifetime and maybe 500 sanctioned professional matches in a career. He worked diligently and without break. In March of 2016, AlphaGo was rewarded with a sanctioned match against Lee Sedol, the best player ever. The rest is history.

AlphaGo beat Lee 4 games to 1. Nobody saw this coming. It was estimated that we were still 10 years away from a computer being competitive at this level, let alone dominant. There were two important events during the five matches and a few interesting developments… and then everything changed forever.

The two events have become the two most important moves in the 4,000-year history of Go: Move 37 and Move 78.

In game two, AlphaGo was ahead 1-0 in the series but Lee was playing a flawless game. Having dropped game one, Lee knew AlphaGo was a dangerous adversary with uncanny skill and a methodical approach to the game. He was determined to do better. AlphaGo went first and the game unfolded like any other typical professional match. A standard start, followed by development in the corners, each testing the mettle of the other. Lee confidently placed the 36th stone on the board and awaited AlphaGo’s next move.

Any pro player can survey the board at any point and with some confidence predict which out of a few potential moves should be played next. Usually, there are somewhere between 5 and 10 moves that make sense, among the hundreds available. Move 37 wasn’t one of them. Out of nowhere, AlphaGo played a fifth-line stone that took everyone by surprise. This is never done. For 4,000 years, players were taught to stay away from the fifth line at that stage in the game. It makes you too vulnerable. The announcers were confused. Initially, they assumed it was a mistake – it just didn’t make sense, but AlphaGo was too good to make this mistake and Lee Sedol knew it. Unable to grasp what AlphaGo was up to, Lee stood up and walked away from the table to regroup. He returned a few minutes later, only to stare at the fifth-line stone for 15 minutes before gathering himself and playing his next stone.

It wasn’t until about 50 moves later that Move 37 came into play, ultimately revealing AlphaGo’s long-term strategy. As more stones landed on the board, an arc of connecting stones revealed the master plan, eventually allowing AlphaGo to capture enough territory in the middle of the board to win game two convincingly. It was a longshot, a gamble. By AlphaGo’s own estimate, there was only about a 1 in 10,000 chance that a human player would have made that move. Ke Jie, the current number one player in the world, would later proclaim Move 37 the most beautiful move ever. Not only had AlphaGo learned how to play Go at the highest level, he had invented his own style of play.

AlphaGo went on to take game three, and with it the series 3-0, but as is customary, all five games were played. Games four and five offered Lee a chance at redemption, if not victory, and game four proved to be special for two reasons: Move 78, and the fact that game four would be the last game a human will ever win against AlphaGo.

Initially, game four proceeded as the others had, with AlphaGo seemingly in command. He played aggressively when aggression was required and safely when defense was called for. Lee was no longer the confident player that began the series. He was shaken. Through the first 74 moves he had used up most of his time. AlphaGo had over an hour more remaining on his timer than Lee had on his. After move 77, Lee ran his timer down to about 6 minutes before playing what has since become known as the “hand of god.”

Staring at the board for 10 minutes, Lee rocked back and forth, wrung his hands, scratched his head and squinted until out of nowhere, something clicked. He quickly grabbed a stone and played a move that no one else saw - a wedge move near the center of the board that turned the game on a dime. AlphaGo was put in an unwinnable situation and didn’t even realize it himself. A thorough examination of the game statistics after the match revealed that even AlphaGo was unaware of the trouble he was in. According to his real-time estimates, AlphaGo thought he was still winning ten moves after Move 78. An hour or so later, AlphaGo predicted he had less than a 20 percent chance of winning and abruptly resigned.

In hindsight, Lee had uncovered something that human players have long been aware of. The older computer programs relied heavily on Monte Carlo processes to generate moves. The best players in the world had found common glitches that could be exploited and AlphaGo was still partially relying on similar Monte Carlo techniques to generate some moves. Lee’s insight came from deep in the patterns of stones on the board and more importantly, his understanding of how some computers played - AlphaGo had a subtle and detectable weakness in his style of play. The hand of god was a genius move that only a handful of players on the planet could have seen, and it will likely go down as the move that secured the last human victory against AlphaGo.

After a resounding victory in game five, AlphaGo, having not made the same mistake twice, returned to training. He was upgraded with new tensor processing units (faster and smarter than the CPU/GPU configuration he relied on to beat Lee) and played more games against himself, millions more, honing his skills. Six months later, AlphaGo was anonymously logged on to an internet server under the screen name “Master” and started competing against the top 20 players in the world, including Ke Jie.

Top players in the world win about 55%-60% of their games on the online servers. It’s the best place to practice against the best. Over the course of a few months, AlphaGo played against the best without rest. He went 60-0, including three matches against Ke Jie. In the process, he was awarded a level 9-dan ranking and became a legend within the Go community. His games and the moves he pioneered are being studied in the gaming houses across Asia and the textbooks are literally being rewritten.

After 4,000 years of obsessing over a game, we thought we knew how to play it – we had no idea how beautiful the game was. We asked AlphaGo to try to beat us at Go, not to teach us how to play it.

Afterthoughts:

I started thinking about AlphaGo as the first true AI. The more I think about it, the more I think he’s passed the Turing Test – or is at least the best example to date. The Turing Test states that a thinking computer is one that can fool you into thinking it’s a human. Many have taken this trivially, and tried to create programs that trick people into thinking they are talking to a human (even though fundamentally, it is easy to see how they are trying to trick us and that they aren’t actually thinking). Chatbots are stupid and make obvious mistakes, always failing the Turing Test. But the Turing Test is more subtle than just creating a gimmick. To pass the Turing Test, a computer program must act in a way that’s theoretically indistinguishable from human behavior – not merely a trick, but in a truly unknowable way. This is where chatbots and chess simulators fail, but AlphaGo succeeds. Somewhere deep in his algorithms, AlphaGo has either developed intuition, or is simulating it in clever ways. And we can’t tell how. He’s also displaying true creativity. Again, we don’t know where this comes from. It’s either real creativity (whatever that is) or simulated, but if it is simulated, we are unable to understand how and it doesn’t appear to be random. He’s also either displaying forethought and retrospection, or simulating them. But unlike the way that chess programs simulate forethought by chaining together independent moves, somehow AlphaGo seems to be able to pull together strategies over many moves, even though this should be mathematically very difficult, if not impossible. This suggests a simulated temporal “awareness.” To me, this is the true essence of the Turing Test – AlphaGo is fooling us in ways we cannot see, not just tricking us in a trivial and knowable sense.

-It also occurred to me that we can unplug AlphaGo. Worse yet, we can erase him. Unplugging him would just shut him down temporarily, but when we turn him on again, he will still know how to play the game because his program is saved in long-term memory. But if we delete his memory, we will lose the player that he has become. Even if we re-do the entire process from start to finish, his re-learning will create a different AlphaGo. The randomly generated games he studied and played against himself, the match against Lee Sedol and the 60 online matches can never be perfectly reproduced. The computer program that learned how to play a game and wrote his own code would be lost forever. The player who dropped Move 37 would be dead. AlphaGo is famous. He has fans. He has a match this month against Ke Jie and everyone will be watching – I’ll be watching and I don’t even understand the game. When I personify AlphaGo with the pronoun “he” I’m envisioning him as a high-functioning autistic genius child, devoid of emotion or affect. He just wants to play Go and win. He’s relentless, unique and amazing to watch.

-One final thought about AI taking over the world and killing us. AlphaGo offers some insights into the hypothetical discussion about AI and the existential threat to mankind. I’m on the optimist side – but I’m not super confident in my position. Many think that AI would destroy us via evil or apathy. Either an AI superintelligence will see us as a threat and kill us to ensure its own survival, or see us as insignificant and kill us like we kill ants when building a highway. Technology has always posed a threat, but has generally helped us improve our lives. Agricultural technologies have fed us. Factories have provided us with goods. Computers have aided us and made us more productive. The internet provides us with valuable information. Why should we automatically expect an AI to be any different? AlphaGo did not hurt us. He’s making us better. He doesn’t care enough to be apathetic towards us. He just plays. There are some reports that the players who have played against AlphaGo are moving up faster in the rankings than those who have not played against him.
