signal-2020-09-12-225228 (via https://www.flickr.com/photos/foam/50732856042/)
Posts tagged go
AlphaGo is made up of a number of relatively standard techniques: behavior cloning (supervised learning on human demonstration data), reinforcement learning (REINFORCE), value functions, and Monte Carlo Tree Search (MCTS). However, the way these components are combined is novel and not exactly standard. In particular, AlphaGo uses an SL (supervised learning) policy to initialize an RL (reinforcement learning) policy, which is then refined through self-play. A value function is estimated from that RL policy and plugged into MCTS, which (somewhat surprisingly) uses the (worse!, but more diverse) SL policy to sample rollouts. In addition, the policy/value nets are deep neural networks, so getting everything to work properly presents its own unique challenges (e.g. the value function is trained in a tricky way to prevent overfitting). On all of these aspects, DeepMind has executed very well. That being said, AlphaGo does not by itself use any fundamental algorithmic breakthroughs in how we approach RL problems.
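To make the first two stages of that pipeline concrete, here is a toy sketch of a policy that is first fit to demonstrations (behavior cloning) and then fine-tuned with REINFORCE. It is not AlphaGo's implementation: the policy is a plain softmax over three abstract actions rather than a deep network over Go positions, and the names `behavior_cloning`, `reinforce`, and `reward_fn`, along with the learning rates and episode counts, are assumptions chosen purely for illustration. The value function and MCTS stages are left out.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavior_cloning(demos, n_actions, lr=0.1, epochs=200):
    """SL stage (toy): fit logits to demonstrated actions by
    stochastic gradient ascent on the log-likelihood."""
    logits = [0.0] * n_actions
    for _ in range(epochs):
        for a in demos:
            probs = softmax(logits)
            for i in range(n_actions):
                # d/d logit_i of log pi(a) is 1[i == a] - pi(i)
                logits[i] += lr * ((1.0 if i == a else 0.0) - probs[i])
    return logits


def reinforce(init_logits, reward_fn, rng, lr=0.1, episodes=2000):
    """RL stage (toy): start from the SL logits and apply the REINFORCE
    update, reward * grad log pi(a), on sampled actions."""
    logits = list(init_logits)  # copy so the SL policy is left untouched
    n = len(logits)
    for _ in range(episodes):
        probs = softmax(logits)
        a = rng.choices(range(n), weights=probs)[0]  # sample from the policy
        r = reward_fn(a)
        for i in range(n):
            logits[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return logits


if __name__ == "__main__":
    # Hypothetical data: "humans" mostly play action 0, but only action 2 wins.
    demos = [0, 0, 0, 1, 2]
    sl_logits = behavior_cloning(demos, n_actions=3)
    rl_logits = reinforce(sl_logits, lambda a: 1.0 if a == 2 else 0.0,
                          random.Random(0))
    print("SL policy:", softmax(sl_logits))
    print("RL policy:", softmax(rl_logits))
```

In this toy setting the SL policy mirrors the demonstrations and favors action 0, while the RL policy shifts its probability mass toward the rewarded action. Because `reinforce` copies the logits, the original SL policy stays available afterwards, which loosely echoes the quote's point that the more diverse SL policy remains useful later on.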
Over the last week, a number of forum threads have popped up to discuss this mystery debutante who has been thrashing the world’s best players. Given its unbeaten record and some very “non-human” moves, most onlookers were certain that Master and Magister were being played by an AI; they just weren’t certain whether it was AlphaGo or perhaps another AI out of China or Japan. It is somewhat unclear, but it seems that DeepMind didn’t warn the opponents that they were playing against AlphaGo, though they may have been told after their games had concluded. Ali Jabarin, a professional Go player, apparently bumped into Ke Jie after he’d been beaten by the AI: “He [was] a bit shocked… just repeating ‘it’s too strong.’” Gu Li, as quoted by Hassabis, was a lot more philosophical about his loss to the new version of AlphaGo: “Together, humans and AI will soon uncover the deeper mysteries of Go.” Gu Li is referring to the fact that AlphaGo plays Go quite differently from humans, placing stones that completely confound human players at first, but that upon further analysis turn out to be “divine moves.” While there’s almost no chance that a human will ever beat AlphaGo again, human players can still learn a lot about the game itself by watching the AI play. If you want to watch the new AlphaGo in action, a German website has the first 41 games from the 51-game streak, including victories against many of the world’s best human players. At this point it isn’t clear how this new version of AlphaGo differs from the one we saw last year, though some Go observers suggest that this version is making more “non-human” moves than before, indicating that the deep neural network might’ve been trained in a different way.
Naturally, there are rumors that Master(P) can be none other than an even stronger AlphaGo, which wanted to show, ahead of a match in the first quarter of 2017, just how high the bar has been raised. Other candidates would be the Korean DolBaram project, which is supported by the Korean Amateur Baduk Association (KABA) and the Korean government, and a Chinese project that is rumored to have been playing at AlphaGo level for quite some time. It at least does not seem to be DeepZen, which recently faced Cho Chikun 9p, because DeepZen was playing on Tygem at the same time, and quite successfully: its current score is 159:18, mostly against strong 9d players with or without the (P) suffix. In any case, Aja Huang of the AlphaGo project commented on the speculation about the identity of Master(P) and AlphaGo with nothing more than a telling “interesting”.