Abstract
Monte Carlo Tree Search (MCTS) is currently the most popular game-playing algorithm for perfect-information extensive-form games. Its adaptation has led, for example, to Go-playing programs of human expert level and to substantial improvements of solvers for domain-independent automated planning. Inspired by this success, researchers have started to adapt the technique to imperfect-information games as well. Imperfect-information games present several difficulties absent from perfect-information games, such as the necessity of randomized strategies for optimal play. Even though the properties of optimal strategies in these games are well studied, the MCTS literature on imperfect-information games does not build on this knowledge sufficiently. None of the pre-existing MCTS algorithms is known to converge to an optimal strategy, even when given infinite computation time.
In this thesis, we study MCTS in two-player zero-sum extensive-form games with imperfect information and focus on the game-theoretic properties of the produced strategies. We proceed in two steps. We first analyze in detail one of the simplest classes of games with imperfect information: games with simultaneous moves. Afterwards, we proceed to fully generic imperfect-information games. We survey the existing MCTS algorithms for these classes of games and classify them into a few fundamentally distinct classes. Furthermore, we provide the following contributions.
First, we propose new MCTS algorithms that provably converge to a Nash equilibrium of the game with increasing computation time. We introduce three such algorithms: one based on a minor modification of the standard MCTS template for simultaneous-move games, and the other two adaptations of the successful Monte Carlo Counterfactual Regret Minimization (MCCFR) to online search in both simultaneous-move and imperfect-information games.
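A central building block of CFR-family methods such as MCCFR is regret matching, which turns the regrets accumulated over past iterations into the strategy for the next iteration. The sketch below is an illustrative implementation of this standard rule, not of the thesis's specific algorithms; the dictionary-based representation is an assumption made for brevity.

```python
def regret_matching(cumulative_regret):
    """Derive a mixed strategy from cumulative regrets: each action is
    played with probability proportional to its positive cumulative regret."""
    positives = {a: max(r, 0.0) for a, r in cumulative_regret.items()}
    total = sum(positives.values())
    if total > 0:
        return {a: p / total for a, p in positives.items()}
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regret)
    return {a: 1.0 / n for a in cumulative_regret}
```

Averaging the strategies produced by this rule over all iterations is what yields the convergence to equilibrium in the CFR family.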
Second, we focus on improving the performance of MCTS algorithms, mainly by proposing and evaluating novel selection functions that choose the actions to sample in later iterations based on the statistics collected in earlier iterations. In generic imperfect-information games, we propose explicit modelling of a player's beliefs about the probability of being in a specific game state during a match.
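For context, the classical selection function in MCTS is UCB1, which balances the empirical mean reward of an action against an exploration bonus; the novel selection functions studied in the thesis play the same role in the algorithm. The following is a sketch of the standard UCB1 rule under an assumed `action -> (visit_count, total_reward)` statistics format, not code from the thesis.

```python
import math

def ucb1_select(stats, total_visits, c=1.4):
    """Pick the action maximizing mean reward plus an exploration bonus.
    `stats` maps action -> (visit_count, total_reward)."""
    best_action, best_value = None, float('-inf')
    for action, (n, reward_sum) in stats.items():
        if n == 0:
            return action  # sample every action at least once
        value = reward_sum / n + c * math.sqrt(math.log(total_visits) / n)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

The exploration constant `c` trades off exploiting the best-looking action against gathering statistics on the others; its value here is a common default, not one prescribed by the source.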
Third, we perform an extensive evaluation of the proposed and existing MCTS methods on five simultaneous-move games and four fully imperfect-information games of variable size and fundamentally different properties. We evaluate both the ability of the algorithms to quickly approximate a Nash equilibrium strategy and their performance in head-to-head tournaments. We show that the algorithms based on MCCFR converge to an equilibrium very quickly, but classical MCTS with the novel selection functions performs better in tournaments on large games.
Finally, we present a case study of using MCTS for creating intelligent agents for a robotic visibility-based pursuit-evasion game. We design domain-specific variants of the previously introduced algorithms and evaluate their performance in a complex simulated environment. We show that the algorithms based on MCTS outperform the best previously known algorithm for this problem.
Players in imperfect-information games know the rules of the game exactly, but at each of their decisions, they do not know some of the past (or concurrent) actions of the other players or the outcomes of chance events. Typical examples are games like poker, but also real-world problems, such as auctions or the pursuit of smart targets in physical space.
Small variants of these games can be graphically represented as game trees, with imperfect information depicted as sets of decision nodes a player cannot distinguish. A simple one-card poker game can be represented as follows:
The game starts in the root with a chance node that represents the possible deals of cards to Ann and Beth, denoted above the edges. Ann knows which card she was dealt, but does not know Beth's card; hence, she cannot tell apart, for example, the chance outcomes (1,2) and (1,3), as denoted by the dashed lines in the upper left part of the tree. When the players reach a leaf of the game, they receive the corresponding rewards.
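The grouping of chance outcomes into information sets can be made concrete in a few lines. The sketch below assumes a three-card deck {1, 2, 3}, which matches the outcomes (1,2) and (1,3) mentioned above but is otherwise an assumption; it groups the deals by the only thing Ann observes, her own card.

```python
from itertools import permutations

# Deal one distinct card from {1, 2, 3} to each of Ann and Beth.
deals = list(permutations([1, 2, 3], 2))

# Ann observes only her own card, so chance outcomes that give her the
# same card belong to one information set, e.g. (1, 2) and (1, 3).
ann_information_sets = {}
for ann_card, beth_card in deals:
    ann_information_sets.setdefault(ann_card, []).append((ann_card, beth_card))
```

Each value of `ann_information_sets` is one dashed group in the figure: a set of states Ann must treat identically, since her strategy can only depend on what she observes.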
Realistic games played by people, as well as models of real-world problems, typically have billions of nodes. Online search methods, such as MCTS, attempt to find a good strategy to play in a very limited time. In my thesis, I focused on this type of algorithm from both a practical and a theoretical perspective. From the practical perspective, I developed algorithms that win against most other algorithms in mutual matches; from the theoretical perspective, I derived guarantees on the performance of these algorithms against worst-case opponents.
The developed methods have been evaluated on a large variety of games, starting with simple theoretical games with a few states to evaluate the worst-case properties of the proposed methods, through games commonly played by people, to a high-fidelity simulation of tactical military operations with robotic support.
Here is a short video demonstrating the most advanced application: