Continual Resolving

Our goal in this line of research is to generalize the continual resolving algorithm, which was developed and evaluated on poker, so that it can play any imperfect information game, be it another card game, a board game, or a game-theoretic model of a real-world situation.

We are looking for new members for our research team. We offer positions for:

    • PhD students;

    • postdoctoral fellows;

    • paid internships;

    • part-time jobs for students.

If you might be interested, contact me at viliam.lisy@fel.cvut.cz. The start date is as soon as possible.

Imperfect information games

In games like chess and Go, both players know the current state of the game exactly; therefore, they are called perfect information games. On the other hand, in many card and board games, such as poker or Scotland Yard, the players do not know the exact state of the game, because they cannot see the cards held by their opponents or the opponents' exact position on the playing board. In games with imperfect information, the players must estimate the hidden information available to the opponents based on the actions the opponents play in the game. At the same time, they must play so that they do not reveal too much of their own private information. This makes imperfect information games much harder to play, and it closely interconnects the optimal strategies in distinct states of the game.
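
For concreteness, estimating the hidden information can be phrased as Bayesian belief updating: a prior over the opponent's possible private states is reweighted by how likely each state makes the observed action. The following minimal sketch illustrates the idea; the states, actions, and the opponent model are invented for illustration and are not part of any of our systems.

    def update_belief(belief, action, opponent_strategy):
        """Update P(hidden state) after observing an opponent action.

        belief: dict mapping hidden state -> prior probability
        opponent_strategy: dict mapping (state, action) -> probability
            that the opponent plays `action` in that hidden state
        """
        posterior = {state: prob * opponent_strategy.get((state, action), 0.0)
                     for state, prob in belief.items()}
        total = sum(posterior.values())
        if total == 0.0:
            raise ValueError("observed action has zero probability under the model")
        return {state: p / total for state, p in posterior.items()}

    # Example: a bet is more likely with a strong hand, so observing it
    # shifts the belief towards the strong hand.
    belief = {"strong hand": 0.5, "weak hand": 0.5}
    strategy = {("strong hand", "bet"): 0.8, ("weak hand", "bet"): 0.2}
    print(update_belief(belief, "bet", strategy))
    # prints (approximately) {'strong hand': 0.8, 'weak hand': 0.2}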

Real-world problems, such as computer network security, patrolling, inspections, auctions, or negotiations, almost never provide perfect information to the actors. Therefore, imperfect information games are substantially more important for practical applications.

Continual resolving

Continual resolving is a new algorithmic paradigm for playing games with imperfect information. Instead of creating a strategy for the whole game in advance, which was previously necessary, continual resolving computes the strategy for any particular situation in the game once it occurs. In order to do that, it maintains compact statistics of the previously computed strategies, and it uses deep neural networks to estimate how the game can develop in the future.
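
The control flow can be sketched as follows. This is only a hypothetical skeleton, not DeepStack's implementation: the stub solver stands in for solving a depth-limited subgame with counterfactual regret minimization, with the neural network evaluating positions at the depth limit, and the carried statistics correspond to the player's own range and the opponent's counterfactual values.

    import random

    def solve_subgame(public_state, my_range, opp_cf_values, value_net):
        """Stub: re-solve the subgame rooted at the current public state.

        In the real algorithm this runs a depth-limited game solver that
        calls `value_net` at the depth limit; here it returns a fixed
        placeholder strategy and the carried statistics unchanged.
        """
        strategy = {"check": 0.5, "bet": 0.5}
        return strategy, my_range, opp_cf_values

    def act(public_state, my_range, opp_cf_values, value_net=None):
        """Re-solve the current situation and sample an action from the
        freshly computed strategy."""
        strategy, my_range, opp_cf_values = solve_subgame(
            public_state, my_range, opp_cf_values, value_net)
        actions = list(strategy)
        weights = [strategy[a] for a in actions]
        action = random.choices(actions, weights=weights)[0]
        return action, my_range, opp_cf_values

    action, my_range, opp_cf_values = act("flop", {}, {})
    print(action)  # "check" or "bet"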

This approach is theoretically well founded and can be shown to approach the optimal strategy. Furthermore, it is very successful in practice: it is the algorithm used in DeepStack, the first AI system that beat professional poker players in no-limit Texas hold'em.

Research challenges

Poker, for which continual resolving has been developed and tested, has a very specific structure. The three main simplifying properties are that 1) all actions of the players are public, 2) the number of possible unknown states is constant, and 3) the game is naturally divided into rounds (often called streets), which help to clearly structure the game in time.

Most other imperfect information games, such as Kriegspiel (a blind chess variant) or Phantom Go, do not have these simplifying properties. The actions are only partially observed. In Kriegspiel, we learn whether the opponent made an illegal move, but not which one. If we made a legal move, we learn only that it is the next player's turn to move. As a result, the number of possible states the game can be in grows exponentially. Since the machine-learned evaluation function (realized by neural networks in DeepStack) takes a distribution over all possible states of the game as its input, the exponential growth in the number of possible states makes representing this function even more complicated.
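
The following toy computation illustrates the growth. It is not Kriegspiel itself: it simply assumes some fixed number of opponent moves per turn that are indistinguishable to us (in a real game, observations prune some candidates, but the worst-case growth remains exponential).

    # Each hidden state is the sequence of unobserved opponent moves so far.
    branching = 4           # indistinguishable opponent moves per turn (assumed)
    possible_states = {()}

    for turn in range(1, 7):
        possible_states = {state + (move,)
                           for state in possible_states
                           for move in range(branching)}
        print(f"after turn {turn}: {len(possible_states)} possible states")
    # 4, 16, 64, 256, 1024, 4096 -- exponential in the number of turns.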

It may turn out that a suitable representation of the value function cannot be found, or that training it will not be computationally feasible. Therefore, we also investigate Monte Carlo tree search variants of the algorithm and particle-based value functions that could work with only a sample of the possible game situations.
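
A particle-based belief replaces the full distribution over states with a fixed-size sample that is propagated, reweighted, and resampled as observations arrive; a value function could then take the particles as its input instead of the full distribution. Below is a minimal bootstrap-filter sketch, with the game model left abstract (the `transition` and `likelihood` arguments are assumed to be supplied by the game).

    import random

    def particle_update(particles, transition, likelihood, observation):
        """One step of a bootstrap particle filter over hidden game states.

        particles:   list of hidden states (the current sample)
        transition:  function state -> next state (samples hidden dynamics)
        likelihood:  function (state, observation) -> probability of the
                     observation in that state
        """
        # Propagate each particle through the (sampled) hidden dynamics.
        moved = [transition(state) for state in particles]
        # Weight the particles by how well they explain the observation.
        weights = [likelihood(state, observation) for state in moved]
        if sum(weights) == 0.0:
            raise ValueError("no particle is consistent with the observation")
        # Resample to keep the sample size fixed.
        return random.choices(moved, weights=weights, k=len(particles))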

Another related research direction we follow is adapting continual resolving to allow for stronger exploitation of weak opponents. Since it currently tries to approximate the Nash equilibrium strategy, it fails to exploit even large systematic mistakes of the opponent. We aim to investigate how to mitigate this drawback.
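
Rock-paper-scissors gives a small worked illustration of the drawback: against an opponent who plays rock far too often, the uniform equilibrium strategy still only breaks even, while the best response wins most of the time.

    import itertools

    ACTIONS = ("rock", "paper", "scissors")
    BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

    def expected_value(my_strategy, opp_strategy):
        """Expected payoff (win = +1, loss = -1, tie = 0) of one strategy
        against another, both given as action -> probability dicts."""
        ev = 0.0
        for mine, theirs in itertools.product(ACTIONS, ACTIONS):
            outcome = 1 if BEATS[mine] == theirs else -1 if BEATS[theirs] == mine else 0
            ev += my_strategy[mine] * opp_strategy[theirs] * outcome
        return ev

    biased_opponent = {"rock": 0.8, "paper": 0.1, "scissors": 0.1}
    equilibrium = {action: 1 / 3 for action in ACTIONS}
    best_response = {"rock": 0.0, "paper": 1.0, "scissors": 0.0}

    print(expected_value(equilibrium, biased_opponent))
    # approximately 0.0 -- the equilibrium only breaks even
    print(expected_value(best_response, biased_opponent))
    # approximately 0.7 -- the best response exploits the bias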

Application domains

We are mainly interested in developing general algorithms for training near-optimal agents for playing a huge variety of games. However, to demonstrate our progress and the practical capabilities of the developed algorithms, we will test them on several specific games. One of them is chess with imperfect information (Kriegspiel and Dark chess), because it has none of the simplifying properties of poker.

Related work

We are building on a large body of literature in computational game theory, game playing, and machine learning. The following is a short sample of the work we aim to build on most heavily.

M. Schmid, M. Moravčík, N. Burch, R. Kadlec, J. Davidson, K. Waugh, N. Bard, F. Timbers, M. Lanctot, Z. Holland, E. Davoodi, A. Christianson, M. Bowling: Player of Games. arXiv:2112.03178, 2021.

V. Kovařík, M. Schmid, N. Burch, M. Bowling, V. Lisý: Rethinking formal models of partially observable multi-agent decision making. Artificial Intelligence, 103645, 2021.

V. Kovařík, D. Milec, M. Šustr, D. Seitz, V. Lisý: Fast Algorithms for Poker Require Modelling it as a Sequential Bayesian Game. arXiv:2112.10890, 2021.

M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, T. Graepel: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. NeurIPS, 2017.

M. Moravčík, M. Schmid, N. Burch, V. Lisý, D. Morrill, N. Bard, T. Davis, K. Waugh, M. Johanson, M. Bowling: DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337), 2017.

V. Lisý, M. Lanctot, M. Bowling: Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games. AAMAS, 2015.

P. Ciancarini, G. P. Favini: Monte Carlo tree search in Kriegspiel. Artificial Intelligence, 174(11), pp. 670-684, 2010.

F. Fang, T. H. Nguyen, R. Pickles, W. Y. Lam, G. R. Clements, B. An, A. Singh, M. Tambe, A. Lemieux: Deploying PAWS: Field Optimization of the Protection Assistant for Wildlife Security. AAAI, pp. 3966-3973, 2016.