Workshop

Plenary speakers and Program

Title: Policy Gradient Methods in Repeated Games (joint work with Domenico Mergoni and Ed Plumb)

Policy gradient methods have emerged as powerful tools for optimising complex decision-making processes. These methods utilise gradient ascent to iteratively enhance policies based on observed rewards. While they frequently achieve local convergence, global convergence guarantees remain elusive outside of specific classes of games. We illustrate this phenomenon through examples, illuminating the challenges surrounding global optimisation. Following that, we explore the utilisation of policy gradient methods in repeated games and define the set of policies that are learnable using these methods. Lastly, we provide a preview of our follow-up presentation, wherein we will present a Folk theorem result for learning in repeated games.

Title: Solving irreducible stochastic mean-payoff games by relative Krasnoselskii--Mann iteration (joint work with Marianne Akian, Ulysse Naepels and Basile Terver)

We analyse an algorithm solving stochastic mean-payoff games, combining the ideas of relative value iteration and of Krasnoselskii--Mann damping.  We derive parameterized complexity bounds for several classes of games satisfying irreducibility conditions. Our main result shows that an $\epsilon$-approximation of the value of an irreducible concurrent stochastic game can be computed in a number of iterations in $O(|\log\epsilon|)$ where the constant in the $O(\cdot)$ is explicit, depending on the smallest non-zero transition probabilities. We also derive as corollaries complexity results for turn-based stochastic games and entropy games. These results are obtained by methods of variational analysis, establishing contraction properties of the relative Krasnoselskii--Mann iteration with respect to Hilbert's semi-norm.

Title: Voting rules and the design of cost shares (joint work with Patrick Schmitz)

In a complete contracting world, a decision to provide a public good can be implemented in a way that induces agents to reveal their private information and efficiency obtains. Since such mechanisms are seldom observed in practice, we follow an incomplete contracts perspective by allowing only simple voting rules where each agent can vote either “yes” or “no”. A weighted majority rule is used to determine whether the good is installed or not. For each weighted majority rule, there is an optimal cost-sharing scheme. We illustrate how the rule and the associated cost-sharing scheme can be chosen to steer the agents as close as possible to the efficient solution.

Title: Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality (joint work with Nicolas Vieille (HEC Paris) and Marek Petrik (University of New Hampshire))

Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential decision-making under parameter uncertainty. RMDPs have been extensively studied when the objective is to maximize the discounted return, but little is known for average optimality (optimizing the long-run average of the rewards obtained over time) and Blackwell optimality (remaining discount optimal for all discount factors sufficiently close to 1). In this paper, we prove several foundational results for RMDPs beyond the discounted return. We show that average optimal policies can be chosen stationary and deterministic for sa-rectangular RMDPs but, perhaps surprisingly, that history-dependent (Markovian) policies strictly outperform stationary policies for average optimality in s-rectangular RMDPs. We also study Blackwell optimality for sa-rectangular RMDPs, where we show that approximate Blackwell optimal policies always exist, although Blackwell optimal policies may not exist. We also provide a sufficient condition for their existence, which encompasses virtually any examples from the literature. We then discuss the connection between average and Blackwell optimality, and we describe several algorithms to compute the optimal average return. Interestingly, our approach leverages the connections between RMDPs and stochastic games.

Title: Characterizing the top trading cycles rule for housing markets with lexicographic preferences when externalities are limited (single authored)

We consider a housing market model with limited externalities where agents care both about their own consumption via demand preferences and about the agent who receives their endowment via supply preferences (we extend the associated lexicographic preference domains introduced in Klaus and Meo, 2023). If preferences are demand lexicographic, then our model extends the classical Shapley-Scarf housing market (Shapley and Scarf, 1974) with strict preferences model. Our main result is a characterization of the corresponding top trading cycles (TTC) rule by individual rationality, pair efficiency, and strategy-proofness (Theorem 1), which extends that of Ekici (2023) from classical Shapley-Scarf housing markets with strict preferences to our model. Two further characterizations are immediately obtained by strengthening pair efficiency to either Pareto efficiency or pairwise stability (Corollaries 1 and 2). Finally, we show that as soon as we extend the preference domain to include demand lexicographic as well as supply lexicographic preferences (e.g., when preferences are separable), no rule satisfying individual rationality, pair efficiency, and strategy-proofness exists (Theorem 2).

Title: A belief-based approach to signaling (joint work with Marie Laclau and Tristan Tomala)

In this paper, we provide a geometric characterization of the set of interim equilibrium payoff vectors in a general class of signaling games. To obtain a tractable characterization, we use the belief based approach found in the literature on repeated games with incomplete information, cheap talk and Bayesian persuasion. This approach avoids  to specify the prior, the strategies of the sender and receiver, and the  belief system. The key ingredient is to consider  Bayes-plausible belief distributions that are incentive-compatible for the sender. Geometrically, this  leads to  a constrained convexification of the graphs of the non-revealing interim payoff correspondences. Our characterization extends the analogous result for  sender-receiver cheap talk games. We illustrate the results with some classical signaling games. We derive the best equilibrium payoff for the sender when his preferences are type-independent. For zero-sum preferences, we obtain an explicit formula for the ex-ante equilibrium payoff and establish a simple condition for the uniqueness of interim equilibrium payoffs.

Title: A regret minimization approach to fixed point iterations

We present a link between regret bounds and fixed point problems with nonexpansive maps. This allows the definition of many new fixed point iterations based on regret minimizing algorithms with corresponding convergence guarantees. In particular, we transpose the celebrated AdaGrad algorithm to obtain a fixed point iteration with strong adaptive properties.

Title: Coarse information design (joint work with Wing Suen and Yimeng Zhang)

We study an information design problem with continuous state and discrete signal space. Under convex value functions, the optimal information structure is interval-partitional and exhibits a dual expectations property: each induced signal is the  conditional mean (taken under the prior density) of each interval; each interval cutoff is the conditional mean (taken under the value function curvature) of the interval formed by neighboring signals.  This property enables examination into which part of the state space is more finely partitioned and facilitates comparative statics analysis.  The analysis can be extended to general value functions and  adapted to study coarse mechanism design.

Title: Price & Choose (joint work with Federico Echenique)

We describe a two-stage mechanism that fully implements the set of efficient outcomes in two-agent environments with quasi-linear utilities. The mechanism asks one agent to set prices for each outcome, and the other agent to make a choice, paying the corresponding price: Price & Choose. We extend our implementation result in three main directions: an arbitrary number of players, non-quasi linear utilities, and robustness to max-min behavior. Finally, we discuss how to reduce the payoff inequality between players while still achieving efficiency.

Title: Value-Positivity for Matrix Games

Matrix games are the most basic model in game theory, and yet robustness with respect to small perturbations of the matrix entries are not fully understood. We introduce value-positivity and uniform value-positivity, two properties that refine the notion of optimality in the context of a perturbed matrix game. These concepts capture, respectively, how the value depends on the perturbation, and the existence of a fixed strategy that guarantees the value of the game for all sufficiently small perturbations. We provide polynomial-time algorithms to check whether a perturbed matrix game satisfies these properties, and provide applications to stochastic games, where value-positivity is related to the existence of blackwell optimal strategies.

Title: Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary ϵ-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as 1/ϵ. The proposed algorithm is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97).

Title: Optimizing the coalition gain in Online Auctions with Greedy Structured Bandits

Motivated by online display advertising, this work considers repeated second-price auctions, where agents sample their value from an unknown distribution with cumulative distribution function F. In each auction t, a decision-maker bound by limited observations selects n_t agents from a coalition of N to compete for a prize with p other agents, aiming to maximize the cumulative reward of the coalition across all auctions. The problem is framed as an N-armed structured bandit, each number of player sent being an arm n, with expected reward r(n) fully characterized by F and p+n.
We present two algorithms, Local-Greedy (LG) and Greedy-Grid (GG), both achieving constant problem-dependent regret. This relies on three key ingredients:
1. an estimator of r(n) from feedback collected from any arm k,
2. concentration bounds of these estimates for k within an estimation neighborhood of n and
3. the unimodality property of r under standard assumptions on F.
Additionally, GG exhibits problem-independent guarantees on top of best problem-dependent guarantees. However, by avoiding to rely on confidence intervals, LG practically outperforms GG, as well as standard unimodal bandit algorithms such as OSUB or multi-armed bandit algorithms.

Title: Correlation of Rankings in Matching Markets (joint work with Rémi Castera and Patrick Loiseau)

We study the role of correlation in matching, where multiple decision-makers simultaneously face selection problems from the same pool of candidates. We propose a model in which decision-makers have varying information on candidates from different sociodemographic groups when evaluating and ranking them, thus leading to varying correlations among candidates' priority scores. Such differential correlation arises, for example, when the cost of information acquisition, decision-maker preferences, or the prevalence of selection criteria vary across sociodemographic groups. We show that a lower correlation for one of the groups worsens the outcome for all groups, thus leading to efficiency loss. Moreover, the proportion of students from a given group who remain unmatched is increasing in its own correlation level. This implies that it is advantageous to belong to the low-correlation group. Finally, we extend the extent tie-breaking literature to multiple priority classes and intermediate levels of correlation. Overall, our results point to a previously overlooked systemic source of group inequalities in school, university, and job admissions.

Title: Games played by exponential weights algorithms (joint work with Maurizio d’Andrea and Fabien Gensbittel)

Exponential weights updates are probably the building block which is the most  used in machine learning algorithms for decision-making. We consider a repeated interaction, where each player  uses an exponential weights algorithm characterized by an initial mixed action and an update  parameter. The mixed action  profile $p^t$ played at stage $t$ then follows an homogeneous  Markov chain. In a first part, we show that the  probability under $p^t$ to play a strict Nash equilibrium  is a stochastic process  almost surely converging to 0 or 1.  In a second part, we show that any possible limit of $(p^t)$ has to be a Nash Equilibrium with Equalizing Payoffs. In a third part, we show that in strong coordination games  where the payoff of a player is positive on the diagonal and 0 elsewhere, almost surely $(p^t)$ converges to one of the strict Nash equilibria. We conclude with open questions.

Title: Optimal Queueing Regimes (joint work with Eran Shmaya)

We consider an M/M/1 queueing model where customers can strategically decide whether to join the queue or balk and when to renege. We characterize the class of queueing regimes such that, for any parameters of the model, the socially efficient behavior is an equilibrium outcome.

Title: Opportunity Hunters: A Model of Competitive Sequential Inspections (joint work with Ran Eilat (Ben Gurion University) and Zvika Neeman (Tel Aviv University))

We introduce a new type of games, called "opportunity-hunting games,'' in which two players compete to discover an uncertain event ("opportunity'') that occurs at an unobserved and random point in time. Players can inspect whether the event has already occurred again and again, but each inspection is costly. Varying the parameters of the model spans the range from games where competition between the players to be the first to identify the opportunity is the dominant force, to games in which free-riding on the other player's effort is the dominant force. We characterize the game's unique symmetric Markov Perfect Equilibrium.

Title: Mediated communication with coarse messages (joint work with Mael Le Treust)

This paper is about sender-receiver problems where the two parties communicate via a coarse set of messages. We propose a concept of mediated communication equilibrium capturing the message limitations and study the value of mediation and the optimal communication mechanisms. We consider the mechanism design and Bayesian persuasion scenarios as particular cases.