EC'25 Workshop

Online Learning and Economics

Thursday, July 10th at Stanford University, Stanford, CA

Summary

The workshop focuses on the intersection of online learning and economics, exploring how learning algorithms are increasingly used for decision-making in strategic economic settings such as markets and platforms. The event brings together researchers from diverse fields to discuss current challenges, share recent advances, and foster collaboration. The scope of the workshop extends beyond online learning to encompass other relevant domains of machine learning, including, for example, learning theory and reinforcement learning.

Program

Morning Poster Sessions

The morning will feature two poster sessions, structured as follows:

Poster Session 1: 9:00 AM – 10:15 AM
Coffee Break: 10:15 AM – 10:45 AM
Poster Session 2: 10:45 AM – 12:00 PM

For details on which posters are assigned to each session, please refer to the Accepted Posters section.

Afternoon Invited Talks - Fisher A

In the afternoon, we are excited to host a series of invited talks that are scheduled as follows:

1:30 – 2:00 pm: Christian Kroer
2:00 – 2:30 pm: Haipeng Luo
2:30 – 3:00 pm: Paul Dütting
3:00 – 3:20 pm: Coffee Break
3:20 – 3:50 pm: Ellen Vitercik
3:50 – 4:20 pm: Nika Haghtalab

Keynote speakers

Paul Dütting (Google)

Nearly Tight Regret Bounds for Profit Maximization in Bilateral Trade

Abstract

Bilateral trade models the task of intermediating between two strategic agents, a seller and a buyer, willing to trade a good for which they hold private valuations. We study this problem from the perspective of a broker, in a regret minimization framework. At each time step, a new seller and buyer arrive, and the broker has to propose a mechanism that is incentive-compatible and individually rational, with the goal of maximizing profit.

We propose a learning algorithm that guarantees a nearly tight $\tilde{O}(\sqrt{T})$ regret in the stochastic setting when seller and buyer valuations are drawn i.i.d. from a fixed and possibly correlated unknown distribution. We further show that it is impossible to achieve sublinear regret in the non-stationary scenario where valuations are generated upfront by an adversary. Our ambitious benchmark for these results is the best incentive-compatible and individually rational mechanism. This separates us from previous works on efficiency maximization in bilateral trade, where the benchmark is a single number: the best fixed price in hindsight.

A particular challenge we face is that uniform convergence for all mechanisms’ profits is impossible. We overcome this difficulty via a careful chaining analysis that proves convergence for a provably near-optimal mechanism at (essentially) optimal rate. We further showcase the broader applicability of our techniques by providing nearly optimal results for the joint ads problem.

Nika Haghtalab (UC Berkeley)

Learning in Stackelberg Games, Revisited

Abstract

Stackelberg games model strategic interactions where a principal (or “leader”) commits to a strategy, anticipating that an agent (or “follower”) will observe it and best respond. These games capture a wide range of real-world scenarios—such as security, policy design, and markets—where commitment and anticipation are crucial. While computing the Stackelberg-optimal strategy typically requires full knowledge of both players’ utilities, the past decade has seen the emergence of learning in Stackelberg games as a framework for dealing with utility uncertainty. A key insight from this literature is that an uninformed principal can, through repeated interactions, learn to attain her Stackelberg value despite her lack of initial knowledge.

In this talk, I revisit this foundational insight and highlight two core assumptions that enable it: (i) the agent possesses unusually strong awareness of the principal’s actions, and (ii) the agent lacks the ability for long-term planning. We examine how these assumptions impact the practicality of this insight. From a technical perspective, I will discuss three takeaways. First, the agent need not fully observe the principal’s strategy; rather, it suffices for the agent to have calibrated predictions of it, which can be achieved through our contribution to the modern literature on calibrated forecasting. Second, the agent’s inability for long-term planning is consequential: in some games, no Nash equilibrium over algorithmic strategies allows either player to achieve the Stackelberg value. Third and finally, I show that some forms of long-term planning on the part of the agent—such as planning with discounted future utilities or via reinforcement learning—can still be accommodated in a broad class of games using more robust and conservative Stackelberg learning algorithms.

This talk is based on a synthesis of a long line of work on learning in Stackelberg games. Most relevant recent works highlighted in the talk are jointly authored by Nivasini Ananthakrishnan, Thodoris Lykouris, Sloan Nietert, Chara Podimata, Alex Wei, and Kunhe Yang.

Christian Kroer (Columbia University)

On the separation of uniform convergence rates for OFTRL in games

Abstract

Non-ergodic convergence of learning dynamics in games has been widely studied recently because of its importance in both theory and practice. In this talk, we focus on a broad class of popular learning dynamics based on Optimistic Follow-The-Regularized-Leader (OFTRL), including the Optimistic Multiplicative Weights Update (OMWU) algorithm and conceptual prox algorithm. For matrix games, we show a separation result between the last-iterate, random-iterate, and best-iterate convergence properties of OFTRL. We start by proving that OFTRL may not achieve any uniform (i.e. instance-independent) last-iterate convergence rate, in stark contrast with some OFTRL algorithms achieving a $O(1/T)$ uniform convergence to an equilibrium of the game for the average iterates, and despite many of these dynamics being known to converge asymptotically in the last iterate. We also prove several lower bounds for the uniform random-iterate convergence rate, measured by the average duality gap, showing in particular that OMWU cannot achieve a polynomial uniform random-iterate convergence. Finally, we prove that OMWU achieves a $O(T^{-1/6})$ uniform best-iterate convergence rate for $2 \times 2$ matrix games. Our results challenge the conventional wisdom that last-iterate, random-iterate and best-iterate convergence rates are essentially equivalent, as has been shown for other popular learning algorithms such as Optimistic Gradient Descent Ascent.

Haipeng Luo (University of Southern California)

On the Convergence of Self-Play in Games with Bandit Feedback

Abstract

Self-play via online learning algorithms has emerged as a powerful paradigm for training agents in complex multi-agent environments, underpinning some of the most striking successes in AI, such as AlphaGo and superhuman AI for poker. It enables agents to improve iteratively by interacting with copies of themselves, offering a natural path toward equilibria in games. Despite heavy study in recent years, however, the convergence of self-play dynamics is still not fully understood, especially in the case where agents only observe the reward of the played actions (i.e., bandit feedback), a prevalent situation in auctions, pricing, trading, and other applications in economics. In this talk, I will discuss two recent results on this front. In the first part, motivated by the well-known accelerated average-iterate convergence rate when learning with gradient feedback, I will address the question of whether acceleration is also possible under bandit feedback and provide an affirmative answer using a simple algorithm that enjoys instance-dependent convergence rates. In the second part, I will switch focus to the more challenging notion of last-iterate convergence and discuss a new uncoupled self-play dynamic with improved convergence rate, leveraging a simple black-box reduction from last-iterate convergence to average-iterate convergence.

Ellen Vitercik (Stanford University)

Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty

Abstract

On online marketplaces, customers can access hundreds of reviews for a single product. Buyers often use reviews from other customers that share their attributes—such as height for clothing or skin type for skincare products—to estimate their values, which they may not know a priori. Customers with few relevant reviews may hesitate to buy a product except at a low price, so for the seller, there is a tension between setting high prices and ensuring that there are enough reviews that buyers can confidently estimate their values. In this talk, we formulate this pricing problem through the lens of online learning and provide a no-regret learning algorithm. This is joint work with Wenshuo Guo, Nika Haghtalab, and Kirthevasan Kandasamy.

Call for posters

Important Dates (All times are 11:59 PM AoE)

Submission Deadline: June 6, 2025
Notification of Acceptance: June 9, 2025
Workshop Date: July 10, 2025

OLE 2025 aims to provide a venue for researchers to explore and discuss recent trends in topics at the intersection of online learning and economics. We welcome submissions that explore this space along various directions, including (but not limited) to:

Learning in repeated auctions
Learning for pricing and bilateral trade
Learning in mechanism/contract/information design
Learning in markets
No-regret learning and convergence to equilibria

Submission Guidelines

Submissions should be made via https://ole2025.hotcrp.com/
The preferred format is a 2-page abstract. Longer submissions are welcome, but only the first two pages are guaranteed to be reviewed.
Submissions will be evaluated based on relevance to the workshop, academic quality, and potential impact.

This is a non-archival workshop. We encourage the submission of work that has been recently published, is under review, or is in progress. Submissions need not be anonymized and authors are encouraged to point to extended versions of their submissions available on public repositories.

Presentation Format

Authors of accepted submissions will be invited to present their work during a poster session at the workshop.

Poster size: We will be able to accommodate 36"x48" posters.

Accepted posters

Session 1

Learning Safe Strategies For Value Maximizing Buyers in Uniform Price Auctions (N. Golrezaei, S. Sahoo)
Online Bayesian Persuasion Without a Clue (F. Bacchiocchi, M. Bollini, M. Castiglioni, A. Marchesi, N. Gatti)
The Illusion of Collusion: How Bandit Algorithm Choice Affects Competition (C. Douglas, F. Provost, A. Sundararajan)
Robust Bandit Learning with Delegation (T. Sornwanee)
Optimal Pricing for Bundles: Using Submodularity in Offline and Online Settings (A. Tajdini, L. Jain, K. Jamieson)
Bayes correlated equilibria, no-regret dynamics in Bayesian games, and the price of anarchy (K. Fujii)
Learning to Incentivize in Repeated Principal-Agent Problems with Adversarial Agent Arrivals (J. Liu, A. Maiti, A. Tajdini, K. Jamieson, L. Ratliff)
Iterative Network Pricing for Ridesharing Platforms (C. Yu, H. Ma)

Session 2

No-Regret Incentive-Compatible Online Learning under Exact Truthfulness with Non-Myopic Experts (J. Komiyama, N. Mehta, A. Mortazavi)
Heterogeneous Multi-Agent Bandits with Parsimonious Hints (A. Mirfakhar, X. Wang, J. Zuo, Y. Zick, M. Hajiesmaili)
Algorithmic Risk Aversion and Recommendation-Mediated Demand (A. Haupt, A. Narayanan)
Online Learning for Dynamic Service Mode Control (W. Xing, Y. Hu, A. Kalvit, V. Sarhangian)
Robust Fair Allocation with a Single Sample (C. Kroer, R. Kumar, Z. Yang)
Nearly Tight Regret Bounds for Profit Maximization in Bilateral Trade (S. Di Gregorio, P. Duetting, F. Fusco, C. Schwiegelshohn)
Are Bounded Contracts Learnable and Approximately Optimal? (Y. Chen, Z. Chen, X. Deng, Z. Huang)
Product Marketing Management on the E-Commerce Platform (Z. Chen, Q. Yao, Y. Zhu, N. Li, Y. Chen, Z. Zhang, C. Yu, J. Xu, B. Zheng, X. Deng)
Finite-Sample Guarantees for Best-Response Learning Dynamics in Zero-Sum Matrix Games (F. Z. Faizal, A. Ozdaglar, M. J. Wainwright)

Organizers

Matteo Castiglioni

(PoliMi)

Andrea Celli (Bocconi)

Federico Fusco (Sapienza)

Tom Cesari

(uOttawa)

Contact

ole.ec.2025@gmail.com

Page updated

Google Sites

Report abuse