EC'26 Workshop

Online Learning and Economics

Second Edition

Monday, July 6th in Rome, Italy

Summary

The workshop focuses on the intersection of online learning and economics, exploring how learning algorithms are increasingly used for decision-making in strategic economic settings such as markets and platforms. The event brings together researchers from diverse fields to discuss current challenges, share recent advances, and foster collaboration. The scope of the workshop extends beyond online learning to encompass other relevant domains of machine learning, including, for example, learning theory and reinforcement learning.

Website of the First Edition (link)

Program - To Be Confirmed

09:00 – 10:30 First Session of Invited Talks - Room Bernardino da Siena
- 09:00 – 09:30 Nicolò Cesa-Bianchi
- 09:30 – 10:00 Yurong Chen
- 10:00 – 10:30 Vianney Perchet
10:30 – 11:00 Coffee Break
11:00 – 12:30 Second Session of Invited Talks - Room Bernardino da Siena
- 11:00 – 11:30 Yang Cai
- 11:30 – 12:00 Chara Podimata
- 12:00 – 12:30 Gabriele Farina
12:30 – 14:00 Lunch / Poster session

For details on which posters are assigned to each session, please refer to the Accepted Posters section.

Keynote speakers

Chara Podimata (MIT)

Calibrated Stackelberg Games: Learning Optimal Commitments Against Calibrated Agents

Abstract

In this talk, I will introduce a generalization of the standard Stackelberg Games (SGs) framework: Calibrated Stackelberg Games (CSGs). In CSGs, a principal repeatedly interacts with an agent who (contrary to standard SGs) does not have direct access to the principal's action but instead best-responds to calibrated forecasts about it. CSG is a powerful modeling tool that goes beyond assuming that agents use ad hoc and highly specified algorithms for interacting in strategic settings and thus more robustly addresses real-life applications that SGs were originally intended to capture. Along with CSGs, we also introduce a stronger notion of calibration, termed adaptive calibration, that provides fine-grained any-time calibration guarantees against adversarial sequences. We give a general approach for obtaining adaptive calibration algorithms and specialize them for finite CSGs. In our main technical result, we show that in CSGs, the principal can achieve utility that converges to the optimum Stackelberg value of the game both in finite and continuous settings, and that no higher utility is achievable. Two prominent and immediate applications of our results are the settings of learning in Stackelberg Security Games and strategic classification, both against calibrated agents.

Gabriele Farina (MIT)

How Strong a Notion of Rationality Can We Learn Efficiently in Multi-Agent Settings?

Abstract

A central question in multi-agent learning is: how strong a notion of rational behavior can be guaranteed through efficient no-regret learning? Classical results show that minimizing regret leads to coarse correlated equilibria, and enriching the class of allowed deviations yields progressively stronger equilibrium concepts. But how far can this hierarchy be pushed while retaining efficient algorithms in general convex and extensive-form games? In this talk, I present recent results that characterize the strongest notions of rationality that can be efficiently learned in this broad setting. We show that linear and low-degree polynomial correlated equilibria—strictly stronger than coarse correlated equilibria and natural relaxations of correlated equilibria—can be computed and learned efficiently even in general convex and extensive-form games. En route to the result, we will introduce new algorithmic tools with broader applicability. First, the natural deviation sets underlying stronger notions of regret do not admit efficient separation or optimization oracles. To address this, we introduce a new algorithmic primitive, semiseparation, which enables regret minimization over convex sets that lack classical separation oracles. Second, our analysis leverages a fast computational version of von Neumann’s minimax theorem, yielding convergence rates that scale logarithmically in the desired accuracy. This tool has further applications, including to variational inequalities, fixed-point computation, and online multicalibration.

Nicolò Cesa-Bianchi (Università degli Studi di Milano)

Trading under constraints: Periodic strategies for blocking bandits

Abstract

In automated financial markets, trading algorithms face structural bottlenecks: quantitative funds encounter capital lockups during venue settlement cycles, while market makers navigate inventory risk via mandatory cool-down windows. Both scenarios constrain continuous execution by temporarily blocking an asset or venue from being immediately re-selected. We formalize these operational limits using the adversarial blocking bandit framework, where playing an arm renders it unavailable for a fixed number of future rounds.

We first show that computing the optimal unconstrained dynamic policy in adversarial market environments is NP-hard. To establish a tractable alternative, we turn to d-periodic policies, which cleanly map to cyclic capital-rotation and asset-allocation schedules. We show that the optimal periodic policy is efficiently computable via a reduction to maximum-weight bipartite matchings and captures at least a 1/K fraction of the dynamic optimum. Our main result shows that T^{2/3} is (up to log factors) the minimax rate for the regret (against periodic policies) for adversarial blocking bandits with identical blocking times, and that this rate is achievable by an efficient algorithm.

Vianney Perchet (Crest, ENSAE & Criteo AI Lab)

Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback

Abstract

In this talk, I will introduce the problem of learning in zero-sum games, focusing on "last-iterate" convergence rather than the average-iterate convergence studied in the traditional literature (we argue that the last iterate is the more meaningful object). Interestingly, the optimal rate is T^{-1/4}, which is quite unusual (and unexpected) in this literature.

https://proceedings.mlr.press/v267/fiegel25a.html

Yang Cai (Yale University)

Proximal Regret and Proximal Correlated Equilibria

Abstract

A central question in online learning is which regret notions beyond external regret can be minimized efficiently by simple algorithms. This talk introduces proximal regret, a new notion for online convex optimization that lies between external regret and full swap regret. Defined through proximal-operator deviations, proximal regret captures a rich but tractable class of strategy modifications. The main result is that plain gradient descent already minimizes this stronger regret notion. In games, this yields convergence to proximal correlated equilibria, a sharper refinement of coarse correlated equilibria that remains learnable via gradient-descent dynamics. We will also discuss extensions to mirror descent and optimistic gradient methods, and applications showing how the additional constraints imposed by gradient-descent dynamics improve equilibrium selection.

Yurong Chen (INRIA Paris)

Learning a Stackelberg Leader's Incentives from Optimal Commitments

Abstract

Stackelberg equilibria, as functions of the players' payoffs, can inversely reveal information about the players' incentives. In this paper, we study to what extent one can learn about the leader's incentives by actively querying the leader's optimal commitments against strategically designed followers. We show that, by using polynomially many queries and operations, one can learn a payoff function that is strategically equivalent to the leader's, in the sense that: 1) it preserves the leader's preference over almost all strategy profiles; and 2) it preserves the set of all possible (strong) Stackelberg equilibria the leader may engage in, considering all possible follower types. As an application, we show that the information acquired by our algorithm is sufficient for a follower to induce the best possible Stackelberg equilibrium by imitating a different follower type. To the best of our knowledge, we are the first to demonstrate that this is possible without knowing the leader's payoffs beforehand. Due caution is necessary when one intends to utilize the power of optimal commitment. This is joint work with Xiaotie Deng (Peking University), Jiarui Gan (University of Oxford), and Yuhao Li (Columbia University).

Call for posters

Important Dates (All times are 11:59 PM AoE)

Submission Deadline: 31/5/2026
Notification of Acceptance: 9/6/2026 (originally 5/6/2026)
Workshop Date: July 6, 2026

OLE 2026 aims to provide a venue for researchers to explore and discuss recent trends in topics at the intersection of online learning and economics. We welcome submissions that explore this space along various directions, including (but not limited to):

Learning in repeated auctions
Learning in mechanism/contract/information design
Learning in markets
No-regret learning and convergence to equilibria
(Online) Calibration

Submission Guidelines

Submission Platform: https://ole2026.hotcrp.com/
The preferred format is a 2-page abstract. Longer submissions are welcome, but only the first two pages are guaranteed to be reviewed.
Submissions will be evaluated based on relevance to the workshop, academic quality, and potential impact.

This is a non-archival workshop. We encourage the submission of work that has been recently published, is under review, or is in progress. Submissions need not be anonymized and authors are encouraged to point to extended versions of their submissions available on public repositories.

Presentation Format

Authors of accepted submissions will be invited to present their work during a poster session at the workshop.

NEW: The venue has confirmed that its vertical poster stands measure 100x180 cm, so each stand can fit one vertical A0 or one horizontal A1 poster. The poster slots are contiguous, so posters exceeding 100x180 cm cannot be accommodated.

Accepted posters

First Session - 12:30 to 13:15

Failure Modes in AI Retraining Dynamics (K. Banihashem, N. Collina, N. Immorlica, B. Lucier, A. Slivkins)
No-Regret Online Autobidding in Non-Truthful Auctions with ROI and Budget Constraints (Y. Deng, Y. Li, W. Tang, H. Zhang)
Learning vs. Optimizing Bidders in Budgeted Auctions (G. Fikioris, B. Sivan, E. Tardos)
The Computable but Not Learnable Information-Value-Free Equilibria and Regulation of Algorithmic Collusion (J. Hartline, C. Wang, C. Zhang)
MenuNet: A Strategy-Proof Neural Mechanism for Matching Markets (Z. Sun)
Bandit Social Learning with Exploration Episodes (K. Banihashem, N. Collina, A. Slivkins)
Learn to Match: Two-Sided Matching with Temporally Extended Feedback (H. Zong, Y. Liang, B. Zhou, N. Jaques)
Partner Choice in Low-Information Social Dilemmas (S. Roesch, Y. Du, O. Rodrigues, S. Leonardos)
Learning in Bayesian Stackelberg Games With Unknown Follower’s Types (F. Bacchiocchi, M. Bollini, M. Castiglioni, A. Marchesi, S. Coutts)
Equilibrium with Internal Transfers (M. Liu, G. Farina, A. Ozdaglar)
Blackwell Approachability and Gradient Equilibrium are Equivalent (B. W. Lee, N. Haghtalab, M. I. Jordan, R. J. Tibshirani)
Ex-post equilibria (F. Giordano, J. Grand-Clément, C. Kroer)
Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games (B. H. Zhang, I. Anagnostides, T. Sandholm)
Online Learning and Equilibrium Computation with Ranking Feedback (M. Liu, Y. Chen, Z. Fan, G. Farina, A. E. Ozdaglar, K. Zhang)
Smoothing the Cliff: Incentive-Compatible Priority Allocation via Randomized Mechanisms (T. Lin, S. Yu, H. Zhang)

Second Session - 13:15 to 14:00

Multi-agent Adaptive Mechanism Design (Q. Han, D. Simchi-Levi, R. Tan, Z. Zhao)
Searching for Optimal Prices in Two-Sided Markets (Y. Feng, M. Ma, B. Peng, Z. Wan)
Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization (R. Niazadeh, N. Golrezaei, J. Wang, F. Susan, A. Badanidiyuru)
Toward Simultaneously Optimal Regret in U-Calibration (R. Frongillo, H. Luo, N. Mehta, J. Schneider)
Contracting the misguided agent (J. Tłuczek, E. Yılmaz, V. Villin, C. Dimitrakakis)
Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model (G. Seo, X. Wang, D. C. Parkes)
When Leaderboards Stop Search: Feedback Precision and Costly Exploration (K. Chen, Y. Lin)
Swap Regret Minimization Through Response-Based Approachability (I. Anagnostides, G. Farina, M. Fishelson, H. Luo, J. Schneider)
Learning a Game by Paying the Agents (B. H. Zhang, T. Lin, Y. Chen, T. Sandholm)
Consumer Search and Social Learning in Agentic Markets (B. Lucier, N. Immorlica, M. Mobius, A. Slivkins, D. Goldstein, J. Hofman, S. Jaffe, D. Rothschild)
Do Not Trust the Auctioneer: Learning to Bid in Feedback-Manipulated Auctions (L. Foscari, M. Tullii, V. Perchet)
Learning to Bargain: Last-Iterate Convergence of Follow-the-Regularized-Leader in Games with a Discontinuity (S. Kamp, R. Liebman, B. Fish)
Dynamic Pricing and Advertising with Demand Learning (S. Agrawal, Y. Feng, W. Tang)
Contextual Search in Principal-Agent Games: The Curse of Degeneracy (Y. Feng, M. Ma, B. Peng, Z. Wan)
Robust Learning with Private Information (K. Okumura)
Profit Maximization in Bilateral Trade against a Smooth Adversary (S. Di Gregorio, P. Duetting, F. Fusco, C. Schwiegelshohn)

Organizers

Matteo Castiglioni (PoliMi)

Andrea Celli (Bocconi)

Tom Cesari (uOttawa)

Federico Fusco (Sapienza)

Contacts

ole.ec.2026@gmail.com
