Workshop on Online Learning and Optimization 2025
November 10, 2025
RIKEN AIP, Tokyo, Japan
Date: November 10, 2025, 9:30–19:30 (JST)
Format: Hybrid (In-person + Online)
In-person registration is closed (deadline was Nov. 3, Mon.).
Online registration is now open! Register via Doorkeeper.
Online learning
Statistical learning
Reinforcement learning
Bandit algorithms
Combinatorial optimization
Online convex optimization
Bayesian optimization
Graph algorithms and graph mining
Causal inference and counterfactual reasoning
Privacy-preserving and communication-efficient learning
Federated and decentralized systems
Applications in science and real-world domains
Nicolò Cesa-Bianchi (University of Milan / Politecnico di Milano, Italy)
Title: Trades, Tariffs, and Regret: Online Learning in Digital Markets
Abstract: Online learning explores algorithms that acquire knowledge sequentially, through repeated interactions with an unknown environment. The general goal is to understand how fast an agent can learn based on the information received from the environment. Digital markets, with their complex ecosystems of algorithmic agents, offer a rich landscape of sequential decision-making problems, characterized by diverse decision spaces, utility functions, and feedback mechanisms. This talk will demonstrate how tackling challenges within digital markets has not only advanced our understanding of machine learning capabilities but also revealed novel insights into algorithmic efficiency and decision-making under uncertainty.
Nishant Mehta (University of Victoria, Canada)
Title: Elicitation Meets Online Learning: Games of Prediction with Advice from Self-Interested Experts
Abstract: The classical game of prediction with expert advice involves two players: Decision Maker, who forecasts outcomes based on expert advice, and an adversarial Nature that selects the experts’ forecasts of outcomes and the outcomes themselves. The experts’ forecasts are taken at face value: various benchmarks like external regret and swap regret are based on the performance of these forecasts. Yet, real-world experts may have beliefs about the outcomes they forecast. If not properly incentivized, self-interested experts can fail to report their beliefs truthfully, compromising benchmarks based on the experts’ beliefs. A series of recent works has developed online learning algorithms that succeed in the face of such self-interested experts, drawing from past results in online learning but also giving online learning both new results and new understanding. This talk will begin with a tour of fundamental mechanisms for eliciting experts’ beliefs. It will then cover recent progress in games of prediction with advice from self-interested experts, highlighting many open problems along the way.
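As a companion illustration (not from the talk itself): the classical benchmark in this game is external regret, attained by the exponential-weights (Hedge) forecaster, while proper scoring rules such as the Brier score are the textbook elicitation device. Below is a minimal Hedge sketch; the loss data and learning-rate tuning are illustrative assumptions.

```python
import numpy as np

def hedge(expert_losses, eta=None):
    """Exponential-weights (Hedge) forecaster for prediction with expert advice.

    expert_losses: (T, N) array whose (t, i) entry is the loss of expert i at
    round t, assumed to lie in [0, 1]. Returns the learner's cumulative loss
    and its external regret against the best single expert in hindsight.
    """
    T, N = expert_losses.shape
    if eta is None:
        eta = np.sqrt(2.0 * np.log(N) / T)   # standard tuning for [0, 1] losses
    log_w = np.zeros(N)                      # log-weights, for numerical stability
    learner_loss = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # current distribution over experts
        learner_loss += p @ expert_losses[t]
        log_w -= eta * expert_losses[t]      # multiplicative-weights update
    return learner_loss, learner_loss - expert_losses.sum(axis=0).min()

# Toy run: 3 experts, 1000 rounds of i.i.d. losses.
rng = np.random.default_rng(0)
total, regret = hedge(rng.uniform(size=(1000, 3)))
print(f"cumulative loss {total:.1f}, external regret {regret:.1f}")
```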
09:30 – 10:00 Registration / Opening
10:00 – 11:00 Keynote Talk
Nicolò Cesa-Bianchi (University of Milan / Politecnico di Milano, Italy)
Trades, Tariffs, and Regret: Online Learning in Digital Markets
11:00 – 11:20 Coffee Break
11:20 – 11:50 Kohei Hatano (Kyushu University / RIKEN AIP, Japan)
Online Optimization over RIS Networks via Mixed Integer Programming
11:50 – 12:20 Kyoungseok Jang (Chung-Ang University, Korea)
Exploring Exploration Strategies in Reinforcement Learning
12:20 – 14:00 Lunch Break (on your own)
14:00 – 15:00 Keynote Talk
Nishant Mehta (University of Victoria, Canada)
Elicitation Meets Online Learning: Games of Prediction with Advice from Self-Interested Experts
15:00 – 15:15 Coffee Break
15:15 – 15:45 Junya Honda (Kyoto University / RIKEN AIP, Japan)
Recent Advances in Follow-the-Perturbed-Leader for Bandit Problems
15:45 – 16:15 Yuko Kuroki (CENTAI Institute S.p.A., Italy)
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
16:15 – 16:30 Coffee Break
16:30 – 17:00 Daiki Suehiro (Kyushu University / RIKEN AIP, Japan)
Online Combinatorial Optimization for Sequential Data Sampling in Neural Networks
17:00 – 17:30 Kaito Fujii (NII, Japan)
Bayes Correlated Equilibria and No-Regret Dynamics
17:30 – 19:30 Closing Remarks / Reception
Informal Discussion and Networking
Kohei Hatano (Kyushu University / RIKEN AIP)
Title: Online Optimization over RIS Networks via Mixed Integer Programming
Abstract: We consider an online optimization problem motivated by reconfigurable intelligent surfaces (RIS), one of the core technologies of 6G wireless networks. We formulate the problem as an online optimization of paths over DAGs with non-linear reward functions. We show that the corresponding offline problem is NP-hard, but can be reformulated as an MIP. This result implies an online algorithm with low regret.
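As a companion illustration (not the talk's algorithm): low regret with access to an offline oracle is the spirit of Follow-the-Perturbed-Leader (Kalai and Vempala). The sketch below assumes linear edge rewards, so a simple DP over the DAG stands in for the MIP oracle that the talk's non-linear rewards require, and assumes full-information feedback.

```python
import numpy as np

def best_path(edges, n_nodes, edge_reward):
    """Offline oracle: maximum-reward path from node 0 to node n_nodes-1 in a
    DAG whose node numbering is topological. In the talk's setting the reward
    is non-linear and the oracle is an MIP solver; this linear DP is an
    illustrative stand-in."""
    best = np.full(n_nodes, -np.inf)
    best[0], parent = 0.0, {}
    for u, v in sorted(edges):              # lexicographic order respects topology
        if best[u] + edge_reward[(u, v)] > best[v]:
            best[v], parent[v] = best[u] + edge_reward[(u, v)], u
    path, v = [], n_nodes - 1               # walk parents back from the sink
    while v != 0:
        path.append((parent[v], v))
        v = parent[v]
    return path

def fpl(edges, n_nodes, reward_seq, eta=1.0, rng=None):
    """Follow-the-Perturbed-Leader over paths: perturb cumulative edge rewards
    and call the offline oracle once per round (Kalai-Vempala style sketch)."""
    rng = rng or np.random.default_rng(0)
    cum = {e: 0.0 for e in edges}           # cumulative observed edge rewards
    total = 0.0
    for rewards in reward_seq:              # rewards: dict edge -> reward in [0, 1]
        perturbed = {e: cum[e] + rng.exponential(1.0 / eta) for e in edges}
        path = best_path(edges, n_nodes, perturbed)
        total += sum(rewards[e] for e in path)
        for e in edges:                     # full-information feedback assumed
            cum[e] += rewards[e]
    return total

# Toy diamond DAG 0 -> {1, 2} -> 3; edges through node 2 are slightly better.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
rng = np.random.default_rng(1)
seq = [{e: rng.uniform(0, 0.5 + 0.5 * (2 in e)) for e in edges} for _ in range(300)]
print(f"FPL cumulative reward: {fpl(edges, 4, seq):.1f}")
```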
Junya Honda (Kyoto University / RIKEN AIP)
Title: Recent Advances in Follow-the-Perturbed-Leader for Bandit Problems
Abstract: This talk presents recent theoretical and practical advances in Follow-the-Perturbed-Leader (FTPL) policies for bandit problems. FTPL is a conceptually simple online learning algorithm using random perturbations, avoiding the explicit optimization required by Follow-the-Regularized-Leader (FTRL) approaches. We begin by introducing fundamental results showing that FTPL with various Fréchet-type distributions achieves Best-of-Both-Worlds (BOBW) regret guarantees in multi-armed bandit problems, along with some negative results on its fundamental limitations. We then discuss loss estimation in FTPL as a computational bottleneck, and introduce techniques to mitigate it. From a practical viewpoint, a key motivation for studying FTPL has been developing efficient algorithms for more complex settings such as combinatorial bandits. Towards this goal, we finally present a recent result achieving optimal adversarial regret for size-invariant semi-bandits.
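A minimal FTPL sketch for K-armed adversarial bandits (illustrative only): perturbations are Fréchet with shape α = 2, and losses are estimated by geometric resampling (Neu and Bartók), the bottleneck technique mentioned above; the learning rate and the resampling truncation cap are assumptions, not the talk's tuning.

```python
import numpy as np

def ftpl_bandit(loss_matrix, alpha=2.0, rng=None):
    """FTPL for adversarial K-armed bandits (illustrative sketch).

    Arm-selection probabilities are never computed in closed form. Instead,
    geometric resampling estimates them: the number of redraws M until the
    same arm reappears satisfies E[M] = 1 / p_t(arm), so M * loss is an
    (almost) unbiased importance-weighted loss estimate.
    loss_matrix: (T, K) losses in [0, 1]; only the pulled arm's loss is seen.
    """
    T, K = loss_matrix.shape
    eta = 1.0 / np.sqrt(T)              # illustrative learning rate (assumption)
    L_hat = np.zeros(K)                 # cumulative importance-weighted losses
    rng = rng or np.random.default_rng(0)

    def draw_arm():
        # Frechet(alpha) perturbation via inverse transform: Z = (-ln U)^(-1/alpha).
        z = (-np.log(rng.uniform(size=K))) ** (-1.0 / alpha)
        return int(np.argmin(eta * L_hat - z))

    total = 0.0
    for t in range(T):
        arm = draw_arm()
        total += loss_matrix[t, arm]
        M, cap = 1, 10_000              # truncation keeps per-round cost bounded
        while draw_arm() != arm and M < cap:
            M += 1
        L_hat[arm] += M * loss_matrix[t, arm]
    return total

# Toy run: arm 0 has the smallest mean loss.
rng = np.random.default_rng(0)
losses = rng.uniform(size=(2000, 3)) * np.array([0.4, 0.9, 1.0])
print(f"FTPL cumulative loss: {ftpl_bandit(losses):.1f}")
```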
Kaito Fujii (NII)
Title: Bayes Correlated Equilibria and No-Regret Dynamics
Abstract: In this talk, I will examine regret notions and the computational tractability of correlated equilibria in Bayesian games. Unlike the complete-information case, the Bayesian setting admits several natural yet non-equivalent correlated-equilibrium concepts (Forges, 1993). I will mainly focus on communication equilibrium (Myerson, 1982), which combines truth-telling incentive constraints from mechanism design with obedience incentive constraints from correlated equilibria, and on its associated notion of untruthful swap regret. I will present an efficient algorithm for minimizing untruthful swap regret, along with an information-theoretic lower bound.
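For contrast with the Bayesian setting (a companion sketch, not the talk's algorithm): in complete-information games, if every player runs a no-external-regret algorithm, the empirical distribution of joint play converges to the set of coarse correlated equilibria, and swap-regret minimization strengthens this to correlated equilibria. The toy game, horizon, and step size below are illustrative.

```python
import numpy as np

def hedge_dynamics(A, B, T=20000, eta=0.05, rng=None):
    """Both players of a bimatrix game (A: row payoffs, B: column payoffs,
    entries in [0, 1]) run Hedge. The empirical distribution of joint play
    approaches the set of coarse correlated equilibria; running a swap-regret
    minimizer instead would yield correlated equilibria."""
    rng = rng or np.random.default_rng(0)
    n, m = A.shape
    lw_r, lw_c = np.zeros(n), np.zeros(m)    # log-weights of the two learners
    joint = np.zeros((n, m))                 # empirical joint play counts
    for _ in range(T):
        p = np.exp(lw_r - lw_r.max()); p /= p.sum()
        q = np.exp(lw_c - lw_c.max()); q /= q.sum()
        i, j = rng.choice(n, p=p), rng.choice(m, p=q)
        joint[i, j] += 1
        lw_r += eta * (A @ q)                # full-information expected payoffs
        lw_c += eta * (B.T @ p)
    return joint / T

# Matching pennies: empirical play approaches uniform, its unique (coarse)
# correlated equilibrium.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hedge_dynamics(A, 1.0 - A).round(3))
```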
Kyoungseok Jang (Chung-Ang University)
Title: Exploring Exploration Strategies in Reinforcement Learning
Abstract: Reinforcement learning (RL) focuses on achieving efficient learning and optimal decision-making from a limited number of trials. Recent breakthroughs such as ChatGPT, robotics, autonomous driving, and recommendation systems owe much to advances in reinforcement learning. Reinforcement learning is often framed as the ‘exploration vs. exploitation’ dilemma: in each trial, the learning agent must decide between ‘exploring’ to discover new possible outcomes and ‘exploiting’ by choosing familiar actions that yield reliable rewards. Effective exploration is crucial to enabling the agent to understand its environment with fewer trials, thereby saving trial opportunities for exploitation, which ultimately maximizes cumulative reward. In this talk, we will delve into a deeper understanding of efficient exploration through two RL variants: the bandit problem and best-arm identification. Through a series of new results, we will discuss how to address the two key aspects of exploration research: the design of experiments and the stopping condition for exploration.
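To make the "stopping condition for exploration" concrete (a companion sketch, not from the talk): in best-arm identification the algorithm decides for itself when to stop sampling. Below is successive elimination with an anytime confidence radius; the Bernoulli reward model, confidence level, and radius constant are illustrative assumptions.

```python
import numpy as np

def successive_elimination(means, delta=0.05, rng=None):
    """Best-arm identification by successive elimination (illustrative sketch).

    Pulls every surviving arm once per round, then eliminates arms whose upper
    confidence bound falls below the best lower confidence bound. The stopping
    condition (one arm left) is exactly the "when to stop exploring" question.
    Rewards are Bernoulli(means) here.
    """
    rng = rng or np.random.default_rng(0)
    K = len(means)
    active = list(range(K))
    sums, pulls = np.zeros(K), np.zeros(K)
    while len(active) > 1:
        for a in active:
            sums[a] += rng.binomial(1, means[a])
            pulls[a] += 1
        mu = sums[active] / pulls[active]
        # Anytime confidence radius (union bound over arms and rounds).
        rad = np.sqrt(np.log(4 * K * pulls[active] ** 2 / delta) / (2 * pulls[active]))
        best_lcb = (mu - rad).max()
        active = [a for a, ucb in zip(active, mu + rad) if ucb >= best_lcb]
    return active[0], int(pulls.sum())

arm, total_pulls = successive_elimination([0.3, 0.5, 0.7])
print(f"identified arm {arm} after {total_pulls} pulls")
```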
Yuko Kuroki (CENTAI Institute S.p.A.)
Title: Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
Abstract: We study the problem of minimizing polarization and disagreement in the Friedkin-Johnsen opinion dynamics model under incomplete information. Unlike prior work that assumes a static setting with full knowledge of users' innate opinions, we address the more realistic online setting where innate opinions are unknown and must be learned through sequential observations. This novel setting, which naturally mirrors periodic interventions on social media platforms, is formulated as a regret minimization problem, establishing a key connection between algorithmic interventions on social media platforms and the theory of multi-armed bandits. In our formulation, a learner observes only scalar feedback on the overall polarization and disagreement after an intervention. For this novel bandit problem, we propose a two-stage algorithm based on low-rank matrix bandits. The algorithm first performs subspace estimation to identify an underlying low-dimensional structure, and then employs a linear bandit algorithm within the low-dimensional representation derived from the estimated subspace. We prove that our algorithm achieves sublinear cumulative regret over any time horizon T. Empirical results validate that our algorithm significantly outperforms a linear bandit baseline in terms of both cumulative regret and running time.
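A companion sketch of the two-stage idea under synthetic assumptions (a generic bilinear scalar reward ⟨X, Θ⟩ plus Gaussian noise and a finite arm set; not the paper's Friedkin-Johnsen objective or its analysis): uniform exploration followed by SVD-based subspace estimation, then LinUCB on the projected low-dimensional features.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, T1, T2 = 8, 2, 400, 600

# Hidden rank-r parameter; playing arm X yields scalar reward <X, Theta> + noise.
Theta = (rng.normal(size=(d, r)) @ rng.normal(size=(r, d))) / d
arms = [rng.normal(size=(d, d)) / d for _ in range(50)]
def reward(X):
    return float(np.sum(X * Theta)) + 0.01 * rng.normal()

# Stage 1 (subspace estimation): uniform exploration, a least-squares estimate
# of Theta, then its top-r singular subspaces.
Xs = np.stack([arms[rng.integers(len(arms))] for _ in range(T1)])
ys = np.array([reward(X) for X in Xs])
theta_hat = np.linalg.lstsq(Xs.reshape(T1, -1), ys, rcond=None)[0].reshape(d, d)
U, _, Vt = np.linalg.svd(theta_hat)
U, V = U[:, :r], Vt[:r].T

# Stage 2: LinUCB on the r*r-dimensional projected features vec(U^T X V).
def phi(X):
    return (U.T @ X @ V).ravel()
p = r * r
Ainv, b = np.eye(p), np.zeros(p)         # inverse Gram matrix, response vector
for t in range(T2):
    th = Ainv @ b
    chosen = max(arms, key=lambda X: phi(X) @ th
                 + 0.5 * np.sqrt(phi(X) @ Ainv @ phi(X)))  # illustrative bonus
    x, y = phi(chosen), reward(chosen)
    Ainv -= np.outer(Ainv @ x, x @ Ainv) / (1.0 + x @ Ainv @ x)  # Sherman-Morrison
    b += y * x

print("best arm value:  ", max(float(np.sum(X * Theta)) for X in arms))
print("chosen arm value:", float(np.sum(chosen * Theta)))
```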
Daiki Suehiro (Kyushu University / RIKEN AIP)
Title: Online Combinatorial Optimization for Sequential Data Sampling in Neural Networks
Abstract: This talk introduces an online combinatorial optimization framework for data sampling under noisy labels. Specifically, we formulate the task of selecting a subset of k samples from n total samples at each training epoch as an online k-set problem, where the feedback is given by the behavior of the neural network on each sample. Although the network’s responses to the data fluctuate considerably during training, the proposed sampling strategy, based on no-regret online convex optimization, enables stable and theoretically supported sample selection. Experimental results demonstrate that the proposed method achieves both higher accuracy and lower computational cost compared with existing sample-selection approaches.
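A companion sketch of the generic ingredient (not the talk's method or feedback model): maintain fractional selection marginals on the capped simplex {x : 0 ≤ x ≤ 1, Σ x = k} by projected online gradient ascent, a standard no-regret OCO scheme, and round to a k-subset by taking the top-k coordinates. The per-sample utilities below stand in for the network-behavior feedback and are an assumption.

```python
import numpy as np

def project_capped_simplex(y, k, iters=60):
    """Euclidean projection onto {x : 0 <= x <= 1, sum(x) = k}, via bisection
    over the shift tau in x = clip(y - tau, 0, 1)."""
    lo, hi = y.min() - 1.0, y.max()
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        if np.clip(y - tau, 0.0, 1.0).sum() > k:
            lo = tau
        else:
            hi = tau
    return np.clip(y - (lo + hi) / 2.0, 0.0, 1.0)

def online_k_sampler(util_seq, k, eta=0.1):
    """Pick k of n samples per epoch: projected online gradient ascent on the
    capped simplex maintains fractional marginals x; each epoch's subset is
    the top-k coordinates of x. util_seq: (T, n) per-sample utilities,
    revealed only after each epoch's selection."""
    T, n = util_seq.shape
    x = np.full(n, k / n)                    # start from uniform marginals
    picks = []
    for t in range(T):
        picks.append(np.argsort(-x)[:k])     # round marginals to a k-subset
        x = project_capped_simplex(x + eta * util_seq[t], k)  # OGA step
    return picks

# Toy run: later-indexed samples have higher expected utility.
rng = np.random.default_rng(0)
utils = rng.normal(size=(200, 20)) + np.linspace(0.0, 1.0, 20)
print("final pick:", sorted(online_k_sampler(utils, k=5)[-1].tolist()))
```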
Shinji Ito (The University of Tokyo / RIKEN AIP)
Junya Honda (Kyoto University / RIKEN AIP)
Kohei Hatano (Kyushu University / RIKEN AIP)
Yuko Kuroki (CENTAI Institute S.p.A.)
Sequential Decision Making Team,
RIKEN Center for Advanced Intelligence Project (AIP)
Computational Learning Theory Team,
RIKEN Center for Advanced Intelligence Project (AIP)
JSPS KAKENHI Grant-in-Aid for Scientific Research (B)
"Fundamental Technologies for Robust Dynamic Decision-Making Policies with Optimality in Diverse Environments"
JST PRESTO (AI and Robotics for Innovation in Research and Development Process)
"Dynamic Environment Analysis and Its Applications Using Sequential Learning Theory and Graph Mining Techniques"