[slides]
Advances in online bandits and reinforcement learning enable us to solve complex decision-making tasks in problems like games, where agents learn through many trials and errors. However, it is often difficult to directly apply these techniques to problems related to recommender and search systems, as trial and error can be costly in terms of effectiveness and implementation. Instead, counterfactual learning and evaluation are often used in such scenarios because they enable batch learning and evaluation of decision-making agents by leveraging logged data. Avoiding direct interaction with the real environment prevents undesirable consequences, but reliable deployment then requires in-depth statistical knowledge.
This tutorial begins with an overview of the field, highlighting fundamental formulations and methods. The goal of this section is to provide the audience with a solid background to actively engage in the discussion during the workshop. I will then discuss some of the emerging problems in the field that may limit the use and effectiveness of relevant techniques, such as non-stationarity, reward alignment, short- and long-term gaps, the presence of new actions, dealing with languages, and deterministic logging. By exploring these challenges, I aim to outline future research directions in this area, emphasizing the need for novel ideas that can more readily and effectively handle the complexities of real-world data.
Yuta Saito is a Ph.D. candidate in the Department of Computer Science at Cornell University, advised by Prof. Thorsten Joachims. His research focuses on counterfactual learning and evaluation, fairness in recommender systems, and their applications to large-scale systems. His recent work has been published at top-tier conferences, including ICML, NeurIPS, KDD, SIGIR, RecSys, and WSDM. He won the Best Paper Runner-Up Award at WSDM '22 and the Outstanding Reviewer Award at RecSys '23, and co-lectured tutorials on counterfactual evaluation at RecSys '21 and KDD '22. Moreover, he has collaborated with over 10 companies on the deployment of related methodologies and was named to Forbes 30 Under 30 in 2022 for his research and real-world applications.