Viewing recommendations as a decision-making problem is core to the CONSEQUENCES workshop series, and the bandit paradigm is a primary framework for tackling such challenges. While bandits offer a direct way to optimize the metrics that matter, the literature can be overwhelming to practitioners. Because the number of available algorithms is vast, the real challenge is assessing an algorithm's practical viability. That assessment depends on critical design decisions, from evaluation and monitoring strategies to ensuring a consistent user experience in exploration traffic, which the research literature often overlooks.
In this 50-minute tutorial, we distill years of hands-on experience from practitioners at Aampe, Amazon, Booking.com, Netflix, ShareChat, and Spotify into a practical guide. We will focus specifically on off-policy bandits, a family of methods designed for learning and evaluation from logged bandit feedback, which has gained popularity in industry due to its scalability, safety, and extensibility. Using real-world applications, we will share the lessons we have learned about the problems that arise from the messy realities of production environments and live user interactions. Attendees will leave with a clearer understanding of how to apply the bandit paradigm to build practical, impactful recommender systems.
This is a follow-up to previous tutorials presented at WWW '23 and WSDM '24. For more information, see: https://sites.google.com/view/practical-bandits-tutorial