The Space of Online Learning Problems

ECML-PKDD Tutorial

11 September 2015

Porto, Portugal

Abstract

Nowadays the rate and volume of information flow are sharply increasing and forcing numerous domains to switch from offline to online information processing. Online learning is a subfield of machine learning providing the intelligence behind many online data processing systems. It has already revolutionized personalization and advertising on the Internet and rapidly penetrates many other domains.

The tutorial will start with an overview of the classical online learning problems - prediction with expert advice, adversarial multiarmed bandits, stochastic multiarmed bandits, and bandits with side information (Cesa-Bianchi and Lugosi, 2006, Bubeck and Cesa-Bianchi, 2012).

Then we will introduce the space of online learning problems (see the figure on the right). The space is spanned by three axes corresponding to three major parameters of online learning problems:

1. The feedback axis, measuring the amount of feedback obtained by the algorithm at every round. We will primarily focus on full information and limited (a.k.a. bandit) feedback.

2. The environmental resistance axis, where we primarily distinguish between i.i.d. (a.k.a. stochastic) and adversarial environments.

3. The structural complexity axis, measuring the structural complexity of a problem. We will primarily focus on stateless problems and problems with side information (state).

Most of the classical results in online learning correspond to isolated points in the space of online learning problems. For example, prediction with expert advice, adversarial multiarmed bandits, and stochastic multiarmed bandits correspond to the three red crosses on the figure (in clockwise direction). In the second part of the tutorial we will describe a number of new algorithms that solve ranges of problems in the space of online learning problems (illustrated by dashed lines on the figure). The results include algorithms that interpolate between full information and bandit feedback (Mannor and Shamir, 2011, Alon et al., 2013, Seldin et al., 2014, Kale, 2014); algorithms that interpolate between adversarial and i.i.d. (or in some other sense “sub-adversarial”) environments in the full information (Gaillard et al., 2014, Koolen et al., 2014, Wintenberger, 2015) and in the bandit setting (Bubeck and Slivkins, 2012, Seldin and Slivkins, 2014); and algorithms that interpolate between stateless bandits and bandits with side information (Seldin et al.,2011). These results open a new era in online learning research, where the researchers progress from studying isolated points in the space of online learning problems to studying ranges of problems.