PAC-Bayesian Analysis and Its Applications

ECML-PKDD-2012 Tutorial
Friday 28th September, Morning Session
Bristol, UK

Abstract

PAC-Bayesian analysis is a basic and very general tool for data-dependent analysis in machine learning. By now, it has been applied in areas as diverse as supervised learning, unsupervised learning, and reinforcement learning, leading to state-of-the-art algorithms and accompanying generalization bounds. In a sense, PAC-Bayesian analysis combines the best of Bayesian methods and PAC learning: (1) it provides an easy way to exploit prior knowledge (like Bayesian methods); (2) it provides rigorous and explicit generalization guarantees (like VC theory); and (3) it is data-dependent and provides an easy and rigorous way of exploiting benign conditions (like Rademacher complexities). In addition, PAC-Bayesian bounds lead directly to efficient learning algorithms.

We will start with a general introduction to PAC-Bayesian analysis, which should be accessible to any student with a basic familiarity with machine learning. Then, we will survey multiple forms of PAC-Bayesian bounds and their numerous applications in different fields (including supervised and unsupervised learning, finite and continuous domains, and the very recent extension to martingales and reinforcement learning). Some of these applications will be explained in more detail, while others will be surveyed at a high level. We will also describe the relations and distinctions between PAC-Bayesian analysis, Bayesian learning, VC theory, and Rademacher complexities, and discuss the role, value, and shortcomings of frequentist bounds that are inspired by Bayesian analysis.

This tutorial builds upon a related tutorial given at ICML-2012, which can be found here. While there will naturally be some overlap in the material, certain aspects will be presented differently, with a stronger emphasis on applications.

Target Audience

The talk will be targeted at a general audience: people who would like to learn about PAC-Bayesian analysis, understand it better, and apply it in their research. We will assume that the audience is familiar with the notions of a hypothesis space, a loss function, Markov's and Hoeffding's inequalities, KL divergence, and mutual information. No prior knowledge beyond these basics will be assumed. We aim at several goals, at increasing levels of difficulty, for what our tutorial participants should learn. (1) Participants should learn what PAC-Bayesian analysis is in general, what PAC-Bayesian bounds mean, where they can and cannot be applied, and what can and cannot be achieved with PAC-Bayesian analysis. (2) Participants should get an idea of how to derive a PAC-Bayesian bound for a standard application and how to derive a learning algorithm from a PAC-Bayesian bound. (3) Advanced listeners, who are already familiar with PAC-Bayesian analysis, will enjoy the newly organized and categorized view of the subject and, possibly, learn a new form or application of PAC-Bayesian analysis.
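As a small illustration of goal (2), the following sketch (our own, not part of the tutorial material; the function names are hypothetical) computes a PAC-Bayes-kl generalization bound on the expected loss of a Gibbs classifier by numerically inverting the binary KL divergence. The form of the bound and the constant ln(2*sqrt(n)/delta) follow the Seeger–Maurer version of the theorem:

```python
import math


def kl_binary(q, p):
    """Binary KL divergence kl(q||p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))


def kl_inverse_upper(q, c):
    """Largest p in [q, 1) with kl(q||p) <= c, found by bisection.

    kl(q||p) is increasing in p for p >= q, so bisection applies.
    """
    lo, hi = q, 1.0 - 1e-12
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if kl_binary(q, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo


def pac_bayes_kl_bound(emp_loss, kl_post_prior, n, delta):
    """Upper bound on the expected Gibbs loss L(rho), holding w.p. >= 1 - delta.

    Inverts the PAC-Bayes-kl inequality:
        kl(emp_loss || L(rho)) <= (KL(rho||pi) + ln(2*sqrt(n)/delta)) / n
    """
    c = (kl_post_prior + math.log(2.0 * math.sqrt(n) / delta)) / n
    return kl_inverse_upper(emp_loss, c)
```

For example, with empirical Gibbs loss 0.1, KL(rho||pi) = 5, n = 1000 samples, and delta = 0.05, the bound evaluates to roughly 0.15; increasing n tightens it toward the empirical loss, illustrating how the bound trades off data fit against the divergence from the prior.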

Tentative Outline

Part 1: Introduction (1 hour 10 min.)

In the first part we present the simplest form of the PAC-Bayesian theorem, show two simple applications of this theorem in supervised learning, and discuss its relation to Bayesian learning.

  1. Introduction. Outline of the tutorial. Description of the PAC learning framework (Yevgeny, 5 min)
  2. Simple PAC-Bayesian theorem for supervised learning with i.i.d. samples (Yevgeny, 15 min)
  3. Application in a finite domain (co-clustering) (Yevgeny, 15 min)
  4. Application in a continuous domain (SVM) (John, 20 min)
  5. Relation between Bayesian learning and PAC-Bayesian analysis (John, 8 min)
  6. Learning the prior in PAC-Bayesian bounds (John, 7 min)
Break 15 min.
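For reference, the "simple PAC-Bayesian theorem" of item 2 above is commonly stated in the following kl form (the Seeger–Maurer version; notation here is ours): for an i.i.d. sample of size $n$, a prior $\pi$ over the hypothesis space fixed before seeing the data, and any $\delta \in (0,1)$, with probability at least $1-\delta$, simultaneously for all posterior distributions $\rho$,

```latex
\operatorname{kl}\!\left(\hat{L}(\rho) \,\middle\|\, L(\rho)\right)
  \le \frac{\operatorname{KL}(\rho \| \pi) + \ln \frac{2\sqrt{n}}{\delta}}{n},
```

where $\hat{L}(\rho)$ and $L(\rho)$ are the empirical and expected losses of the Gibbs classifier defined by $\rho$, and $\operatorname{kl}(q \| p) = q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}$ is the KL divergence between Bernoulli distributions with biases $q$ and $p$.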

Part 2: Advanced PAC-Bayesian bounds and Applications (1 hour 5 min.)

In the second part we present the multiple forms that PAC-Bayesian theorems can take, tighter bounds for supervised learning, and additional applications in unsupervised and reinforcement learning.

  1. Advanced PAC-Bayesian bounds for supervised learning (François, 15 min)
  2. Derivation of algorithms from PAC-Bayesian bounds (François, 10 min)
  3. Localized PAC-Bayesian bounds (François, 10 min)
  4. PAC-Bayesian bounds for unsupervised learning and density estimation (at a high level) (Yevgeny, 10 min)
  5. PAC-Bayes-Bernstein inequality for martingales and its applications in reinforcement learning (Yevgeny, 15 min)
  6. Tutorial summary (Yevgeny, 5 min)

Slides

Tutorial slides are available for download here.

Presenters

Yevgeny Seldin

Research Scientist, Max Planck Institute for Intelligent Systems, Tübingen, Germany
Honorary Research Associate, University College London, UK

https://sites.google.com/site/yevgenyseldin

François Laviolette

Assistant Professor, Department of Computer Science, Université Laval (Québec), Canada

http://www.ift.ulaval.ca/~laviolette

John Shawe-Taylor

Head of Department of Computer Science, University College London, UK

http://www.cs.ucl.ac.uk/staff/j.shawe-taylor

Code

For an implementation of the prior-learning technique in PAC-Bayesian analysis, please see the web page of Emilio Parrado-Hernandez.