CS 282r

Topics in Machine Learning: Advancements in Probabilistic Machine Learning, Programming Models for ML, and Causality

The goal of the course is to give students broad exposure to a diverse set of active research topics in machine learning by discussing recent papers across a range of advanced subjects. Students can then select one of these subjects to explore in depth in the form of a course project. After the first lecture, two students will each present on that week's chosen theme, using a few papers as references. Ideally the presentations will motivate the area, put the work in context, and describe the innovations from the papers. We will then discuss the work as a group. One instructor will help facilitate the discussion and provide guidance on the content of the presentations.

Course Instructors

Alex D'Amour, Alex Wiltschko, D. Sculley, David Parkes, David Belanger, Dougal Maclaurin, Jasper Snoek

Please contact us via Piazza.


Location

We will hold the first class at Harvard

33 Oxford St

Maxwell Dworkin

Ground Floor, MD G115 from 9-10:45am, Friday, February 1.

and the remainder will be held at the Google offices at

355 Main St, Floor 5

Cambridge MA, 02142


Grading

Class participation - 30%

Class presentations - 20%

Project proposal - 10% - 3/15

Project presentation - 10% - 4/19

Project report and code - 30% - 5/3


In-class discussions

Each class meeting will be three hours of in-depth discussion on a specific topic. Two students will present papers each week, and each student is expected to facilitate a discussion 1-2 times per semester. The presenters for each week are expected to coordinate with each other and with the course instructors in advance to divide up the assigned papers and any additional background material that will need to be discussed.

Discussions will center around:

  • Understanding the strengths and weaknesses of these methods.
  • Understanding the relationships among these methods and with previous approaches.
  • Extensions or applications of these methods.
  • Experiments that might better illuminate their properties.

The ultimate goal is for these discussions to reveal gaps in current research, generate new ideas, and ideally seed novel research directions.


Final Project

Students can work on projects individually or in pairs. The goal of the projects is to allow students to dive more deeply into one of the topics of the course. The project can be an extension of existing work, a novel application of existing methods, an exploration of a new research idea, or a non-trivial implementation and empirical study of existing methods. The grade will depend on the ideas, how well you present them in the report, how clearly you position your work relative to the existing literature, how illuminating your experiments are, and how well-supported your conclusions are.

Each group of students will write a short (around 2 pages) research project proposal, which ideally will be structured similarly to a standard paper. It should include a description of a minimum viable project, some nice-to-haves if time allows, and a short review of related work. You don't have to do exactly what your project proposal says - the point of the proposal is mainly to have a plan and to make it easy for us to give you feedback.

Towards the end of the course, everyone will present their project in a short presentation.

At the end of the class you'll hand in a project report (around 4 to 8 pages), prepared in the format of a machine learning conference paper such as NeurIPS or ICML. Note that we do not expect the report to be a completed research paper of that caliber, but hopefully some projects will be a first step in that direction.


Statement of Interest and Qualifications

Unfortunately, we expect this course will be oversubscribed, so we must prioritize who can attend. We will give priority to graduate students with relevant research interests, but are open to other qualified and interested candidates. As an advanced course in which we will discuss recent literature at a technical level, we do expect a significant math, statistics, and machine learning background. We ask that students prepare a one-paragraph statement indicating their relevant background (courses taken, etc.) and their level of interest in the course (if we open a slot, will you take it?). Please submit this by 2pm on Friday, Feb 1 via Piazza. We will notify selected students by 5pm on Friday, Feb 1.


Collaboration policy

Course presentations can be prepared independently, but we encourage the week's two presenters to coordinate so as to reduce redundancy and overlap (e.g., two near-identical introductions to the same topic). We do, however, expect each student to contribute substantially to the week's presentations. For the project, if students choose to work together, we will ask for a statement detailing the individual contributions of each student.

Schedule

2/1, 9am - 10:45am: Organization/Intro. Location: Harvard (Maxwell Dworkin G115)

TOPIC 1: Advanced Topics in Probabilistic Machine Learning

2/8, 9am - 12pm: Decision Making under Uncertainty with Deep Networks. Location: Google (355 Main St, Floor 5, Cambridge MA).

  • Riquelme et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
  • Osband and van Roy. Bootstrapped Thompson Sampling and Deep Exploration
  • Snoek et al. Scalable Bayesian Optimization Using Deep Neural Networks

2/15, 9am - 12pm: Meta-Learning. Location: Google (355 Main St, Floor 5, Cambridge MA). All remaining classes will be held at Google.

  • Finn et al. "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
  • Kim et al. "Bayesian Model-Agnostic Meta-Learning"
  • Garnelo et al. "Conditional Neural Processes"

2/22, 9am - 12pm: Exact-Likelihood Generative Models, Normalizing Flows, etc.

  • Grathwohl et al. FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
  • Kingma and Dhariwal: Glow: Generative Flow with Invertible 1x1 Convolutions
  • Louizos and Welling: Multiplicative Normalizing Flows for Variational Bayesian Neural Networks
  • Dinh et al. Density estimation using Real NVP

3/1, 9am - 12pm: Bayesian Inference in Function Space for Neural Networks

  • Lee et al. "Deep Neural Networks as Gaussian Processes"
  • Sun et al. "Functional Variational Bayesian Neural Networks"
  • (Last 20 mins) Preview: Short overview of causal formalism.

TOPIC 2: Causal Inference

3/8, 9am - 12pm: Causal Inference with Unconfoundedness

  • Nie and Wager: "Quasi-Oracle Estimation of Heterogeneous Treatment Effects"
  • Shalit et al: "Estimating individual treatment effect: generalization bounds and algorithms"
  • Liu et al: "Representation Balancing MDPs for Off-Policy Policy Evaluation"
  • (Optional) Chernozhukov et al: Double/debiased machine learning for treatment and structural parameters
  • (Optional) Hahn et al: "Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects"
  • (Optional) Johansson et al: Learning Weighted Representations for Generalization Across Designs
  • (Background) Imbens and Rubin: Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction (Chapters 1 and 12).
  • (Background) Pearl: Causality (Chapter 3).
  • (Background) Huszár: "ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus" (tl;dr intro to Pearl).
  • (Background) Hernán and Robins: Causal Inference (Part I, especially Chapters 1, 2, 3, 6, 7, 10).
  • (Background) Peters et al: Elements of Causal Inference (Chapters 3, 5, 6).

3/15, 9am - 12pm: Causal Inference with Unobserved Confounding (and Project Proposals Due)

  • Miao et al: "Identifying Causal Effects With Proxy Variables of an Unmeasured Confounder"
  • Louizos et al: "Causal Effect Inference with Deep Latent-Variable Models"
  • D'Amour: On Multi-Cause Causal Inference with Unobserved Confounding: Counterexamples, Impossibility, and Alternatives
  • (Optional) Wang and Blei: "The Blessings of Multiple Causes" (NeurIPS talk)
  • (Optional) Shi et al: "Multiply Robust Causal Inference With Double Negative Control Adjustment for Unmeasured Confounding"
  • (Optional) Tran and Blei: Implicit Causal Models for Genome-wide Association Studies
  • (Optional) Ranganath and Perotte: Multiple Causal Inference with Latent Confounding

TOPIC 3: Systems and Languages for ML

3/29, 9am - 12pm: Probabilistic Programming

  • Goodman et al. "Church: a language for generative models"
  • Carpenter et al. "Stan: A Probabilistic Programming Language"
  • Pfeffer "Figaro: An Object-Oriented Probabilistic Programming Language"
  • Milch et al. "BLOG: Probabilistic Models with Unknown Objects"
  • Lunn et al. "WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility"

4/5, 9am - 12pm: Programs as Models

Motivation: Shane Legg's PhD thesis, "Machine Super Intelligence", Chapter 2

  • Liang et al., "Learning Programs: A Hierarchical Bayesian Approach"
  • Lake et al., "Human-level concept learning through probabilistic program induction"
  • Duvenaud et al., "Structure Discovery in Nonparametric Regression through Compositional Kernel Search"
  • Bosnjak et al., "Programming with a Differentiable Forth Interpreter"
  • Reed et al., "Neural Programmer-Interpreters"

4/12, 9am - 12pm: Automatic Differentiation: Methods & Tricks

  • Source-code transformation (SCT) vs. operator overloading (OO); note that "Autograd" is the name of a particular library, not a generic term for automatic differentiation. (A minimal operator-overloading sketch follows this reading list.)
    • Baydin et al., 2018. “Automatic Differentiation in Machine Learning: a survey.”
    • (Optional) van Merrienboer et al., 2018. "Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming"
  • Fixed-point trick, and other specialized adjoints.
    • Taftaf et al., “Adjoints of Fixed-Point Iterations”.
    • (Optional; the original paper, but harder to parse) Christianson, 1994. "Reverse accumulation and attractive fixed points"
    • Modern flavor: Chen et al., 2018. “Neural Ordinary Differential Equations”
  • Binomial checkpointing.
    • Original paper: Griewank & Walther, 2000. “Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation”
    • Rediscovery: Gruslys et al., 2016. “Memory-efficient backpropagation through time”.
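
The readings above contrast source-code transformation with operator overloading. As an illustrative (not assigned) reference point, here is a minimal sketch of reverse-mode AD via operator overloading, in the spirit of Autograd: each operation records its inputs and local partial derivatives, and a backward pass accumulates adjoints in reverse topological order. The names here (Var, grad) are made up for this sketch.

    import math

    class Var:
        def __init__(self, value, parents=()):
            self.value = value        # primal value
            self.parents = parents    # (parent Var, local partial derivative) pairs
            self.adjoint = 0.0        # d(output)/d(self), filled in by grad()

        def __add__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Var) else Var(other)
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        __rmul__ = __mul__

    def tanh(x):
        y = math.tanh(x.value)
        return Var(y, [(x, 1.0 - y * y)])

    def grad(output, inputs):
        # Order the graph topologically, then accumulate adjoints in reverse,
        # so each node's adjoint is complete before it reaches its parents.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(output)
        output.adjoint = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.adjoint += node.adjoint * local
        return [x.adjoint for x in inputs]

    # Example: y = tanh(w * x + b) with w=0.5, x=2.0, b=-1.0, so w*x + b = 0.
    w, b = Var(0.5), Var(-1.0)
    y = tanh(w * 2.0 + b)
    print(grad(y, [w, b]))  # dy/dw = 2 * (1 - tanh(0)^2) = 2.0, dy/db = 1.0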

4/19, 9am - 12pm: Project Presentations

4/26, 9am - 12pm: Project Presentations

5/3 Final Projects Due (Reading Period)

5/10 No Meeting (Exam Period)

Final Project Guidelines


A project can contribute in any of these areas (or a combination of them):

  • Methods: systematic assessment of the strengths and weaknesses of a collection of novel or existing methods when applied to real or synthetic data.
  • Applications: use of machine learning to help solve a real-world problem.
  • Theory: formal statements concerning guarantees about machine learning problems or methods.
  • Exposition: presentation of a unified framework covering a set of existing theory or methods. The goal is to help provide accessible educational content to others as well as identify opportunities for development of novel methods and theory.
  • Software: development of machine learning tools that are fast, general-purpose, and well-tested.


When evaluating your projects, we will be focusing on the following criteria:

  • Are your technical statements precise and correct?
  • Did you properly cite related work and explain the background concepts?
  • Given your specific machine learning background, did the work stretch you outside of your comfort zone?
  • Is your write-up well-written and was your presentation engaging? If your project is software-based, was your code high-quality and reusable?
  • If working as part of a team, did you collaborate effectively?


Note that projects do not necessarily need to focus on the specific subjects discussed in class, though they should be relevant to recent advances in machine learning.


Some example projects

  • Implement a few MCMC methods and compare their performance on a variety of low-dimensional inference problems as well as on Bayesian deep networks. Here, it may be best to use synthetic data, so that properties of the problem can be tuned. (A minimal starting-point sketch appears after this list.)
  • Apply Thompson sampling or GP-based Bayesian optimization to a black-box optimization problem in biology. In the interest of time/money, it could be run on a software-defined fitness function, rather than by doing actual experiments.
  • Derive regret bounds for various bandit algorithms.
  • Write a tutorial covering the breadth of recent advancements in variational inference.
  • Implement your own autodiff library from scratch, or contribute new features or example applications to jax. For example, you could develop a user-friendly library for MCMC in jax. This blog post is an excellent example of such a project.
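
As a concrete starting point for the first project idea above (the MCMC comparison), here is a minimal sketch of random-walk Metropolis on a synthetic two-dimensional Gaussian target. The function names and the target are illustrative only; a real project would add better-mixing samplers (e.g., HMC or slice sampling), more targets, and systematic diagnostics.

    import numpy as np

    def log_target(x):
        # Unnormalized log-density of a correlated 2-D Gaussian (synthetic target).
        cov = np.array([[1.0, 0.8], [0.8, 1.0]])
        return -0.5 * x @ np.linalg.solve(cov, x)

    def random_walk_metropolis(log_p, x0, n_steps=5000, step_size=0.5, seed=0):
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        logp_x = log_p(x)
        samples, n_accept = [], 0
        for _ in range(n_steps):
            proposal = x + step_size * rng.standard_normal(x.shape)
            logp_prop = log_p(proposal)
            # Accept with probability min(1, p(proposal) / p(x)).
            if np.log(rng.uniform()) < logp_prop - logp_x:
                x, logp_x = proposal, logp_prop
                n_accept += 1
            samples.append(x.copy())
        return np.array(samples), n_accept / n_steps

    samples, accept_rate = random_walk_metropolis(log_target, x0=[3.0, -3.0])
    print("acceptance rate:", accept_rate)
    print("sample mean (should be near [0, 0]):", samples[1000:].mean(axis=0))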