Syllabus

The class will review the key assumptions and ideas that make semi-supervised and unsupervised learning possible:

  • the low-density separation assumption
  • the cluster assumption
  • the generative-modeling assumption
  • the smoothness assumption
  • the manifold assumption
  • the existence of different 'views' of the data

We will then study how these assumptions are implemented in learning algorithms:

  • expectation maximization (EM) and its generalizations (e.g., for Bayesian methods)
  • co-training and multi-view training
  • bootstrapping techniques (e.g., self-training)
  • graph-based algorithms (e.g., label propagation; see the sketch after this list)
  • transductive SVMs
  • ...
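
As a first taste of the graph-based family, the sketch below shows a minimal label-propagation loop: build an RBF affinity graph over all points, then repeatedly diffuse label distributions from the few labeled examples while clamping their known labels. This is a toy illustration under simple assumptions (a dense affinity matrix, a fixed bandwidth), not the exact variant we will cover in class.

    import numpy as np

    def label_propagation(X, y, n_classes, sigma=1.0, n_iter=100):
        """Toy label propagation; y uses -1 to mark unlabeled points."""
        # RBF affinities between all pairs of points
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        np.fill_diagonal(W, 0.0)
        T = W / W.sum(axis=1, keepdims=True)    # row-stochastic transition matrix
        F = np.zeros((X.shape[0], n_classes))   # per-point label distributions
        labeled = y >= 0
        F[labeled, y[labeled]] = 1.0
        for _ in range(n_iter):
            F = T @ F                           # propagate along the graph
            F[labeled] = 0.0                    # clamp labeled points...
            F[labeled, y[labeled]] = 1.0        # ...back to their one-hot labels
        return F.argmax(axis=1)

In practice one would sparsify the graph (e.g., with k-nearest neighbors) and iterate to convergence rather than for a fixed number of steps.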

We will also study their applications to problems in natural language processing:

  • induction of semantic representations
  • grammar induction (syntax)
  • topic modeling (see the sketch after this list)
  • ...
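
Topic modeling, for example, can be tried out in a few lines. The sketch below (assuming scikit-learn is installed; the four-document corpus is made up for illustration) fits a two-topic latent Dirichlet allocation model, whose variational inference also connects back to the EM generalizations above.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # A tiny made-up corpus, for illustration only
    docs = [
        "the parser builds a syntax tree for each sentence",
        "grammar induction learns syntax without treebanks",
        "the topic model groups words into latent topics",
        "latent variables capture topics shared across documents",
    ]

    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(docs)            # document-term count matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)

    # Print the top words of each inferred topic
    terms = vec.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = terms[weights.argsort()[::-1][:4]]
        print(f"topic {k}: {' '.join(top)}")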

Finally, we will look into related problems and techniques:

  • latent-variable models and the partially labeled setting
  • multi-task learning
  • learning from feedback instead of full supervision
  • domain shift and domain-adaptation techniques (see the sketch after this list)
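
To make the last item concrete: one standard recipe for covariate shift is importance weighting, where a classifier is trained to distinguish source from target examples and its predicted odds reweight the source data. A minimal sketch under simple assumptions (scikit-learn available, logistic regression as the domain classifier; all names are illustrative):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def importance_weights(X_source, X_target):
        """Estimate p_target(x) / p_source(x) via a domain classifier."""
        X = np.vstack([X_source, X_target])
        domain = np.r_[np.zeros(len(X_source)), np.ones(len(X_target))]
        clf = LogisticRegression(max_iter=1000).fit(X, domain)
        p = clf.predict_proba(X_source)[:, 1]   # P(target | x) on source points
        # The odds approximate the density ratio up to a constant factor
        return p / np.clip(1.0 - p, 1e-12, None)

    # The weights plug into any estimator that accepts sample weights, e.g.:
    # model.fit(X_source, y_source,
    #           sample_weight=importance_weights(X_source, X_target))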