Invited keynote

Unsupervised Domain Adaptation: From Practice to Theory

by John Blitzer, University of California, Berkeley
Unsupervised domain adaptation algorithms address a key issue in natural language processing:  How can we train a system under a
labeled source distribution but achieve high performance on a target distribution for which we have only unlabeled data?  Many different
empirical solutions have been proposed for this problem, and many of them have found success across one or more natural language processing
tasks.  Nonetheless, we're holding this workshop because we still don't understand which technique to use when, or even whether any of
the current techniques are appropriate for a particular problem.  
Contrast this with supervised learning:  No one is surprised if, for a new supervised natural language processing problem, the best model is
a discriminative margin-based one like the perceptron or its more recent cousins.

We maintain that this difference between supervised learning and unsupervised adaptation is due at least in part to our comparative
lack of theoretical understanding of the conditions under which domain adaptation can perform well.  In this talk we'll examine some recent
theoretical results for unsupervised adaptation that are motivated by algorithms and settings from natural language processing.  We
formalize the (sometimes extremely strong) assumptions under which we can expect unsupervised domain adaptation algorithms to yield
high-performing target models, and we examine bounds on the generalization error of source-trained models under these assumptions.
These bounds show how "representation learning" approaches to adaptation can achieve good, sometimes great, results, and also give
some insight into the conditions under which these techniques can break down or even completely fail.  At the end of the talk, I will briefly
discuss connections to supervised adaptation in terms of these same assumptions and suggest some open questions for new research linking
theoretical and practical problems in domain adaptation.