by John Blitzer, University of California, Berkeley
Unsupervised domain adaptation algorithms address a key issue in natural language processing: how can we train a system on a labeled source distribution but achieve high performance on a target distribution for which we have only unlabeled data? Many empirical solutions have been proposed for this problem, and many of them have found success on one or more natural language processing tasks. Nonetheless, we're holding this workshop because we still don't understand which technique to use when, or even whether any of the current techniques are appropriate for a particular problem. Contrast this with supervised learning: no one is surprised if, for a new supervised natural language processing problem, the best model is a discriminative margin-based one like the perceptron or its more recent cousins. We maintain that this difference between supervised learning and unsupervised adaptation is due, at least in part, to our comparative lack of theoretical understanding of the conditions under which domain adaptation can perform well. In this talk we'll examine some recent theoretical results for unsupervised adaptation that are motivated by algorithms and settings from natural language processing. We formalize the (sometimes extremely strong) assumptions under which we can expect unsupervised domain adaptation algorithms to yield high-performing target models, and we examine bounds on the generalization error of source-trained models under these assumptions. These bounds show how "representation learning" approaches to adaptation can achieve good, sometimes great, results, and they also give some insight into the conditions under which these techniques can break down or even fail completely. At the end of the talk, I will briefly discuss connections to supervised adaptation in terms of these same assumptions and suggest some open questions for new research linking theoretical and practical problems in domain adaptation.
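The abstract does not state the bounds themselves, but a well-known result of this flavor (due to Ben-David, Blitzer, Crammer, Kulesza, Pereira, and Vaughan) gives their general shape; as a sketch, with symbols not taken from the abstract, it bounds the target error of a source-trained hypothesis h by its source error, a divergence between the two domains, and the error of the best hypothesis on both domains jointly:

```latex
% Sketch of the standard unsupervised-adaptation bound for a hypothesis h drawn
% from a class H; D_S and D_T are the source and target distributions.
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda,
\qquad
\lambda \;=\; \min_{h' \in \mathcal{H}} \big[\, \epsilon_S(h') + \epsilon_T(h') \,\big]
```

Informally, when the divergence term is small (the domains look alike to hypotheses in H) and lambda is small (some single hypothesis does well on both domains), source training transfers; when lambda is large, no hypothesis can succeed on both domains at once, which is one formal sense in which adaptation can fail completely.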