Natural Language Processing is being revolutionized by deep learning with neural networks. However, deep learning requires large amounts of annotated data, and its advantage over traditional statistical methods typically diminishes when such data is not available; for example, statistical machine translation (SMT) continues to outperform neural machine translation (NMT) in many bilingually resource-poor scenarios. Large amounts of annotated data do not exist for many low-resource languages, and for high-resource languages it can be difficult to find linguistically annotated data of sufficient size and quality to allow neural methods to excel. This workshop aims to bring together researchers from the NLP and ML communities who work on learning with neural methods when there is not enough data for those methods to succeed out-of-the-box. Techniques may include self-training, paired training, distant supervision, semi-supervised and transfer learning, and human-in-the-loop techniques such as active learning.

One class of approaches tackles the scarcity of fully annotated data by using weakly-annotated data. In general, weakly-annotated data is labeled data that deviates in some way from the data appropriate for supervised training. Distant supervision for relation extraction is a good example of such methods, in which facts from a knowledge base such as Freebase are used to automatically label training examples. A source of abundant weakly-annotated data for some NLP applications is user feedback, which is typically partial, discrete, and noisy. Learning from such data is very challenging but also valuable for the NLP community. Bandit learning and counterfactual learning from bandit feedback aim to leverage online and offline user feedback, respectively, to improve model training. An alternative to online user feedback is to drastically reduce the amount of annotated data but increase its quality. Active learning algorithms incrementally select examples for labeling, with the aim of minimizing the amount of labeled data required to reach a given level of performance.
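The example-selection step at the heart of active learning can be illustrated with uncertainty sampling, one common selection criterion (a minimal sketch; the pool probabilities and pool size here are invented for illustration):

```python
import numpy as np

def uncertainty_sample(probs, k):
    """Pick the k pool examples whose predicted class distribution
    has the highest entropy, i.e., where the model is least certain."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]  # indices of the k most uncertain examples

# Toy pool: a model's predicted class probabilities for 5 unlabeled examples.
pool_probs = np.array([
    [0.95, 0.05],   # confident
    [0.50, 0.50],   # maximally uncertain
    [0.80, 0.20],
    [0.55, 0.45],   # uncertain
    [0.99, 0.01],   # very confident
])

picked = uncertainty_sample(pool_probs, k=2)
print(sorted(picked.tolist()))  # → [1, 3]: the two most uncertain examples
```

In a full active learning loop, the selected examples would be sent to an annotator, added to the labeled set, and the model retrained before the next round of selection.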

Another class of approaches improves the performance on the main task using data from related tasks. For example, one may not have access to a large amount of bilingual parallel data to train neural machine translation for a given language pair, but may have access to monolingual data in the source and target language, bilingual parallel data for other language pairs, or annotated data for other tasks such as named entity recognition or parsing. Techniques to leverage these additional data resources include transfer learning and multi-task learning, where we care about learning for the target task or all tasks, respectively. This class of approaches also includes domain adaptation, where the task is fixed but annotated training data is drawn from a domain that differs from the test domain. Generative adversarial networks (GANs) can be used to learn representations that are robust to domain shift. Lastly, semi-supervised learning methods deal with learning from annotated and unannotated data, by creating better representations for text or by regularizing model parameters.
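The multi-task setup described above is often realized through hard parameter sharing: one shared encoder feeds several task-specific output heads, so every task's gradient updates the shared representation. The following is a minimal sketch of that architecture (the layer sizes, the two example tasks, and all weights are illustrative assumptions, not a prescribed design):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hard parameter sharing: one shared encoder, two task-specific heads.
W_shared = rng.normal(size=(10, 8))   # shared encoder weights, used by both tasks
W_ner = rng.normal(size=(8, 5))       # head for a tagging task (e.g. NER)
W_parse = rng.normal(size=(8, 3))     # head for a second task (e.g. parsing actions)

def forward(x, W_head):
    h = np.tanh(x @ W_shared)  # shared representation computed once per input
    return h @ W_head          # task-specific scores

x = rng.normal(size=(4, 10))          # a batch of 4 input feature vectors
print(forward(x, W_ner).shape)        # (4, 5)
print(forward(x, W_parse).shape)      # (4, 3)
```

During training, losses from both heads would be backpropagated into `W_shared`, which is how the low-resource target task benefits from the auxiliary task's data.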

To sum up, topics of interest include, but are not limited to, the following:

  • Active learning

  • Transfer learning

  • Multi-task learning

  • Semi-supervised learning

  • Dual learning

  • Unsupervised learning

  • Bandit learning

  • Domain adaptation

  • Decipherment or zero-shot learning

  • Language projections

  • Universal representations and interlinguas

  • Low resource structured prediction

The workshop will bring together experts in deep learning and natural language processing whose research focuses on learning with scarce data. Specifically, it will provide attendees with an overview of existing approaches from various disciplines, and enable them to distill principles that can be applied more generally. We will also discuss the main challenges arising in this setting and outline potential directions for future progress. The target audience consists of researchers and practitioners in related areas.