Natural Language Processing (NLP) is being revolutionized by deep learning with neural networks. However, deep learning requires large amounts of annotated data, and its advantage over traditional statistical methods typically diminishes when such data is not available; for example, statistical machine translation continues to outperform neural machine translation in many bilingually resource-poor scenarios. Large amounts of annotated data do not exist for many low-resource languages. Even in high-resource languages, it can be difficult to find linguistically annotated data of sufficient size and quality to allow neural methods to excel. This workshop aims to bring together researchers from the NLP and machine learning communities who work on deep learning when there is not enough data for those methods to succeed out-of-the-box. Techniques may include self-training, paired training, distant supervision, domain adaptation, semi-supervised and transfer learning, as well as human-in-the-loop techniques such as active learning.
One class of approaches tackles the scarcity of fully-annotated data by using weakly-annotated data. In general, weakly-annotated data is labeled data that deviates in some way from the data required for fully supervised training. For example, several methods for relation extraction use Freebase as a source of distant supervision. A source of abundant weakly-annotated data for some NLP applications is user feedback, which is typically partial, discrete, and noisy. Learning from such data is challenging, but highly valuable to the NLP community. Bandit learning and counterfactual learning from bandit feedback aim to leverage online and offline user feedback, respectively, to improve model training. An alternative to relying on abundant but noisy feedback is to drastically reduce the amount of annotated data while increasing its quality. Active learning algorithms incrementally select examples for labeling, with the aim of minimizing the amount of labeled data required to reach a given level of performance.
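To make the active learning loop described above concrete, the following is a minimal sketch of pool-based uncertainty sampling in Python. The synthetic dataset, logistic-regression model, seed-set size, and per-round labeling budget are illustrative assumptions for this sketch, not part of the workshop description.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# Dataset, model, and budgets below are illustrative assumptions only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(y), size=20, replace=False))    # small seed set
pool = [i for i in range(len(y)) if i not in labeled]         # unlabeled pool

for round_ in range(5):                                       # five annotation rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    probs = clf.predict_proba(X[pool])
    uncertainty = 1.0 - probs.max(axis=1)                     # least-confidence score
    query = [pool[i] for i in np.argsort(-uncertainty)[:20]]  # request 20 new labels
    labeled.extend(query)                                     # "oracle" labels come from y
    pool = [i for i in pool if i not in query]
    print(f"round {round_}: {len(labeled)} labeled examples")
```

In practice, the oracle call is a human annotator, and the acquisition function (least confidence here) can be swapped for margin- or entropy-based criteria.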
Another class of approaches uses supervision from related tasks to improve performance on the main task. For example, one may not have access to a large amount of bilingual parallel data to train a neural machine translator for a given language pair, but may have access to monolingual data in the source and target languages, bilingual parallel data for other language pairs, or annotated data for other tasks such as named entity recognition or parsing. Techniques for leveraging these additional resources include transfer learning and multi-task learning, where the goal is performance on the target task alone or on all tasks jointly, respectively. This class of approaches also includes domain adaptation, where the task is fixed but annotated training data is drawn from a domain that differs from the test domain. Generative adversarial networks (GANs) can be used to learn representations that are robust to domain shift. Lastly, semi-supervised learning methods combine annotated and unannotated data, for example by learning better representations or by regularizing model parameters.
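As one concrete illustration of multi-task learning via hard parameter sharing, the following Python (PyTorch) sketch trains a shared encoder with two task-specific heads, so that gradients from a higher-resource auxiliary task also update the representation used by the low-resource target task. The architecture, layer sizes, task names, and fake minibatches are assumptions made for illustration only.

```python
# Minimal sketch of hard parameter sharing for multi-task learning in PyTorch.
# Layer sizes, task heads, and the fake minibatches are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Encoder shared across tasks; every task's loss updates its parameters."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, token_ids):
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]                                   # final hidden state as sentence representation

class MultiTaskModel(nn.Module):
    def __init__(self, num_aux_classes=5, num_main_classes=2):
        super().__init__()
        self.encoder = SharedEncoder()
        self.aux_head = nn.Linear(256, num_aux_classes)    # auxiliary, higher-resource task
        self.main_head = nn.Linear(256, num_main_classes)  # low-resource target task

    def forward(self, token_ids, task):
        rep = self.encoder(token_ids)
        return self.aux_head(rep) if task == "aux" else self.main_head(rep)

model = MultiTaskModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Alternate minibatches from the two tasks so the shared encoder learns from both.
batch = torch.randint(0, 10000, (16, 20))              # fake token ids for illustration
for task, labels in [("aux", torch.randint(0, 5, (16,))),
                     ("main", torch.randint(0, 2, (16,)))]:
    optimizer.zero_grad()
    loss = loss_fn(model(batch, task), labels)
    loss.backward()
    optimizer.step()
```

Transfer learning follows the same pattern, except that the auxiliary task is trained first and only the encoder (and possibly the target head) is then fine-tuned on the low-resource task.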
Please submit your paper using START.
Format: Submissions must be in PDF format, anonymized for review, written in English, and must follow the EMNLP 2019 formatting requirements. We strongly advise you to use the LaTeX template files provided by EMNLP 2019.
Length: Submissions consist of up to eight pages of content. There is no limit on the number of pages for references. There is no extra space for appendices. Accepted papers will be given one additional page for content. There is no explicit short paper track, but you should feel free to submit your paper regardless of its length. Reviewers will be instructed not to penalize papers for being too short.
Publishing: Authors can also submit non-archival papers of up to eight pages of content. Non-archival papers will not be included in the proceedings. Thus, your work will remain unpublished, and later submission to another venue (e.g., a journal) is not precluded. Likewise, you are free to present work that has previously been published elsewhere. We do NOT require submissions to follow an anonymity period.
Dual Submission: Authors may make submissions that are also under review at other venues, provided this does not violate the policies of those venues.
Presentation Format: We anticipate most papers, both archival and non-archival, will be presented as posters, with only a few selected for oral presentation.