In recent years, there has been increased interest in minimizing the need for annotated data in NLP. Significant progress has been made in the development of both semi-supervised and unsupervised learning approaches. Semi-supervised approaches are already showing remarkable empirical success, with models that exploit mixtures of labeled and unlabeled data obtaining the best results in several tasks. Examples include part-of-speech tagging (Suzuki et al., 2008), named entity recognition (Turian et al., 2010), and dependency parsing (Sagae and Tsujii, 2007; Suzuki et al., 2009; Søgaard and Rishøj, 2010). Although unsupervised approaches have proved more challenging than semi-supervised ones, their further development is particularly important because they carry the highest potential for avoiding annotation cost. Such approaches can be applied to any language or genre for which adequate raw text resources are available.
This workshop aims to bring together researchers dedicated to designing and evaluating unsupervised and semi-supervised learning algorithms for NLP problems. The workshop will accept submissions on any topic related to unsupervised and semi-supervised learning. However, specific focus will be given to two special themes: robust algorithms and explorations of the continuum from unsupervised to semi-supervised learning.
Robust Algorithms: By robust unsupervised or semi-supervised learning algorithms we mean algorithms with few parameters that give good results across different data sets and/or different applications. Many algorithms, including EM, self-training, and co-training, are very parameter-sensitive, and parameter tuning has therefore become an important research topic (Goldberg and Zhu, 2009). We explicitly encourage submissions that present robust algorithms or evaluate the robustness of known algorithms.
The Continuum from Unsupervised to Semi-Supervised Learning: The distinction between unsupervised and semi-supervised learning approaches is often not very clear, and we explicitly encourage submissions about grey-zone approaches such as weak and indirect supervision, learning from nearly free annotations (e.g. HTML mark-up), joint learning from several modalities, cross-language adaptation, and learning with knowledge-based priors or posteriors.