In recent years, there has been increased interest in minimizing the need for annotated data in NLP. Significant progress has been made in the development of both semi-supervised and unsupervised learning approaches. Semi-supervised approaches are already showing remarkable empirical success, with models that exploit mixtures of labeled and unlabeled data obtaining the best results on several tasks. Examples include part-of-speech tagging (Suzuki and Isozaki, 2008), named entity recognition (Turian et al., 2010), and dependency parsing (Sagae and Tsujii, 2007; Suzuki et al., 2009; Søgaard and Rishøj, 2010). Although unsupervised approaches have proved more challenging than semi-supervised ones, their further development is particularly important because they carry the highest potential for avoiding annotation cost. Such approaches can be applied to any language or genre for which adequate raw text resources are available.
This workshop aims to bring together researchers dedicated to designing and evaluating unsupervised and semi-supervised learning algorithms for NLP problems. The workshop will accept submissions on any topic related to unsupervised and semi-supervised learning. However, specific focus will be given to two special themes: robust algorithms and explorations of the continuum from unsupervised to semi-supervised learning.
Robust Algorithms:
By more robust unsupervised or semi-supervised learning algorithms, we mean algorithms with few parameters that give good results across different data sets and/or different applications. Many algorithms, including EM, self-training and co-training, are very parameter-sensitive, and parameter tuning has therefore become an important research topic (Goldberg and Zhu, 2009). We explicitly encourage submissions that present robust algorithms or evaluate the robustness of known algorithms.
The Continuum from Unsupervised to Semi-Supervised Learning:
The distinction between unsupervised and semi-supervised learning approaches is often not very clear, and we explicitly encourage submissions about grey-zone approaches such as weak and indirect supervision, learning from nearly free annotations (e.g. HTML mark-up), joint learning from several modalities, cross-language adaptation, and learning with knowledge-based priors or posteriors.
Three types of submissions will be accepted: (1) technical papers, (2) position papers (perspectives/speculation) and (3) survey papers (work done on a specific task or in a certain sub-field over a few years). Technical papers can be either long (8+1) or short (4+1), position papers should be short (4+1), and survey papers can be either long (8+1) or short (4+1).
The workshop allows multiple submissions. However, we kindly ask that you notify us by email if your work has also been submitted to another venue.
Please submit your paper here.
The reviewing of the papers will be double-blind.
Important Dates:
Feb 03, 2012: Paper submission deadline
Mar 02, 2012: Notification of acceptance
Mar 09, 2012: Camera-ready deadline
Apr 24, 2012: Workshop held
Organizers:
Omri Abend (Hebrew University of Jerusalem)
Chris Biemann (TU Darmstadt)
Anna Korhonen (University of Cambridge)
Ari Rappoport (Hebrew University of Jerusalem)
Roi Reichart (MIT)
Anders Søgaard (University of Copenhagen)
Program Committee:
Steven Abney (University of Michigan, USA)
Jason Baldridge (University of Texas at Austin, USA)
Phil Blunsom (Oxford University, UK)
Stefan Bordag (ExB Research & Development)
Sam Brody (Rutgers University, USA)
Alexander Clark (Royal Holloway, University of London, UK)
Shay Cohen (Columbia University, USA)
Trevor Cohn (University of Sheffield, UK)
Gregory Druck (University of Massachusetts Amherst, USA)
Eugenie Giesbrecht (FZI Karlsruhe)
Joao Graca (University of Pennsylvania, USA)
Florian Holz (University of Leipzig)
Jonas Kuhn (University of Stuttgart)
Percy Liang (Stanford University, USA)
Suresh Manandhar (University of York, UK)
Diana McCarthy (Lexical Computing, Ltd., UK)
Preslav Nakov (National University of Singapore, Singapore)
Roberto Navigli (University of Rome, Italy)
Vincent Ng (UT Dallas, USA)
Andreas Vlachos (University of Cambridge, UK)
Reinhard Rapp (JG University of Mainz)
Andrew Rosenberg (CUNY, USA)
Sabine Schulte im Walde (University of Stuttgart, Germany)
Noah A. Smith (CMU, USA)
Valentin I. Spitkovsky (Stanford University, USA)
Torsten Zesch (TU Darmstadt)
References:
Andrew Goldberg and Jerry Zhu. 2009. Keepin' it real: semi-supervised learning with realistic tuning. In NAACL.
Kenji Sagae and Jun'ichi Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In CoNLL Shared Task.
Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In ACL-HLT.
Jun Suzuki et al. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In EMNLP.
Anders Søgaard and Christian Rishøj. 2010. Semi-supervised dependency parsing using generalized tri-training. In COLING.
Joseph Turian et al. 2010. Word representations: a simple and general method for semi-supervised learning. In ACL.