(in conjunction with RANLP 2011)
Proceedings are available here.
Program (September 15)
Chair: Chris Biemann
The talk "Investigation of Co-training Views and Variations for Semantic Role Labeling" by Rasoul Samad Zadeh Kaljahi and Mohd Sapiyan Baba has been cancelled.
Call for Papers
In natural language processing (NLP), supervised learning scenarios are more frequently explored than unsupervised or semi-supervised ones. Unfortunately, labeled data are often highly domain-dependent and in short supply. It has therefore become increasingly important to leverage both labeled and unlabeled data to achieve the best performance in challenging NLP problems that involve learning of structured variables.
Until recently, most results in semi-supervised learning of structured variables in NLP were negative (Abney, 2008), but today the best part-of-speech taggers (Suzuki et al., 2008), named entity recognizers (Turian et al., 2010), and dependency parsers (Sagae and Tsujii, 2007; Suzuki et al., 2009; Søgaard and Rishøj, 2010) exploit mixtures of labeled and unlabeled data. Unsupervised and minimally supervised NLP is also growing rapidly.
The most commonly used semi-supervised learning algorithms in NLP are feature-based methods (Koo et al., 2008; Sagae and Gordon, 2009; Turian et al., 2010) and EM, self-training, or co-training (Mihalcea, 2004; Sagae and Tsujii, 2007; Spoustova et al., 2009). Mixture models have also been used successfully (Suzuki and Isozaki, 2008; Suzuki et al., 2009). While feature-based methods seem relatively robust, self-training and co-training are very parameter-sensitive, and parameter tuning has therefore become an important research topic (Goldberg and Zhu, 2009). This is a concern not only in NLP but also in other areas such as face recognition (e.g., Yan and Wang, 2009). Parameter sensitivity is even more pronounced in unsupervised learning of structured variables, e.g., unsupervised part-of-speech tagging and grammar induction.
By more robust unsupervised or semi-supervised learning algorithms we mean algorithms with few parameters that give good results across different data sets and different applications.
Specifically, we encourage submissions on the following topics:
This workshop aims to bring together researchers dedicated to designing and evaluating robust unsupervised or semi-supervised learning algorithms for NLP problems. These include, but are not limited to, POS tagging, grammar induction and parsing, named entity recognition, word sense induction and disambiguation, machine translation, sentiment analysis, and taxonomy learning. Our goal is to evaluate known unsupervised and semi-supervised learning algorithms, foster novel and more robust ones, and discuss positive and negative results that may otherwise not appear in a technical paper at a major conference. We welcome submissions that address the robustness of unsupervised or semi-supervised learning algorithms for NLP, and especially encourage authors to provide results for different data sets, languages, or applications.
Steven Abney. 2008. Semi-supervised learning for computational linguistics. Chapman & Hall.
Andrew Goldberg and Jerry Zhu. 2009. Keepin' it real: semi-supervised learning with realistic tuning. In NAACL.
Terry Koo et al. 2008. Simple semi-supervised dependency parsing. In ACL-HLT.
Rada Mihalcea. 2004. Co-training and self-training for word sense disambiguation. In CoNLL.
Kenji Sagae and Jun'ichi Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In CoNLL Shared Task.
Kenji Sagae and Andrew Gordon. 2009. Clustering words by syntactic similarity improves dependency parsing of predicate-argument structures. In IWPT.
Drahomira Spoustova et al. 2009. Semi-supervised training for the averaged perceptron POS tagger. In EACL.
Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In ACL-HLT.
Jun Suzuki et al. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In EMNLP.
Anders Søgaard and Christian Rishøj. 2010. Semi-supervised dependency parsing using generalized tri-training.
Joseph Turian et al. 2010. Word representations: a simple and general method for semi-supervised learning. In ACL.
Chris Biemann, TU Darmstadt
Anders Søgaard, University of Copenhagen
Submission deadline: July 15 2011.
Notification: August 15 2011.
Workshop: September 15 2011.
Use the RANLP style sheets found here.
We invite long (8 pages) and short (4 pages) papers. All papers will appear in the ACL Anthology. Accepted short papers will be presented either as short oral presentations or as posters.
Steven Abney, University of Michigan
Stefan Bordag, ExB Research & Development
Eugenie Giesbrecht, FZI Karlsruhe
Katja Filippova, Google
Florian Holz, University of Leipzig
Jonas Kuhn, University of Stuttgart
Vivi Nastase, HITS Heidelberg
Reinhard Rapp, JG University of Mainz
Lucia Specia, University of Wolverhampton
Valentin Spitkovsky, Stanford University
Sven Teresniak, University of Leipzig
Dekai Wu, HKUST
Torsten Zesch, TU Darmstadt
Jerry Zhu, University of Wisconsin-Madison