Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing

   (in conjunction with RANLP 2011)

Proceedings are available here.

Program (September 15)

Chair: Chris Biemann

  9:55 Introductory Remarks
 10:00 Gibbs Tree-Sampling in Unsupervised Dependency Parsing
by David Mareček and Zdeněk Žabokrtský
Slides: [pdf]
 10:30-11:00 Coffee break
 11:00 Guided Self Training for Sentiment Classification
by Brett Drury, Luis Torgo and Jose Joao Almeida
Slides: [pdf]
 11:30 Investigating the Applicability of current Machine-Learning based Subjectivity Detection Algorithms on German Texts
by Malik Atalla, Christian Scheel, Ernesto William De Luca and Sahin Albayrak
Slides: [pdf]
 12:00-14:15 Lunch
 14:15 Invited talk: Simple, Effective, Robust Semi-Supervised Learning, Thanks To Google N-grams
by Shane Bergsma
Slides: [pptx] [ppt] [pdf]
 15:15-16:00 Coffee break
 16:00 Learning Protein Protein Interaction Extraction using Distant Supervision
by Philippe Thomas, Illés Solt, Roman Klinger and Ulf Leser
Slides: [pdf]
 16:30 Topic Models with Logical Constraints on Words
by Hayato Kobayashi, Hiromi Wakaki, Tomohiro Yamasaki and Masaru Suzuki
    Closing Remarks

The talk Investigation of Co-training Views and Variations for Semantic Role Labeling by Rasoul Samad Zadeh Kaljahi and Mohd Sapiyan Baba has been cancelled.

Call for Papers

In natural language processing (NLP), supervised learning scenarios are explored more frequently than unsupervised or semi-supervised ones. Unfortunately, labeled data are often highly domain-dependent and in short supply. It has therefore become increasingly important to leverage both labeled and unlabeled data to achieve the best performance on challenging NLP problems that involve learning structured variables.

Until recently, most results in semi-supervised learning of structured variables in NLP were negative (Abney, 2008), but today the best part-of-speech taggers (Suzuki et al., 2008), named entity recognizers (Turian et al., 2010), and dependency parsers (Sagae and Tsujii, 2007; Suzuki et al., 2009; Søgaard and Rishøj, 2010) exploit mixtures of labeled and unlabeled data. Unsupervised and minimally supervised NLP are also growing rapidly.

The most commonly used semi-supervised learning algorithms in NLP are feature-based methods (Koo et al., 2008; Sagae and Gordon, 2009; Turian et al., 2010) and EM, self- or co-training (Mihalcea, 2004; Sagae and Tsujii, 2007; Spoustova et al., 2009). Mixture models have also been used successfully (Suzuki and Isozaki, 2008; Suzuki et al., 2009). While feature-based methods seem relatively robust, self-training and co-training are highly parameter-sensitive, and parameter tuning has therefore become an important research topic (Goldberg and Zhu, 2009). This is a concern not only in NLP but also in other areas such as face recognition (e.g. Yan and Wang, 2009). Parameter sensitivity is even more pronounced in unsupervised learning of structured variables, e.g. unsupervised part-of-speech tagging and grammar induction.

By more robust unsupervised or semi-supervised learning algorithms we mean algorithms with few parameters that give good results across different data sets and different applications.

Specifically, we encourage submissions on the following topics:
  • assessing robustness of known or new unsupervised or semi-supervised methods across different NLP problems or languages
  • new unsupervised or semi-supervised methods for NLP problems
  • positive and negative results on the use of unsupervised or semi-supervised methods in applications
  • application-oriented evaluation of unsupervised or semi-supervised methods
  • comparison and combination of unsupervised or semi-supervised methods

This workshop aims to bring together researchers dedicated to designing and evaluating robust unsupervised or semi-supervised learning algorithms for NLP problems. These problems include, but are not limited to, POS tagging, grammar induction and parsing, named entity recognition, word sense induction and disambiguation, machine translation, sentiment analysis and taxonomy learning. Our goal is to evaluate known unsupervised and semi-supervised learning algorithms, foster novel and more robust ones, and discuss positive and negative results that might otherwise not appear in a technical paper at a major conference. We welcome submissions that address the robustness of unsupervised or semi-supervised learning algorithms for NLP, and especially encourage authors to provide results for different data sets, languages or applications.

References:
Steven Abney. 2008. Semi-supervised learning for computational linguistics. Chapman & Hall.
Andrew Goldberg and Jerry Zhu. 2009. Keepin' it real: semi-supervised learning with realistic tuning. In NAACL.
Terry Koo et al. 2008. Simple semi-supervised dependency parsing. In ACL-HLT.
Rada Mihalcea. 2004. Co-training and self-training for word sense disambiguation. In CoNLL.
Kenji Sagae and Jun'ichi Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In CoNLL Shared Task.
Kenji Sagae and Andrew Gordon. 2009. Clustering words by syntactic similarity improves dependency parsing of predicate-argument structures. In IWPT.
Drahomira Spoustova et al. 2009. Semi-supervised training for the averaged perceptron POS tagger. In EACL.
Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data. In ACL-HLT.
Jun Suzuki et al. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In EMNLP.
Anders Søgaard and Christian Rishøj. 2010. Semi-supervised dependency parsing using generalized tri-training.
Joseph Turian et al. 2010. Word representations: a simple and general method for semi-supervised learning. In ACL.
Shuicheng Yan and Huan Wang. 2009. Semi-supervised learning by sparse representation. In SIAM Data Mining.

Organizers:
Chris Biemann, TU Darmstadt
Anders Søgaard, University of Copenhagen

Important dates:
Submission deadline: July 15 2011.
Notification: August 15 2011.
Workshop: September 15 2011.

Submission guidelines:
Use the RANLP style sheets found here.
We invite long (8 pages) and short (4 pages) papers. All papers will appear in the ACL Anthology. Accepted short papers will be presented either as short oral presentations or as posters.
Submission page: 

Program committee:
Steven Abney, University of Michigan
Stefan Bordag, ExB Research & Development
Eugenie Giesbrecht, FZI Karlsruhe
Katja Filippova, Google
Florian Holz, University of Leipzig
Jonas Kuhn, University of Stuttgart
Vivi Nastase, HITS Heidelberg
Reinhard Rapp, JG University of Mainz
Lucia Specia, University of Wolverhampton
Valentin Spitkovsky, Stanford University
Sven Teresniak, University of Leipzig
Dekai Wu, HKUST
Torsten Zesch, TU Darmstadt
Jerry Zhu, University of Wisconsin-Madison