In recent years, there has been an increased interest in minimizing the need for annotated data in NLP. Significant progress has been made in the development of both semi-supervised and unsupervised learning approaches. Although unsupervised approaches have proved more challenging than semi-supervised ones, their further development is particularly important because they carry the highest potential in terms of avoiding annotation cost.
Such approaches can be applied to any language or genre for which adequate raw text resources are available. They also bear theoretical promise for their ability to recover novel, valuable information in textual data and to expose underlying relations between form and various linguistic phenomena. Largely due to these benefits, NLP has recently experienced a surge of interest in unsupervised learning techniques. Increasingly sophisticated approaches have been proposed and applied to a wide range of tasks, including parsing, verb clustering, induction of grammatical categories, lexical semantics, POS tagging, and many others.
The aim of this workshop is to bring together researchers working on different areas of unsupervised language learning. The objective is to summarize what has been achieved on the topic, to foster discussion of current problems in the area, and to consider future trends.
We welcome papers of 1 to 9 pages and, for some submission types, abstracts (see below for exact submission specifications) in any area or aspect of unsupervised learning in NLP (e.g., techniques, tasks, applications, high-level issues that call for discussion). We particularly encourage submissions that focus on the current challenges in the development and evaluation of fully unsupervised approaches. For example:
Over the last decade, several unsupervised techniques have been developed and applied to NLP (e.g., Bayesian methods, approximate inference, and graph-based methods). However, recent methods do not always perform better than the more traditional clustering and pattern recognition algorithms. Discussion of the contribution of various unsupervised methods to NLP would be highly valuable.
A fundamental aim of unsupervised learning in NLP is to devise language-independent learning mechanisms. However, languages differ greatly from one another. What is the best way to handle language specificity in multilingual unsupervised learning?
A prominent advantage of unsupervised learning is its ability to induce novel information from data (e.g., new linguistic knowledge or annotation schemes). How should this information be evaluated? Is direct evaluation against an existing gold standard a good approach? Would it be better to opt for intrinsic (i.e., "gold-standard independent") evaluation? Or is evaluation in the context of another NLP task or application ideal? We welcome discussion on the pros and cons of each method, along with novel ideas for evaluation.
The ultimate goal of unsupervised learning is to use it to aid NLP. How should this be done, and what kind of challenges do we face when aiming to integrate unsupervised techniques in various application tasks?
Three types of submissions will be accepted: (1) technical papers, (2) position papers (perspectives/speculation) and (3) survey papers (work done on a specific task/in a certain sub-field over a few years). We especially encourage submission of position and survey papers and abstracts.
Format requirements are the same as for full papers of EMNLP 2011; see http://conferences.inf.ed.ac.uk/emnlp2011/call.html for a detailed description and style files. Submission will be electronic, using the workshop's submission webpage at START. We accept papers of all three submission types. Paper length is limited to 9 pages of content, plus any number of additional pages containing references only. We welcome papers considerably shorter than the page limit. Such papers will neither be favored nor disfavored in the review process.
We also welcome the submission of abstracts (up to one page) of position and survey papers (but not of technical papers). Abstracts should be formatted using the EMNLP style files and submitted as a separate file (just like short and long papers). As with shorter papers, abstracts will neither be favored nor disfavored by the program committee.
The workshop allows multiple submissions. However, we kindly ask to be notified by email if your work has also been submitted to another venue.
Please submit here
The reviewing of the papers will be blind. Papers should not include the authors' names and affiliations. Furthermore, self-citations and other references (e.g., to projects, corpora, or software) that could reveal the authors' identity should be avoided. For example, instead of "We previously showed (Smith, 1991)...", write "Smith previously showed (Smith, 1991)...".
Please note: the dates were changed considerably
| May 26, 2011 | Due date for submissions |
| June 17, 2011 | Notification of acceptance |
| July 1, 2011 | Deadline for final camera-ready version |
| July 30, 2011 | Workshop held |
Organizers:
Omri Abend (Hebrew University of Jerusalem, firstname.lastname@example.org)
Anna Korhonen (University of Cambridge, email@example.com)
Ari Rappoport (Hebrew University of Jerusalem, firstname.lastname@example.org)
Roi Reichart (MIT, email@example.com)
Program Committee Members:
1. Eneko Agirre (University of the Basque Country, Spain)
2. Jason Baldridge (University of Texas at Austin, USA)
3. Tim Baldwin (University of Melbourne, Australia)
4. Sam Brody (Columbia University, USA)
5. Alexander Clark (Royal Holloway, University of London, UK)
6. Shay Cohen (Carnegie Mellon University, USA)
7. Mona Diab (Columbia University, USA)
8. Gregory Druck (University of Massachusetts Amherst, USA)
9. Jason Eisner (Johns Hopkins University, USA)
10. Sharon Goldwater (University of Edinburgh, UK)
11. Joao Graca (University of Pennsylvania, USA)
12. Ioannis Klapaftis (University of York, UK)
13. Lillian Lee (Cornell University, USA)
14. Percy Liang (UC Berkeley, USA)
15. Diana McCarthy (Lexical Computing, Ltd., UK)
16. Preslav Nakov (National University of Singapore, Singapore)
17. Roberto Navigli (University of Rome, Italy)
18. Vincent Ng (UT Dallas, USA)
19. Ted Pedersen (University of Minnesota, USA)
20. Andrew Rosenberg (CUNY, USA)
21. Valentin Spitkovsky (Stanford University, USA)
22. Carlo Strapparava (FBK-irst, Italy)
23. Ben Taskar (University of Pennsylvania, USA)
24. Kristina Toutanova (Microsoft Research, USA)
25. Andreas Vlachos (University of Wisconsin-Madison, USA)