Second Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2011)

CALL FOR PAPERS: Second Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2011)
SPONSORED BY SIGPARSE AND INRIA'S ALPAGE PROJECT
IWPT 2011 Co-located Workshop, October 6, 2011, Dublin, Ireland


(Last modified October 2, 2011, Proceedings)

OUTLINE 

The aim of this workshop is to bring together researchers interested in parsing languages with richer morphological structure than English, to provide a forum for discussing the challenges associated with parsing such languages, and to share strategies for addressing them. We are interested in presentations on actively studied areas of research, including the adaptation of existing parsing techniques to new languages, the design of new models that take morphological information into account, and the implementation of models that allow robust statistics to be obtained in the face of high word-form variation.

IMPORTANT DATES

Submission deadline: July 31, 2011 (PDT, GMT-8)

Notification to authors: September 5, 2011

Camera ready copy: September 20, 2011

Workshop: October 6, 2011


Proceedings are out! They're available at the ACL Anthology website.




Session 1 (chaired by Djamé Seddah)
13:30 - 13:35 Opening Remarks
13:35 - 13:55 Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing 
Jinho D. Choi and Martha Palmer 
13:55 - 14:15 Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic 
Jon Dehdari, Lamia Tounsi and Josef van Genabith 
14:15 - 14:30 French parsing enhanced with a word clustering method based on a syntactic lexicon
Anthony Sigogne, Matthieu Constant and Eric Laporte 
14:30 - 14:45 Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque 
Kepa Bengoetxea, Arantza Casillas and Koldo Gojenola

14:45 - 15:05 Coffee Break

Session 2 (chaired by Jennifer Foster)
15:05 - 15:25 Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar 
Andreas van Cranenburgh, Remko Scha and Federico Sangati 
15:25 - 15:45 Multiword Expressions in Statistical Dependency Parsing 
Gülşen Eryiğit, Tugay İlbay and Ozan Arkan Can
15:45 - 16:00 Linguistically Rich Graph Based Data Driven Parsing For Hindi 
Samar Husain, Raghu Pujitha Gade and Rajeev Sangal 
16:00 - 16:15 Data point selection for self-training 
Ines Rehbein

16:15 - 16:25 Short Break

Session 3 (chaired by Reut Tsarfaty)
16:25 - 17:25 Panel (James Henderson, Joakim Nivre, Slav Petrov, Josef van Genabith, Yannick Versley)
17:25 - 17:30 Closing Remarks  


Long Papers

Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar 
Andreas van Cranenburgh, Remko Scha and Federico Sangati 

Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic 
Jon Dehdari, Lamia Tounsi and Josef van Genabith 

Multiword Expressions in Statistical Dependency Parsing 
Gülşen Eryiğit, Tugay İlbay and Ozan Arkan Can

Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing 
Jinho D. Choi and Martha Palmer 

Short Papers

Data point selection for self-training 
Ines Rehbein

French parsing enhanced with a word clustering method based on a syntactic lexicon 
Anthony Sigogne, Matthieu Constant and Eric Laporte 

Linguistically Rich Graph Based Data Driven Parsing For Hindi 
Samar Husain, Raghu Pujitha Gade and Rajeev Sangal 

Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque 
Kepa Bengoetxea, Arantza Casillas and Koldo Gojenola



DISCUSSION PANEL

The main topics of discussion will be the objectives and design of a Shared Task involving parsing with non-gold input (e.g. non-gold tokenization, morphology, POS, ...). As in our previous panels, we hope for a very animated discussion and, of course, we expect many contributions from the general audience.

Josef van Genabith (Dublin City University, Ireland)

James Henderson (Université de Genève, Switzerland)

Joakim Nivre (Uppsala University, Sweden)

Slav Petrov (Google Research NY, USA)

Reut Tsarfaty (Chair, Uppsala University, Sweden)

Yannick Versley (University of Tübingen, Germany)




Everyone is welcome to attend our workshop. Registration is included in IWPT 2011's registration fees. See IWPT 2011's registration page for details.

The IWPT organization chair has negotiated reduced rates with a hotel in Dublin city centre and with one near the DCU campus. See the IWPT accommodation page for more information. If you wish to be closer to the city centre and only 2 minutes away from all the main bus routes of O'Connell Street, the "Jurys Inn Parnell Street" is a good choice (book via hotel.com for cheaper prices, around 70€ with breakfast). Otherwise, there are many reasonably priced B&Bs around Parnell Street and O'Connell Street.


CALL FOR PAPERS

Since the advent of large syntactically annotated corpora, statistical parsing has been a cornerstone of research in NLP. While Penn Treebank parsing performance, be it dependency-based or constituency-based, seems to have reached a high plateau, the same cannot be said of other languages, data sets and domains.

Statistical parsing of morphologically-rich languages (MRLs) has repeatedly been shown to exhibit a plethora of nontrivial challenges, including sparse lexica in the face of rich inflectional systems, parsing deficiency in the face of free word order, and treebank annotation idiosyncrasies in the face of morphosyntactic interactions. Recent studies on parsing languages such as German, Arabic, Hebrew or French using newly available treebanks contribute to our understanding of the extent of the difficulty that such phenomena pose when reusing parsing models initially designed to parse English. Beyond the technical and linguistic difficulties, the lack of communication between researchers working on different MRLs can lead to a reinventing-the-wheel syndrome.

Following the warm reception of the first SPMRL workshop at NAACL-HLT 2010, the second SPMRL workshop aims to build upon the success of the first and offer a platform to this growing community of interest. We solicit papers describing parsing experiments with models and architectures for languages with morphological structure richer than English, or studies that address the challenges of lexical sparseness (for any language). In order to provide a realistic indication of the performance of parsing systems on unstructured and unanalyzed data, we particularly encourage contributions reporting parsing results for non-gold as well as gold morphological analysis of the test data, with the analysis performed either before parsing or jointly with it.

The areas of interest of the second SPMRL workshop include, but are not limited to, the following topics:
  • parsing models and architectures that explicitly integrate morphological analysis and parsing
  • parsing models and architectures that focus on lexical coverage and the handling of OOV words either by incorporating linguistic knowledge or through the use of unsupervised/semi-supervised learning techniques
  • cross-language and cross-model comparison of models' strengths and weaknesses in the face of particular linguistic phenomena (e.g. morphosyntactic characteristics, degree of word-order freedom, ...)
  • comprehensive analyses of the strengths and weaknesses of various parsing models on particular linguistic (e.g. morphosyntactic) phenomena with respect to variation in tagsets, annotation schemes and additional data transformations


This year's SPMRL workshop will feature a special theme concerning the Shared Task on Statistical Parsing of MRLs. Following the panel discussions at previous events, we intend the first shared task on parsing MRLs to take place in 2012, and we now solicit position papers that discuss the goals, scope, design, expected contributions and desired outcomes of such a shared task. The accepted papers will be included in the SPMRL proceedings. In addition, we plan to hold a panel discussion covering a range of relevant topics including (but not limited to) cross-language parse representation, cross-annotation evaluation and MRL-specific architectural concerns.
We aim to use this panel discussion to help us devise a set of clearly defined objectives for the 2012 shared task and to make concrete decisions about practicalities such as scope, size, representation, languages, evaluation, etc.


SUBMISSION

Authors are invited to submit long papers (up to 9 pages + references) and short papers (up to 5 pages + references). Long papers should describe unpublished, substantial and completed research. Short papers should be position papers, papers describing work in progress or short, focused contributions.
Papers will be accepted until July 31, 2011 (PDT, GMT-8) in PDF format via the START system (https://www.softconf.com/c/spmrl2011/).
Submitted papers must follow the style and formatting guidelines available from the latest ACL-HLT recommendations (http://www.acl2011.org/authors.shtml). As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review. In addition, please do not post your submissions on the web until after the review process is complete.

PROGRAM COMMITTEE

Marie Candito, Jennifer Foster, Yoav Goldberg, Ines Rehbein, Djamé Seddah, Lamia Tounsi, Reut Tsarfaty, Yannick Versley

REVIEW COMMITTEE


Mohammed Attia (Dublin City University, Ireland)
Bernd Bohnet (University of Stuttgart, Germany)
Adriane Boyd (Ohio State University, US)
Marie Candito (University of Paris 7, France)
Ozlem Cetinoglu (Dublin City University, Ireland)
Grzegorz Chrupala (Saarland University, Germany)
Benoit Crabbé (University of Paris 7, France)
Jennifer Foster (Dublin City University, Ireland)
Josef van Genabith (Dublin City University, Ireland)
Yoav Goldberg (Ben Gurion University, Israel)
Spence Green (Stanford University, US)
Deirdre Hogan (Dublin City University, Ireland)
Samar Husain (International Institute of Information Technology, India)
Sandra Kuebler (Indiana University, US)
Jonas Kuhn (University of Stuttgart, Germany)
Alberto Lavelli (FBK-irst, Italy)
Joseph Le Roux (Université de la Méditerranée, France)
Wolfgang Maier (University of Tübingen, Germany)
Yuval Marton (IBM Watson Research Center, US)
Takuya Matsuzaki (University of Tokyo, Japan)
Yusuke Miyao (University of Tokyo, Japan)
Joakim Nivre (Uppsala University, Sweden)
Owen Rambow (Columbia University, US)
Ines Rehbein (Saarland University, Germany)
Kenji Sagae (University of Southern California, US)
Benoit Sagot (Inria Rocquencourt, France)
Djamé Seddah (University of Paris Sorbonne, France)
Lamia Tounsi (Dublin City University, Ireland)
Reut Tsarfaty (Uppsala University, Sweden)
Yannick Versley (University of Tübingen, Germany)

ORGANIZERS AND CONTACTS

Djamé Seddah, Université Paris-Sorbonne & Alpage project  
Reut Tsarfaty, Uppsala University
Jennifer Foster, Dublin City University
To contact us: spmrl2011@gmail.com

This workshop is sponsored by SIGPARSE and by INRIA's Alpage project.

The first SPMRL event took place in October 2009 at IWPT'09 in the form of a discussion panel that followed 7 presentations on different issues related to statistical parsing of German, Arabic, Modern Hebrew and French. It was followed by SPMRL 2010, a NAACL/HLT 2010 workshop that featured 13 presentations, with more languages added to the original set (Basque, Hindi and Korean in addition to Arabic, French, German and Modern Hebrew), and a very animated discussion panel. SPMRL 2010 was the second most successful workshop in terms of registered attendees (more than 50). The SPMRL 2010 proceedings include an overview of the field in the form of a long preface co-authored by the SPMRL programme committee.


The SPMRL group is editing an upcoming special issue of Computational Linguistics devoted to the parsing of MRLs. (Expected publication date: first quarter of 2012. Deadline: September 30, 2011.)


FAQ

Can I submit the same paper to IWPT and SPMRL?
Yes, double submissions are permitted, but the same paper will not be published at both venues. Another possibility for those working on parsing MRLs is to submit two different papers: in this scenario, we encourage authors to view the SPMRL workshop as a venue for more detailed analysis papers.