CALL FOR PAPERS: Second Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2011)
SPONSORED BY SIGPARSE and INRIA's ALPAGE PROJECT
IWPT 2011 Collocated Workshop, October 6, 2011, Dublin, Ireland
The aim of this workshop is to bring together researchers interested in parsing languages with richer morphological structures than in English, and to provide a forum for discussing the challenges associated with parsing such languages and sharing strategies towards their solutions. We are interested in presentations relating to actively studied areas of research including the adaptation of existing parsing techniques to new languages, the design of new models that take morphological information into account, the implementation of models that allow robust statistics to be obtained in the face of high word-form variation, and so on.
Camera-ready copy: September 20, 2011
Workshop: October 6, 2011
Proceedings are out! They're available at the ACL anthology website.
Session 1 (chaired by Djamé Seddah)
13:30 - 13:35 Opening Remarks
13:35 - 13:55 Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing
Jinho D. Choi and Martha Palmer
13:55 - 14:15 Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic
Jon Dehdari, Lamia Tounsi and Josef van Genabith
14:15 - 14:30 French parsing enhanced with a word clustering method based on a syntactic lexicon
Anthony Sigogne, Matthieu Constant and Eric Laporte
14:30 - 14:45 Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque
Kepa Bengoetxea, Arantza Casillas and Koldo Gojenola
14:45 - 15:05 Coffee Break
Session 2 (chaired by Jennifer Foster)
15:05 - 15:25 Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Andreas van Cranenburgh, Remko Scha and Federico Sangati
15:25 - 15:45 Multiword Expressions in Statistical Dependency Parsing
Gulsen Eryigit, Tugay Ilbay and Ozan Arkan Can
15:45 - 16:00 Linguistically Rich Graph Based Data Driven Parsing For Hindi
Samar Husain, Raghu Pujitha Gade and Rajeev Sangal
16:00 - 16:15 Data point selection for self-training
Ines Rehbein
16:15 - 16:25 Short Break
Session 3 (chaired by Reut Tsarfaty)
16:25 - 17:25 Panel (James Henderson, Joakim Nivre, Slav Petrov, Josef van Genabith, Yannick Versley)
17:25 - 17:30 Closing Remarks
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Andreas van Cranenburgh, Remko Scha and Federico Sangati
Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic
Jon Dehdari, Lamia Tounsi and Josef van Genabith
Multiword Expressions in Statistical Dependency Parsing
Gulsen Eryigit, Tugay Ilbay and Ozan Arkan Can
Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing
Jinho D. Choi and Martha Palmer
Data point selection for self-training
Ines Rehbein
French parsing enhanced with a word clustering method based on a syntactic lexicon
Anthony Sigogne, Matthieu Constant and Eric Laporte
Linguistically Rich Graph Based Data Driven Parsing For Hindi
Samar Husain, Raghu Pujitha Gade and Rajeev Sangal
Testing the Effect of Morphological Disambiguation in Dependency Parsing of Basque
Kepa Bengoetxea, Arantza Casillas and Koldo Gojenola
The discussion will focus on the objectives and design of a Shared Task involving parsing with non-gold input (e.g. non-gold tokenization, morphology, POS, ...). As in our previous panels, we hope for a very animated discussion and, of course, we expect many contributions from the general audience.
Josef van Genabith (Dublin City University, Ireland)
James Henderson (Université de Genève, Switzerland)
Slav Petrov (Google Research NY, USA)
Reut Tsarfaty (Chair, Uppsala University, Sweden)
Everyone is welcome to attend our workshop. Registration is included in IWPT 2011's registration fees. See IWPT 2011's registration page for details.
The IWPT organization chair has negotiated a reduced rate with a hotel in Dublin city centre and one near the DCU campus. See the IWPT accommodation page for more information. If you wish to be closer to the city centre and only 2 minutes away from all the main bus routes of O'Connell Street, the "Jurys Inn Parnell Street" is a good choice (book via hotel.com for cheaper prices, around 70€ with breakfast). Otherwise, there are many reasonably priced B&Bs around Parnell Street and O'Connell Street.
Since the advent of large syntactically annotated corpora, statistical parsing has been a cornerstone of research in NLP. While Penn Treebank parsing performance, be it dependency-based or constituency-based, seems to have reached a high plateau, the same cannot be said of other languages, data sets and domains.
Statistical parsing of morphologically-rich languages (MRLs) has repeatedly been shown to exhibit a plethora of nontrivial challenges, including sparse lexica in the face of rich inflectional systems, parsing deficiency in the face of free word order, and treebank annotation idiosyncrasies in the face of morphosyntactic interactions. Recent studies on parsing languages such as German, Arabic, Hebrew or French using newly available treebanks contribute to our understanding of the extent of the difficulty that such phenomena pose when reusing parsing models initially designed to parse English. Beyond the technical and linguistic difficulties, the lack of communication between researchers working on different MRLs can lead to a reinventing-the-wheel syndrome.
Following the warm reception of the first SPMRL workshop at NAACL-HLT 2010, the second SPMRL workshop aims to build upon the success of the first and offer a platform to this growing community of interest. We solicit papers describing parsing experiments with models and architectures for languages with morphological structure richer than that of English, or studies that address lexical sparseness challenges (for any language). In order to provide a realistic indication of the performance of parsing systems on unstructured and unanalyzed data, we particularly encourage contributions reporting parsing results for non-gold as well as gold morphological analysis of the test data, where the non-gold analysis is produced before or jointly with parsing.
The areas of interest of the second SPMRL workshop include, but are not limited to, the following topics:
This year's SPMRL workshop will feature a special theme concerning the Shared Task on Statistical Parsing of MRLs. Following the panel discussions at previous events, we intend the first shared task on parsing MRLs to take place in 2012, and we now solicit position papers that discuss the goals, scope, design, expected contributions and desired outcomes of such a shared task. The accepted papers will be included in the SPMRL proceedings. In addition, we plan to hold a panel discussion covering a range of relevant topics including (but not limited to) cross-language parse representation, cross-annotation evaluation, MRL-specific architectural concerns, and so on.
Authors are invited to submit long papers (up to 9 pages + references) and short papers (up to 5 pages + references). Long papers should describe unpublished, substantial and completed research. Short papers should be position papers, papers describing work in progress or short, focused contributions.
Papers will be accepted until July 31, 2011 (PDT, GMT-8), in PDF format via the START system (https://www.softconf.com/c/spmrl2011/).
Submitted papers must follow the styles and formatting guidelines available from the latest ACL-HLT recommendations (http://www.acl2011.org/authors.shtml). As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review. In addition, please do not post your submissions on the web until after the review process is complete.
Reut Tsarfaty, Uppsala University
Jennifer Foster, Dublin City University
To contact us: email@example.com
The first SPMRL event took place in October 2009 at IWPT'09 in the form of a discussion panel which followed 7 presentations on different issues related to statistical parsing of German, Arabic, Modern Hebrew and French. It was followed by SPMRL 2010, a NAACL/HLT 2010 workshop which featured 13 presentations with more languages added to the original set (Basque, Hindi and Korean in addition to Arabic, French, German and Modern Hebrew) and a very animated discussion panel. This workshop was the second most successful workshop in terms of registered attendees (more than 50). The SPMRL 2010 proceedings include an overview of the field in the form of a long preface co-authored by the SPMRL programme committee.
The SPMRL group is editing an upcoming special issue of Computational Linguistics devoted to the parsing of MRLs. (Expected publication date: first quarter of 2012. Deadline: September 30th, 2011.)
Yes, double submissions are permitted, but obviously the same paper will not be published at both venues. Another possibility for those working on parsing MRLs is to submit two different papers: in this scenario, we encourage authors to view the SPMRL workshop as a venue for more detailed analysis papers.