First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010)

CALL FOR PAPERS: First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010)

SPONSORED BY SIGPARSE AND INRIA'S ALPAGE PROJECT

NAACL-HLT 2010 Workshop, June 5, 2010, Los Angeles, CA

(Last modified May 7, 2011, Link to SPMRL 2011)

!!NEWS!!

SPMRL 2011 is out: deadline July 31st, 2011!

New website: https://sites.google.com/site/spmrl2011/

It will be co-located with IWPT 2011, October 6.

SHARED TASK MAILING LIST

To host our ongoing discussion on a forthcoming shared task for MRL parsing, there is now a mailing list: mrlp-sharedtask@inria.fr

To subscribe: https://sympa-roc.inria.fr/wws/info/mrlp-sharedtask

PROCEEDINGS

Our proceedings are now online as a full PDF and are also available on the ACL Anthology.

OUTLINE

The aim of this workshop is to bring together researchers interested in parsing languages with richer morphological structure than English, and to provide a forum for discussing the challenges associated with parsing such languages and sharing strategies for addressing them. We are interested in presentations on actively studied areas of research, including the adaptation of existing parsing techniques to new languages, the design of new models that take morphological information into account, and the implementation of models that allow robust statistics to be obtained in the face of high word-form variation.

IMPORTANT DATES

Submission deadline: March 12, 2010 (Pacific time, GMT-8); extended from March 1, 2010

Notification to authors: Sunday, April 4, 2010 (originally March 30, 2010)

Camera ready copy: April 15, 2010 (originally April 12, 2010)

Workshop: June 5, 2010

PRESENTATION

The availability of large syntactically annotated corpora led to an explosion of interest in statistical parsing methods and to the development of successful models for parsing English using the Wall Street Journal Penn Treebank (PTB, Marcus et al., 1993). In recent years, parsing performance on the PTB has reached a ceiling of 90-92% F-score under the Parseval evaluation metrics (Black et al., 1991). When adapted to other language/treebank pairs (such as German, Hebrew, Arabic, Italian or French), these models have been shown to be considerably less successful.

Among the arguments that have been proposed to explain this performance gap are the impact of small training data size, differences in treebank annotation schemes, inadequacy of evaluation metrics, and linguistic factors such as the degree of word order freedom and the use of morphological information in the parser. None of these arguments in isolation can account for the systematic performance deterioration, but observed from a wider, cross-linguistic perspective, a picture begins to emerge: the morphologically rich nature of some of the languages makes them inherently more susceptible to such performance degradation.

Morphologically rich languages (MRLs) are particularly challenging for the application of algorithms primarily designed to parse English. These algorithms focus on learning word order, but they often do not take morphological information into account. Another typical problem associated with parsing MRLs is increased lexical data sparseness due to high morphological variation in surface forms. More generally, this problem is akin to the handling of out-of-vocabulary or rare words in robust statistical parsing, and to techniques for domain adaptation via lexicon enhancement (which have also been explored for English and other less morphologically rich languages).

In addition to technical and linguistic difficulties, lack of communication between researchers working on different MRLs can lead to a reinventing-the-wheel syndrome; the prominence of English parsing in the literature reduces the visibility of research aiming to solve the problems particular to MRLs. By offering a platform to this growing community of interest, we hope to overcome this potential cultural obstacle.

We solicit papers describing parsing experiments with models and architectures for languages with morphological structure richer than that of English, or studies that address the lexical sparseness challenges (for any language). The workshop's areas of interest include, but are not limited to, the following topics:

- parsing models and architectures that explicitly integrate morphological analysis and parsing
- parsing models and architectures that focus on lexical coverage and the handling of OOV words, either by incorporating linguistic knowledge or through the use of unsupervised/semi-supervised learning techniques
- cross-language and cross-model comparison of models' strengths and weaknesses in the face of particular linguistic phenomena (e.g. morphosyntactic characteristics, degree of word-order freedom, ...)
- comprehensive analyses of the strengths and weaknesses of various parsing models on particular linguistic (e.g. morphosyntactic) phenomena with respect to variation in tagsets, annotation schemes and additional data transformations

ACCEPTED PAPERS

(Workshop's preface)

STATISTICAL PARSING OF MORPHOLOGICALLY RICH LANGUAGES (SPMRL)

WHAT, HOW AND WHITHER

Reut Tsarfaty, Djamé Seddah, Yoav Goldberg, Sandra Kübler, Yannick Versley, Marie Candito, Jennifer Foster, Ines Rehbein and Lamia Tounsi

(Long papers)

APPLICATION OF DIFFERENT TECHNIQUES TO DEPENDENCY PARSING OF BASQUE

Kepa Bengoetxea and Koldo Gojenola

DIRECT PARSING OF DISCONTINUOUS CONSTITUENTS IN GERMAN

Wolfgang Maier

FACTORS AFFECTING THE ACCURACY OF KOREAN PARSING

Tagyoung Chung, Matt Post and Daniel Gildea

HANDLING UNKNOWN WORDS IN STATISTICAL LATENT-VARIABLE PARSING MODELS FOR ARABIC, ENGLISH AND FRENCH

Mohammed Attia, Jennifer Foster, Deirdre Hogan, Joseph Le Roux, Lamia Tounsi and Josef van Genabith

IMPROVING ARABIC DEPENDENCY PARSING WITH LEXICAL AND INFLECTIONAL MORPHOLOGICAL FEATURES

Yuval Marton, Nizar Habash and Owen Rambow

LEMMATIZATION AND LEXICALIZED STATISTICAL PARSING OF MORPHOLOGICALLY-RICH LANGUAGES: THE CASE OF FRENCH

Djamé Seddah, Grzegorz Chrupala, Ozlem Cetinoglu, Josef van Genabith and Marie Candito

MODELING MORPHOSYNTACTIC AGREEMENT IN CONSTITUENCY-BASED PARSING OF MODERN HEBREW

Reut Tsarfaty and Khalil Sima'an

ON THE ROLE OF MORPHOSYNTACTIC FEATURES IN HINDI DEPENDENCY PARSING

Bharat Ram Ambati, Samar Husain, Joakim Nivre and Rajeev Sangal

PARSING WORD CLUSTERS

Marie Candito and Djamé Seddah

TWO METHODS TO INCORPORATE 'LOCAL MORPHOSYNTACTIC' FEATURES IN HINDI DEPENDENCY PARSING

Bharat Ram Ambati, Samar Husain, Sambhav Jain, Dipti Misra Sharma and Rajeev Sangal

(Short paper)

EASY-FIRST DEPENDENCY PARSING OF MODERN HEBREW

Yoav Goldberg and Michael Elhadad

PROGRAM

Invited Talk: "Morphology in Statistical Machine Translation: Integrate-in or Tack-on?"

by Dr. Kevin Knight (University of Southern California)

Discussion panel

Dan Bikel (Google Research NY, USA)

Julia Hockenmaier (University of Illinois at Urbana-Champaign, USA)

Sandra Kübler (chair, Indiana University, USA)

Slav Petrov (Google Research NY, USA)

Owen Rambow (Columbia University, USA)

REGISTRATION

Registration is now open. Please note that the early bird registration fees will not be available after May 9.

Registration page

SUBMISSION

Authors are invited to submit long papers (up to 8 pages + 1 extra page for references) and short papers (up to 4 pages + 1 extra page for references).

Long papers should describe unpublished, substantial and completed research.

Short papers should be position papers, papers describing work in progress or short, focused contributions.

Papers will be accepted until March 12, 2010 (extended from March 1, 2010), 00:00 (Pacific time, GMT-8), in PDF format via the START system: https://www.softconf.com/naaclhlt2010/mrl10/

Submitted papers must follow the style and formatting guidelines available on the main conference website: http://naaclhlt2010.isi.edu/authors.html

As the reviewing will be blind, the paper must not include the authors' names and affiliations. Furthermore, self-references that reveal the authors' identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review. In addition, please do not post your submissions on the web until after the review process is complete.

Submission website: https://www.softconf.com/naaclhlt2010/mrl10/

PROGRAM COMMITTEE

Djamé Seddah, Jennifer Foster, Sandra Kübler, Reut Tsarfaty, Lamia Tounsi, Yannick Versley, Marie Candito, Ines Rehbein, Yoav Goldberg

REVIEW COMMITTEE

Mohammed Attia (Dublin City University, Ireland)

Adriane Boyd (Ohio State University, USA)

Aoife Cahill (University of Stuttgart, Germany)

Marie Candito (University of Paris 7, France)

Grzegorz Chrupala (Saarland University, Germany)

Benoit Crabbé (University of Paris 7, France)

Michael Elhadad (Ben Gurion University, Israel)

Emad Mohamed (Indiana University, USA)

Jennifer Foster (Dublin City University, Ireland)

Josef van Genabith (Dublin City University, Ireland)

Yoav Goldberg (Ben Gurion University, Israel)

Julia Hockenmaier (University of Illinois, USA)

Deirdre Hogan (Dublin City University, Ireland)

Sandra Kübler (Indiana University, USA)

Alberto Lavelli (FBK-irst, Italy)

Joseph Le Roux (Dublin City University, Ireland)

Wolfgang Maier (University of Tübingen, Germany)

Takuya Matsuzaki (University of Tokyo, Japan)

Detmar Meurers (University of Tübingen, Germany)

Yusuke Miyao (University of Tokyo, Japan)

Joakim Nivre (Uppsala University, Sweden)

Ines Rehbein (Saarland University, Germany)

Kenji Sagae (University of Southern California, USA)

Benoit Sagot (Inria Rocquencourt, France)

Djamé Seddah (University of Paris Sorbonne, France)

Khalil Sima'an (University of Amsterdam, The Netherlands)

Nicolas Stroppa (Google Research, Switzerland)

Lamia Tounsi (Dublin City University, Ireland)

Reut Tsarfaty (University of Amsterdam, The Netherlands)

Yannick Versley (University of Tübingen, Germany)

ORGANIZERS AND CONTACTS

Sandra Kübler, Indiana University

Djamé Seddah, Université Paris-Sorbonne & Alpage project (contact: djame.seddah@paris-sorbonne.fr)

Reut Tsarfaty, University of Amsterdam

To contact us: spmrl2010org@googlemail.com