ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages (SP-Sem-MRL 2012)


Date: July 12, 2012
Location: Jeju, Republic of Korea

Invited speakers: Prof. Mark Steedman and Ivan Titov. Abstracts of the talks are now available.
Morphologically Rich Languages (MRLs) are languages in which grammatical relations such as Subject, Predicate, Object, etc., are indicated morphologically (e.g. through inflection) instead of positionally (as in, e.g., English), and the position of words and phrases in the sentence may vary substantially. The tight connection between the morphology of words and the grammatical relations between them, and the looser connection between the position and grouping of words and their syntactic roles, pose serious challenges for syntactic and semantic processing. Furthermore, since grammatical relations provide the interface to compositional semantics, morpho-syntactic phenomena may significantly complicate processing the syntax-semantics interface. Statistical parsing has been a cornerstone of research in NLP and has seen great advances due to the widespread availability of syntactically annotated corpora. English parsing performance has reached a high plateau in certain genres, but this is not always indicative of parsing performance in MRLs, for dependency-based and constituency-based parsing alike. Semantic processing of natural language has similarly seen much progress in recent years. However, as in parsing, the bulk of the work has concentrated on English, and MRLs may present processing challenges that the community is as yet unaware of, and which current semantic processing technologies may have difficulty coping with. These challenges may lurk in areas where parses may be used as input, such as semantic role labeling, distributional semantics, paraphrasing and textual entailment, or where inadequate pre-processing of morphological variation hurts parsing and semantic tasks alike.

This joint workshop aims to build upon the first and second SPMRL workshops (at NAACL-HLT 2010 and IWPT 2011, respectively) while extending the overall scope to include semantic processing where MRLs pose challenges for algorithms or models initially designed to process English. In particular, we seek to explore the use of newly available syntactically and/or semantically annotated corpora, or data sets for semantic evaluation that can contribute to our understanding of the difficulty that such phenomena pose. One goal of this workshop is to encourage cross-fertilization among researchers working on different languages and among those working on different levels of processing. Of particular interest is work addressing the lexical sparseness and out-of-vocabulary (OOV) issues that occur in both syntactic and semantic processing.
Submission deadlines
Long papers: April 18, 2012  (PDT, GMT-8) 
Short papers:
  - syntactic parsing: April 22, 2012 (PDT, GMT-8)
  - semantic processing and syntax-semantics interface: April 28, 2012 (PDT, GMT-8)
Notification to authors: May 12, 2012
Camera ready copy: May 20, 2012 

Financial support application: June 15, 2012
Financial support notification: June 17, 2012
Workshop: July 12, 2012 

We are pleased to announce financial support covering workshop registration fees for up to five students and PASCAL members, through the generous sponsorship of the PASCAL Network of Excellence. Priority will be given to PASCAL members and to students with accepted papers, but all students (and all PASCAL members) are encouraged to apply.

Prof. Mark Steedman (University of Edinburgh)
Title: The Computational Grammar of Case

There is a long tradition associating language and other serial behavior with a more primitive sensorimotor planning system. The evidence is linguistic, developmental, neurophysiological, and evolutionary. I'll argue that the grammatical operations of composition and type-raising used in wide-coverage CCG parsers have their origin in the elementary operations of affordance-based planning, and that the latter operation is the basis for both morphological and structural case-systems. I'll show that case is closely related to event semantics and preserves related head-dependency information in the parsing model. I'll give evidence from morphologically rich languages like Navajo and Japanese, and structurally-cased languages like English and Dutch, and suggest that such grammars support greedier and more incremental parsing algorithms than those currently used for CCG, of a kind that has recently shown some success in dependency parsing.

Ivan Titov (Saarland University)
Title: Monolingual and Crosslingual Unsupervised Induction of Shallow Semantic Representations

Inducing meaning representations from text is one of the key objectives of NLP. Most existing statistical techniques for tackling this problem rely on large human-annotated datasets, which are expensive to create and exist only for a very limited number of languages. Even then, they are not very robust, cover only a small proportion of semantic constructions appearing in the labeled data, and are domain-dependent. In this work, we investigate Bayesian models which do not use any labeled data but induce semantic representations from unannotated texts. Unlike semantically annotated data, unannotated texts are plentiful and available for many languages and many domains, which makes our approach particularly promising. We consider induction in two different set-ups. First, we consider induction from monolingual texts, with the resulting model achieving the best reported results among unsupervised approaches on a standard benchmark task. Second, we show that multilingual parallel data provides a valuable additional source of indirect supervision for induction of semantics. In addition, we will discuss some challenges in designing probabilistic models for unsupervised learning of semantics that are flexible enough to accommodate the diversity of linguistic structures across languages. Joint work with Alexandre Klementiev.


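Assigning Deep Lexical Types Using Structured Classifier Features for Grammatical Dependencies
João Silva and António Branco

Building an Arabic Multiword Expressions Repository
Abdelati Hawwari, Kfir Bar and Mona Diab

Combining Rule-Based and Statistical Syntactic Analyzers
Iakes Goenaga, Koldobika Gojenola, María Jesús Aranzabe, Arantza Díaz de Ilarraza and Kepa Bengoetxea

Generative Constituent Parsing and Discriminative Dependency Reranking: Experiments on English and French
Joseph Le Roux, Benoit Favre, Alexis Nasr and Seyed Abolghasem Mirroshandel

Korean Treebank Transformation for Parser Training
DongHyun Choi, Jungyeul Park and Key-Sun Choi

Machine Learning of Syntactic Attachment from Morphosyntactic and Semantic Co-occurrence Statistics
Szymon Acedański, Adam Slaski and Adam Przepiórkowski

Probabilistic Lexical Generalization for French Dependency Parsing
Enrique Henestroza Anguiano and Marie Candito

Statistical Parsing of Spanish and Data Driven Lemmatization
Joseph Le Roux, Benoit Sagot and Djamé Seddah

Supervised Learning of German Qualia Relations
Yannick Versley

Unsupervised frame based Semantic Role Induction: application to French and English
Alejandra Lorenzo and Christophe Cerisara

Using an SVM Ensemble System for Improved Tamil Dependency Parsing
Nathan Green, Loganathan Ramasamy and Zdeněk Žabokrtský

Using Synthetic Compounds for Word Sense Discrimination
Yannick Versley and Verena Henrich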

Session 0:  Opening Session (chairs: Yuval Marton and Reut Tsarfaty)

09:00-09:05 Statistical Parsing and Semantic Processing of MRLs: Overview of the workshop

09:05-10:05 Invited Talk (I) by Ivan Titov (chair: Ido Dagan)
                        Title: Monolingual and Crosslingual Unsupervised Induction of Shallow Semantic Representations 

Session 1:  Statistical Parsing of MRLs (I) (chair: Ido Dagan)
10:05–10:30 Probabilistic Lexical Generalization for French Dependency Parsing
                        Enrique Henestroza Anguiano and Marie Candito

10:30-11:00 Coffee Break

Session 2: Semantic Processing of MRLs (chair: Yuval Marton)
11:00–11:25 Supervised Learning of German Qualia Relations
                        Yannick Versley
11:25–11:40 Building an Arabic Multiword Expressions Repository
                        Abdelati Hawwari, Kfir Bar and Mona Diab
11:40–11:55 Unsupervised frame based Semantic Role Induction: application to French and English
                        Alejandra Lorenzo and Christophe Cerisara
11:55–12:10 Using Synthetic Compounds for Word Sense Discrimination
                        Yannick Versley and Verena Henrich
12:10–12:25 Machine Learning of Syntactic Attachment from Morphosyntactic and Semantic Co-occurrence Statistics
                        Szymon Acedański, Adam Slaski and Adam Przepiórkowski

12:30-14:00 Lunch Break
14:00-15:00 Invited Talk (II) by Mark Steedman (chair: Reut Tsarfaty)
                        Title: Parsing with Case, Coordination, and Free Word-Order

Session 3: Statistical Parsing of MRLs (II) (chair: Yannick Versley)
15:00–15:15 Combining Rule-Based and Statistical Syntactic Analyzers
                        Iakes Goenaga, Koldobika Gojenola, María Jesús Aranzabe, Arantza Díaz de Ilarraza and Kepa Bengoetxea
15:15–15:30 Statistical Parsing of Spanish and Data Driven Lemmatization
                        Joseph Le Roux, Benoit Sagot and Djamé Seddah

15:30-16:00 Coffee Break

Session 4:  Statistical Parsing of MRLs (III) (chair: Marie Candito)
16:00–16:25 Assigning Deep Lexical Types Using Structured Classifier Features for Grammatical Dependencies
                        João Silva and António Branco
16:25–16:40 Using an SVM Ensemble System for Improved Tamil Dependency Parsing
                        Nathan Green, Loganathan Ramasamy and Zdeněk Žabokrtský
16:40–17:05 Korean Treebank Transformation for Parser Training
                        DongHyun Choi, Jungyeul Park and Key-Sun Choi
17:05–17:30 Generative Constituent Parsing and Discriminative Dependency Reranking: Experiments on English and French
                        Joseph Le Roux, Benoit Favre, Alexis Nasr and Seyed Abolghasem Mirroshandel

17:30-17:40 Short Break

Session 5:  Closing Session
17:40-18:10 Panel: Disclosing the SPMRL 2013 Shared Task (chair: Reut Tsarfaty)
18:10-18:20 Concluding Remarks (chairs: Reut Tsarfaty and Yuval Marton)

18:20-20:20    "meet and greet" light dinner and drinks for all workshop participants (funded by PASCAL)

The workshop will be organised around three broad themes:

Syntactic Models:  Models and architectures that explicitly integrate morphological analysis and parsing; Cross-language and cross-model comparison of strengths and weaknesses regarding particular linguistic phenomena.
Semantic Models:  State-of-the-art semantic analysis and generation methods for MRLs, including semantic similarity and entailment criteria and their task-specific instantiation, and suitable representations for semantic tasks in MRLs.
Joint Modeling Aspects: Improving lexical coverage and handling of out-of-vocabulary (OOV) words by utilising lexical knowledge or unsupervised/semi-supervised learning techniques; The role of parsing in semantic analysis for MRLs; Pre-processing issues that jointly affect parsing and semantic analysis; Syntax-Semantics interfaces for monolingual or multilingual systems.

The areas of interest for this joint workshop include, but are not limited to, the following topics: 

Syntactic Parsing of MRLs
  • parsing models and architectures that explicitly integrate morphological analysis and parsing 
  • parsing models and architectures that focus on lexical coverage and the handling of OOV words either by incorporating linguistic knowledge or through the use of unsupervised/semi-supervised learning techniques 
  • cross-language and cross-model comparison of models' strengths and weaknesses in the face of particular linguistic phenomena (e.g. morpho-syntactic characteristics, degree of word-order freedom ...) 
  • comprehensive analyses of the strengths and weaknesses of various parsing models on particular linguistic (e.g. morpho-syntactic) phenomena with respect to variation in tagsets, annotation schemes and additional data transformations 
Semantic Processing of MRLs
  • semantic distance and entailment criteria in the MRL space (e.g., with respect to inflection, derivation, root, pattern, lemma, tense, and/or aspect, etc.); possibly task-specific criteria
  • lexical resources and morphological analysis tools facilitating semantic distance measures and semantic relation detection
  • methods and models for semantic similarity/distance calculation, clustering and paraphrasing relying on MRL properties, and using: probability, vector/graph representation, data-driven and/or linguistic rules, pivoting/SMT, machine-learning, etc.
  • paraphrase and textual entailment detection or generation, specific to MRLs (e.g., task-specific issues of inclusion or exclusion of certain paraphrase and textual entailment patterns differing in inflection)
  • use of morphological analysis for semantic calculation aimed at reducing sparsity / OOV rate, preferably without losing information due to mere lemmatization
  • semantic role labeling (SRL) for MRLs; verbal/nominalized selectional preferences
The Syntax-Semantics Interface
  • parsing-based semantic processing tasks, e.g., semantic role labeling (SRL)
  • processing of compounds and multi-morphemic words: optimal level(s) of tokenization, representation, and morphological analysis for either/both tasks
  • syntax-aware semantic distance measures, paraphrasing and textual entailment
  • semantic classes and/or relations as input to syntactic parsing

In addition to the standard (oral or poster) presentations in the sessions, the SP-Sem-MRL workshop will feature a panel of commentators for a selection of the talks, allowing for an extended discussion period. This new feature is introduced in order to foster in-depth discussions and to nurture interactions among researchers. It is our hope that these interactions will help to bring ideas (and solutions) to the fore and promote a more rapid advance of the state-of-the-art in the field.

There will be no shared task on MRLs this year. However, we will take this opportunity to disclose, during a special session of SP-Sem-MRL, the data sets and evaluation procedures for the cross-linguistic, cross-framework shared task that was discussed at previous SPMRL panels and is planned for SPMRL 2013 at IWPT 2013. Researchers who are interested in participating in the shared task, or teams that wish to add their data sets to the task, are encouraged to attend the session and contribute to the discussion.

Authors are invited to submit long papers (up to 10 pages + any number of reference pages) and short papers (up to 5 pages + any number of reference pages). Long papers should describe unpublished, substantial and completed research. Short papers should be position papers, papers describing work in progress, or short, focused contributions.
General chairs

Marianna Apidianaki (LIMSI-CNRS, France)
Ido Dagan (Bar-Ilan University, Israel)
Jennifer Foster (Dublin City University, Ireland) 
Yuval Marton (IBM Watson Research Center, US)
Djamé Seddah (University of Paris Sorbonne, France)
Reut Tsarfaty (Uppsala University, Sweden)

Shared session chairs

Katrin Erk (University of Texas at Austin, US)
Ines Rehbein (University of Potsdam, Germany)
Peter Turney (National Research Council, Canada)
Yannick Versley (University of Tuebingen, Germany)

Ion Androutsopoulos (Athens Univ. of Economics and Business, Greece)
Mohammed Attia (Dublin City University, Ireland)
Adriane Boyd (Ohio State University, US)
Bernd Bohnet  (University of Stuttgart, Germany)
Marie Candito (University of Paris 7, France)
Aoife Cahill (Educational Testing Service, US)
Gülşen Cebiroğlu Eryiğit (Istanbul Technical University, Turkey)
Ozlem Cetinoglu (University of Stuttgart, Germany)
Jinho Choi (University of Colorado at Boulder, US)
Grzegorz Chrupala  (Saarland University, Germany) 
Benoit Crabbé (University of Paris 7, France)
Josef van Genabith (Dublin City University, Ireland)
Yoav Goldberg (Google Research NY, US)
Spence Green (Stanford University, US)
Veronique Hoste (University College Ghent, Belgium)
Samar Husain (Potsdam University, Germany)  
Sandra Kübler (Indiana University, US) 
Jonas Kuhn (University of Stuttgart, Germany)
Mirella Lapata (University of Edinburgh, UK)
Alberto Lavelli (FBK-irst, Italy)
Alessandro Lenci (University of Pisa, Italy)
Joseph Le Roux (Université Paris-Nord, France)
Wolfgang Maier (University of Düsseldorf, Germany)
Nitin Madnani (Educational Testing Service, US)
Takuya Matsuzaki (University of Tokyo, Japan)
Aurélien Max (LIMSI-CNRS, France)
Yusuke Miyao (University of Tokyo, Japan)
Preslav Nakov (Qatar Computing Research Institute, Qatar)
Roberto Navigli (Sapienza University of Rome, Italy)
Kemal Oflazer (Carnegie Mellon University, Qatar)
Sebastian Pado (University of Heidelberg, Germany)
Patrick Pantel (Microsoft Research, US)
Sameer Pradhan (BBN Technologies, US)
Benoit Sagot (INRIA Rocquencourt, France)
Kenji Sagae (University of Southern California, US)
Idan Szpektor (Bar-Ilan University, Israel)
Lamia Tounsi (Dublin City University, Ireland)
Tim Van de Cruys (University of Cambridge, UK)
Stephen Wan (CSIRO ICT Centre, Sydney)
Deniz Yuret (Koc University Istanbul, Turkey)
Zdeněk Žabokrtský (Charles University, Czech Republic)
Shiqi Zhao (Baidu Inc., China)

This workshop is sponsored by SIGLEX, SIGPARSE, INRIA's Alpage project, and the EU's PASCAL Network of Excellence.



Why a "Joint Workshop" on MRL processing?
Two proposals, one on Statistical Parsing of MRLs and one on Semantic Processing of MRLs, were independently submitted to ACL 2012 this year. Given the proximity of the two fields and the great potential they offer for collaboration, the two proposals were merged. We decided to join forces and build the best possible program for this joint event.