Prof. Lonneke van der Plas, Assoc. Professor at USI

Resources

(released with the Coling 2014 paper: What good are 'Nominalkomposita' for 'noun compounds': Multilingual Extraction and Structure Analysis of Nominal Compositions using Linguistic Restrictors)

This database contains automatically extracted English noun compounds and their translations in up to ten languages, drawn from the OPUS Europarl resource.

The extracted languages:

Danish
Dutch
English
French
German
Greek
Italian
Portuguese
Romanian
Spanish
Swedish

PropBank semantic role annotations on French and English sections of Europarl.

(Package S2.1, data released by the FP7 CLASSiC Project)

README for this package

There are three packages that provide syntactic and semantic annotations for the Europarl corpus (Koehn, 2005).

The Europarl parallel corpus is extracted from the proceedings of the European Parliament. The format used is the CoNLL09 format described in http://ufal.mff.cuni.cz/conll2009-st/task-description.html (see 'Data format').
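
As an illustration of the format, here is a minimal Python sketch of a reader. It assumes the standard CoNLL-2009 layout: one token per line, tab-separated fields, a blank line between sentences, 14 fixed columns, and then one argument column per predicate in the sentence. The function names `read_conll09` and `predicates` are hypothetical helpers, not part of the package, and the package files themselves have not been checked against this sketch.

```python
from typing import Dict, Iterable, Iterator, List

# The 14 fixed columns of the CoNLL-2009 shared task format.
# Any further columns are argument labels, one column per predicate.
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def read_conll09(lines: Iterable[str]) -> Iterator[List[Dict[str, object]]]:
    """Yield sentences as lists of token dicts (assumes tab-separated fields)."""
    sentence: List[Dict[str, object]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        token: Dict[str, object] = dict(zip(CONLL09_COLUMNS, fields))
        # Remaining columns hold the argument labels for each predicate.
        token["APREDS"] = fields[len(CONLL09_COLUMNS):]
        sentence.append(token)
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


def predicates(sentence: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the tokens marked as predicates (FILLPRED column set to 'Y')."""
    return [tok for tok in sentence if tok.get("FILLPRED") == "Y"]
```

Passing an open text file (or any iterable of lines) to `read_conll09` yields one sentence at a time, so a whole file never has to be held in memory.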

1) Package S2.1-1: "The EuroParl parallel corpus: Hand-annotated French data" (1MB txt file) contains 1000 French sentences manually annotated using the annotation scheme of PropBank. The syntactic annotations are the output of a parser (Titov and Henderson, 2007) trained on the French Treebank converted into dependency format (Candito et al., 2009).

2) Package S2.1-2: "The EuroParl parallel corpus: Parsed English Data" (114MB gz file) contains 983K English sentences from the Europarl corpus and their syntactic-semantic analysis as provided by the parser (Henderson et al., 2008; Titov et al., 2009), which was trained on the Penn Treebank corpus merged with PropBank and NomBank labels.

3) Package S2.1-3: "The EuroParl parallel corpus: Parsed French Data" (110MB gz file) contains 983K French sentences from the Europarl corpus and their syntactic-semantic analysis as they result from our work on automatic cross-lingual semantic role annotation (Van der Plas et al., 2011).
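
Because the two parsed packages are large gz files, it can be convenient to stream them rather than decompress them first. The following Python sketch counts sentences in such a file using only the standard library. It assumes UTF-8 text with blank lines between sentences, which has not been verified against the released files. The function names are hypothetical, and the path is whatever local name the downloaded file was given.

```python
import gzip
from typing import Iterator, List


def iter_sentence_blocks(path: str) -> Iterator[List[str]]:
    """Stream a gzip-compressed CoNLL-style file, one sentence block at a time.

    Assumes UTF-8 text in which sentences are separated by blank lines.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        block: List[str] = []
        for line in handle:
            line = line.rstrip("\n")
            if line.strip():
                block.append(line)
            elif block:
                # A blank line ends the current sentence block.
                yield block
                block = []
        if block:
            # The file may end without a trailing blank line.
            yield block


def count_sentences(path: str) -> int:
    """Count sentence blocks without loading the whole file into memory."""
    return sum(1 for _ in iter_sentence_blocks(path))
```

If the assumptions hold, `count_sentences` applied to either parsed package should return a figure close to the 983K sentences stated above.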

---- References:

M.-H. Candito, B. Crabbé, P. Denis, and F. Guérin. 2009. Analyse syntaxique du français : des constituants aux dépendances. In Proceedings of TALN, Senlis, France.

J. Henderson, P. Merlo, G. Musillo, and I. Titov. 2008. A latent variable model of synchronous parsing for syntactic and semantic dependencies. In Proceedings of CoNLL 2008, Manchester, UK.

P. Koehn. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the MT Summit 2005, Phuket, Thailand.

L. van der Plas, P. Merlo, and J. Henderson. 2011. Scaling up Cross-Lingual Semantic Annotation Transfer. In Proceedings of ACL/HLT, Portland, US.

I. Titov and J. Henderson. 2007. A latent variable model for generative dependency parsing. In Proceedings of the International Conference on Parsing Technologies (IWPT-07), Prague, Czech Republic.

I. Titov, J. Henderson, P. Merlo, and G. Musillo. 2009. Online graph planarisation for synchronous parsing of semantic and syntactic dependencies. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09), Pasadena, California.

---------------- END OF README FILE ----------------------------------
