Resources

For your participation in GermEval 2015: LexSub you may wish to make use of one or more German-language resources. The task organizers can assist you in obtaining three popular resources:

    • GermaNet (Hamp and Feldweg, 1997; Henrich and Hinrichs, 2010) is a lexical-semantic network that relates German-language nouns, verbs, and adjectives. It is similar to the English WordNet. You can obtain GermaNet as a standalone resource or as part of a UBY database.
    • UBY (Gurevych et al., 2012) is a large-scale lexical-semantic resource which links information from several expert- and collaboratively constructed resources for English and German. The linked resources include GermaNet, WordNet, and the English and German versions of Wikipedia and Wiktionary.
    • JoBimText (Biemann et al., 2013) is a resource for German that is induced automatically by means of distributional semantics. Distributional thesauri as well as distributional features of words are provided through a RESTful API and as a database. These features were shown to be beneficial for lexical substitution (Szarvas et al., 2013).

Obtaining standalone GermaNet

The standalone GermaNet data is free for academic users. If you are a member of an academic institution, you can apply for a GermaNet licence and obtain the data directly from the Department of Linguistics at the University of Tübingen. This licence covers any non-commercial research using GermaNet; it is not limited to your participation in GermEval 2015: LexSub.

If you are not a member of an academic institution, you can obtain a special licence for GermaNet which is limited to your participation in GermEval 2015: LexSub. In this case you should write directly to both Prof. Erhard Hinrichs at erhard.hinrichs@uni-tuebingen.de and Marie Hinrichs at marie.hinrichs@uni-tuebingen.de to request this licence. (Note that this must be done by someone with signing authority, such as the head of your department or company.) Your application will be acknowledged and processed as quickly as possible, and you will be provided with a licence agreement to sign and e-mail back. You will then receive instructions by e-mail on how to download GermaNet.

Obtaining UBY (including GermaNet)

The task organizers can provide a UBY database containing GermaNet (among many other resources) to participants who hold a valid GermaNet licence. (If you do not already hold a GermaNet licence, follow the instructions above to obtain one first.) Simply e-mail your request for a UBY database to Tristan Miller at miller@ukp.informatik.tu-darmstadt.de (making sure to indicate your institutional affiliation). We will verify with the GermaNet developers that you hold a GermaNet licence and send you instructions on how to download the UBY database.

We provide sample code demonstrating how to use the UBY database.

Obtaining JoBimText models

For German, two distributional models have been computed using JoBimText. Both are based on 70 million German newspaper sentences from the Leipzig Corpora Collection. They differ in the context features used for computing similarities:

- Trigram: the left and right neighbors of a word serve as its context features

- MateParser

- MateParser with Lemma
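
To make the Trigram variant more concrete, the following Python sketch shows one way that left/right-neighbor context features could be extracted and compared. It is only an illustration under simplifying assumptions: the function names, the sentence-boundary markers, and the toy sentences are hypothetical, and similarity is reduced to a plain count of shared features. It does not claim to reproduce how the JoBimText models were actually computed, which may additionally weight and prune features.

```python
from collections import defaultdict

def trigram_features(tokens):
    """Yield (word, feature) pairs, where the feature is the pair
    (left neighbor, right neighbor) of the word in its sentence."""
    # Hypothetical boundary markers so that edge words also get a feature.
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i], (padded[i - 1], padded[i + 1])

def build_feature_index(sentences):
    """Map each word to the set of context features it occurs with."""
    index = defaultdict(set)
    for tokens in sentences:
        for word, feature in trigram_features(tokens):
            index[word].add(feature)
    return index

def shared_feature_count(index, a, b):
    """Simplified similarity: number of context features two words share."""
    return len(index.get(a, set()) & index.get(b, set()))

# Toy corpus (invented for illustration only).
SENTENCES = [
    ["der", "Hund", "bellt"],
    ["der", "Köter", "bellt"],
    ["die", "Katze", "schläft"],
]

index = build_feature_index(SENTENCES)
print(shared_feature_count(index, "Hund", "Köter"))  # share ("der", "bellt")
print(shared_feature_count(index, "Hund", "Katze"))  # no shared context
```

In this toy setting, words that occur between the same neighbors end up with overlapping feature sets, which is the intuition behind using such features to propose substitution candidates.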

A Java Eclipse project with example code and a README with instructions on how to load the models into a MySQL database is available here.

For a quick look at the models, you can use the JoBimText demo by selecting the models Trigram (German) and Parsed (German), or query the RESTful API with a Perl script (see read2DGerman.pl).