Altogether, the data consists of 22,000 messages from various social media and web sources. The data can be found here:

All data can be used for development. For convenience, we already provide a train/dev split.

For a detailed description of the annotation process and the annotations, please consult our annotation guidelines.

Test Data

The second part of the test data has just been released. If you want to take part in the official competition, please don't forget to register here. The two data sets represent two different points in time:

Timestamp 1:

Timestamp 2:

Please also consult the guidelines when creating the submission file that contains your predictions.


We provide the data in two formats:

  1. TSV: ID <tab> Text <tab> Relevance <tab> Sentiment <tab> Aspect:Polarity (multiple Aspect:Polarity pairs are separated by whitespace)
  2. XML:
     <Document id="ID">
         ...
         <Opinion category="Category" from="A" to="B" polarity="POLARITY" target="TARGET"/>
         ...
     </Document>
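Both formats can be read with a few lines of Python. The sketch below is a minimal illustration under stated assumptions, not official GermEval tooling: it assumes one message per TSV line, an aspect column that may be empty or missing (e.g. for irrelevant messages), and that the polarity follows the last colon of each Aspect:Polarity pair. For XML it collects every Opinion element below a Document, wherever it is nested. The names `Message`, `parse_tsv_line` and `parse_xml_opinions` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Message:
    doc_id: str
    text: str
    relevance: str
    sentiment: str
    # (aspect, polarity) pairs, e.g. ("Atmosphere#Temperature", "negative")
    aspects: list = field(default_factory=list)


def parse_tsv_line(line: str) -> Message:
    """Parse one TSV line:
    ID <tab> Text <tab> Relevance <tab> Sentiment <tab> Aspect:Polarity ...
    Assumption (hypothetical helper): the aspect column may be absent or empty.
    """
    cols = line.rstrip("\r\n").split("\t")
    doc_id, text, relevance, sentiment = cols[:4]
    aspect_col = cols[4] if len(cols) > 4 else ""
    aspects = []
    for item in aspect_col.split():  # pairs are whitespace-separated
        # Assumption: the polarity follows the LAST colon in the pair.
        aspect, _, polarity = item.rpartition(":")
        aspects.append((aspect, polarity))
    return Message(doc_id, text, relevance, sentiment, aspects)


def parse_xml_opinions(xml_string: str) -> list:
    """Return (category, from, to, polarity, target) tuples for every
    <Opinion> element inside a <Document>, at any nesting depth.
    Attribute values are returned as strings, exactly as in the file."""
    root = ET.fromstring(xml_string)
    return [
        (op.get("category"), op.get("from"), op.get("to"),
         op.get("polarity"), op.get("target"))
        for op in root.iter("Opinion")
    ]
```

For example, with made-up sample values, `parse_tsv_line("doc1\tText\ttrue\tnegative\tAtmosphere#Temperature:negative")` yields a `Message` whose `aspects` is `[("Atmosphere#Temperature", "negative")]`. Using `root.iter` rather than direct children keeps the XML reader independent of any wrapper elements between Document and Opinion.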

The aspects in the data are chosen from predefined inventories of categories. The table below gives an overview of them. Each category has several sub-aspects (e.g. Atmosphere#Temperature, Atmosphere#Cleanliness, ...). While we provide these sub-aspects in the data, we will evaluate on the categories only.
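Because only the categories are scored, fine-grained labels can be reduced to their category before training or evaluation. The following minimal sketch assumes, based on the examples above, that a sub-aspect is written as Category#SubAspect. The helper names are hypothetical, and collapsing duplicate (category, polarity) pairs into a set is our own design choice; check the guidelines for how the official evaluation counts duplicates.

```python
def to_category(aspect: str) -> str:
    """Map 'Category#SubAspect' to 'Category'; a bare category passes through.
    Assumption: '#' separates the category from the sub-aspect."""
    return aspect.split("#", 1)[0]


def category_labels(aspects) -> set:
    """Collapse (aspect, polarity) pairs to unique (category, polarity) pairs.
    Using a set (dropping duplicates) is an illustrative choice, not an
    official evaluation rule."""
    return {(to_category(aspect), polarity) for aspect, polarity in aspects}
```

With this choice, two negative sub-aspects of the same category, such as Atmosphere#Temperature and Atmosphere#Cleanliness, reduce to the single label ("Atmosphere", "negative").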

The sub-aspects belonging to each category are listed in our annotation guidelines.

GermEval 2017 categories