Data
Altogether, the data consists of 22,000 messages from various social media and web sources. The data can be found here:
All data can be used for development. For convenience, we have already provided a train/dev split.
For a detailed description of the annotation process and annotations please also consult our annotation guidelines.
Test Data
The second part of the test data has just been released. If you want to take part in the official competition, please don't forget to register here. The two data sets represent two different points in time:
Timestamp 1:
Timestamp 2:
Please also consult the guidelines when creating the submission that contains your predictions.
Formats
We provide the data in two formats:
- TSV:
ID <tab> Text <tab> Relevance <tab> Sentiment <tab> Aspect:Polarity (multiple Aspect:Polarity pairs are whitespace separated)
- XML:
<Document id="ID">
<text>TEXT</text>
<Opinions>
<Opinion category="Category" from="A" to="B" polarity="POLARITY" target="TARGET"/>
</Opinions>
</Document>
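The two formats above could be read roughly as follows. This is a minimal sketch, not an official loader: the function names parse_tsv_line and parse_xml_document are hypothetical, the sample record is made up for illustration (including the relevance value "true", the sentiment value "negative", and the reading of from/to as character offsets), and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET


def parse_tsv_line(line):
    """Split one tab-separated record into its five fields."""
    fields = line.rstrip("\n").split("\t")
    # Pad to five fields so records without aspect annotations still parse.
    fields += [""] * (5 - len(fields))
    doc_id, text, relevance, sentiment, aspects = (f.strip() for f in fields[:5])
    pairs = []
    for item in aspects.split():
        # Split on the last colon: "Aspect:Polarity".
        aspect, _, polarity = item.rpartition(":")
        pairs.append((aspect, polarity))
    return {
        "id": doc_id,
        "text": text,
        "relevance": relevance,
        "sentiment": sentiment,
        "aspects": pairs,
    }


def parse_xml_document(xml_string):
    """Parse a single <Document> element and collect its <Opinion> attributes."""
    doc = ET.fromstring(xml_string.strip())
    opinions = [dict(op.attrib) for op in doc.iter("Opinion")]
    return {"id": doc.get("id"), "text": doc.findtext("text"), "opinions": opinions}


# Made-up sample data; the values are assumptions, not taken from the corpus.
sample_tsv = (
    "doc1\tIt was far too hot on the train.\ttrue\tnegative\t"
    "Atmosphere#Temperature:negative"
)
sample_xml = """
<Document id="doc1">
<text>It was far too hot on the train.</text>
<Opinions>
<Opinion category="Atmosphere#Temperature" from="11" to="18" polarity="negative" target="too hot"/>
</Opinions>
</Document>
"""

tsv_record = parse_tsv_line(sample_tsv)
xml_record = parse_xml_document(sample_xml)
```

If the XML files wrap many Document elements in a single root element, the same per-document logic could be applied to each child of that root; the name of such a root is not given here, so the sketch handles one document at a time.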
The aspects in the data are chosen from predefined inventories of categories. The table below gives an overview of them. Each category has several sub-aspects (e.g. Atmosphere#Temperature, Atmosphere#Cleanliness, ...). While we provide these sub-aspects in the data, we will evaluate only on the categories.
The corresponding aspects can be found in our annotation guidelines.
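Since only the categories are evaluated, it may help to reduce each Category#Sub-aspect label to its category before scoring your own predictions. The sketch below assumes that the category is everything before the first "#"; the helper names to_category and category_pairs are hypothetical, and merging duplicate pairs is an assumption of this sketch rather than a statement about the official evaluation.

```python
def to_category(label):
    """Return the part before '#', or the label unchanged if it has no '#'."""
    return label.split("#", 1)[0]


def category_pairs(aspect_polarity_pairs):
    """Map (aspect, polarity) pairs to category level.

    Duplicates are merged here; this is an assumption of the sketch.
    """
    return sorted({(to_category(a), p) for a, p in aspect_polarity_pairs})


# Labels taken from the examples in the text; polarities are made up.
example = [
    ("Atmosphere#Temperature", "negative"),
    ("Atmosphere#Cleanliness", "negative"),
]
collapsed = category_pairs(example)
```

In this example both sub-aspects collapse to the single pair ("Atmosphere", "negative").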