Systems will be evaluated using the micro-averaged F1 measure for the relevance, document-level sentiment, and aspect-based sentiment subtasks.
For the OTE (opinion target extraction) subtask, we will compute micro-averaged F1 based on both exact and partial overlap.
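As a rough illustration of what micro-averaged F1 with exact and partial overlap could look like, the following Python sketch pools true positives, false positives, and false negatives over all documents before computing precision, recall, and F1. It is not the official scorer (the executable jar is authoritative). The span representation as character offsets, the greedy one-to-one matching, the definition of "partial overlap" as any shared character, and the names `micro_f1` and `overlaps` are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a target is a (start, end) character span, end exclusive.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed definition of partial overlap: the spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def micro_f1(gold_docs: List[List[Span]],
             pred_docs: List[List[Span]],
             partial: bool = False) -> float:
    """Micro-averaged F1: counts are pooled over all documents, then F1 is computed once."""
    match: Callable[[Span, Span], bool] = overlaps if partial else (lambda a, b: a == b)
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        matched_gold = set()  # indices of gold spans already paired with a prediction
        for p in pred:
            hit = next((i for i, g in enumerate(gold)
                        if i not in matched_gold and match(p, g)), None)
            if hit is None:
                fp += 1  # prediction with no matching gold span
            else:
                matched_gold.add(hit)
                tp += 1
        fn += len(gold) - len(matched_gold)  # gold spans that were never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented for illustration): two documents with gold and predicted spans.
gold_docs = [[(0, 5), (10, 15)], [(3, 8)]]
pred_docs = [[(0, 5), (11, 14)], [(20, 25)]]

exact_f1 = micro_f1(gold_docs, pred_docs, partial=False)
partial_f1 = micro_f1(gold_docs, pred_docs, partial=True)
print(f"exact: {exact_f1:.3f}, partial: {partial_f1:.3f}")
```

In this toy example the prediction `(11, 14)` counts as an error under exact matching but as a hit under partial matching, which is why the partial-overlap score is higher.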
An executable JAR, usage instructions, and the source code are available at https://github.com/muchafel/GermEval2017.
The test data will be published in both XML and TSV formats. The data will be published in the following format:
<Document id="XXX">
<relevance>unknown</relevance>
<sentiment>unknown</sentiment>
<text>Wie schön es ist wenn man sich nach nem nervigen Arbeitstag auch noch über die Bahn ärgern muss.</text>
</Document>
A transformed version of the development set is also provided to further illustrate the format of the test data.
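A document in the format shown above could be read with Python's standard library roughly as follows. This is a minimal sketch: the enclosing `<Documents>` root element, the `SAMPLE` string, and the function name `read_documents` are assumptions for illustration, and the actual files may be wrapped differently or contain further elements.

```python
import xml.etree.ElementTree as ET

# Sample input; the <Documents> root element is an assumption, not taken from the task data.
SAMPLE = """<Documents>
<Document id="XXX">
<relevance>unknown</relevance>
<sentiment>unknown</sentiment>
<text>Wie schön es ist wenn man sich nach nem nervigen Arbeitstag auch noch über die Bahn ärgern muss.</text>
</Document>
</Documents>"""


def read_documents(xml_string: str) -> list:
    """Collect id, relevance, sentiment, and text for every <Document> element."""
    root = ET.fromstring(xml_string)
    docs = []
    # iter() finds <Document> elements at any depth, whatever the root is called.
    for doc in root.iter("Document"):
        docs.append({
            "id": doc.get("id"),
            "relevance": doc.findtext("relevance"),
            "sentiment": doc.findtext("sentiment"),
            "text": doc.findtext("text"),
        })
    return docs


docs = read_documents(SAMPLE)
for d in docs:
    print(d["id"], d["relevance"], d["sentiment"], d["text"][:30])
```

In the test data, the `relevance` and `sentiment` fields hold the placeholder value `unknown`, so a system would fill these in with its own predictions.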