Results
Official ranking of competing task submissions to the tokenization subtask:
SoMaJo (F1 = 99.57%)
AIPHES (F1 = 99.36%)
COW (F1 = 98.98%)
LTL-UDE (F1 = 98.90%)
For comparison, the tokenizer of the Stanford tagger achieves F1 = 98.38% on these data.
Official ranking of competing task submissions to the PoS tagging subtask:
UdS-distributional (acc = 90.44%)
LTL-UDE (acc = 89.09%)
AIPHES (acc = 88.75%)
bot.zen (acc = 88.03%) [late submission]
For comparison, TreeTagger achieves an accuracy of 82.48% on these data.
Detailed results can be found in the task description paper:
Michael Beißwenger, Sabine Bartsch, Stefan Evert and Kay-Michael Würzner (2016). EmpiriST 2015: A shared task on the automatic linguistic annotation of computer-mediated communication and web corpora. In Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, pages 44–56. Berlin, Germany.
as well as in the presentation slides from the final workshop and in an online spreadsheet.
Detailed performance figures for each individual text sample are provided as tab-delimited text tables, which can easily be analyzed with statistical software such as R or loaded into a spreadsheet editor.
Download evaluation results for individual files: empirist_results.zip
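Since the per-file results are tab-delimited text, they can be processed with a few lines of code in any language. The sketch below uses Python's standard csv module; the column names (file, precision, recall, f1) are hypothetical and only illustrate the idea, since the actual headers in empirist_results.zip may differ.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited per-file results table.
# The real column names in empirist_results.zip may differ.
tsv_data = (
    "file\tprecision\trecall\tf1\n"
    "sample01.txt\t99.60\t99.54\t99.57\n"
    "sample02.txt\t98.90\t99.06\t98.98\n"
)

# DictReader splits each row on the tab delimiter and keys fields by header.
rows = list(csv.DictReader(io.StringIO(tsv_data), delimiter="\t"))

# Example analysis: average F1 across the individual text samples.
mean_f1 = sum(float(r["f1"]) for r in rows) / len(rows)
print(f"{mean_f1:.3f}")  # prints 99.275 for this toy excerpt
```

For a downloaded file, replace the io.StringIO wrapper with open("results.txt", newline="") and the same reader works unchanged.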