Beyond publications

Tools

1. Ancient Greek Hexameter Analysis

I have developed a Python library for the automatic annotation of Ancient Greek hexameter verse. The code is open source and can be downloaded from my GitHub page.

2. German Sentiment Classification

This is joint work with Oliver Guhr and colleagues. Information about the Python module and the paper is available here.

3. Multilingual Punctuation

This is also joint work with Oliver Guhr and colleagues. More information can be found here.

Data sets

1. ACL RD-TEC: Manual Annotation of Terms and Semantic Classes

Behrang QasemiZadeh and I have annotated computational linguistics terms and their semantic classes in a set of abstracts sampled from the ACL Anthology Reference Corpus (ACL ARC).

2. Complingterm: Automatic Annotation of Semantic Classes

Héctor Martínez Alonso and I have created a data set that extends ACL RD-TEC. Complingterm contains the terms from ACL RD-TEC (versions 1.0 and 2.0) and assigns each of them to one of four coarse-grained semantic classes. For training, we used the semantic classes from ACL RD-TEC 2.0, merged into broader categories. The resulting term-class list therefore includes automatically annotated items that had not been annotated in ACL RD-TEC.
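
The merging of fine-grained classes into broader ones can be pictured as a many-to-one mapping applied to the training labels. The following Python snippet is a minimal sketch only: the label names, the `FINE_TO_COARSE` mapping, and the toy terms are hypothetical placeholders, not the actual ACL RD-TEC 2.0 or Complingterm class inventories.

```python
from collections import Counter

# Hypothetical fine-grained labels mapped to hypothetical coarse classes.
# These names are placeholders, not the real class inventories.
FINE_TO_COARSE = {
    "fine_a": "coarse_1",
    "fine_b": "coarse_1",
    "fine_c": "coarse_2",
    "fine_d": "coarse_3",
    "fine_e": "coarse_4",
}


def merge_labels(annotated_terms):
    """Replace each term's fine-grained label with its coarse class."""
    return [(term, FINE_TO_COARSE[label]) for term, label in annotated_terms]


# Toy (term, fine-grained label) pairs, invented for illustration.
training_items = [
    ("term one", "fine_a"),
    ("term two", "fine_b"),
    ("term three", "fine_e"),
]

merged = merge_labels(training_items)
# Count how many training items fall into each coarse class.
class_counts = Counter(label for _, label in merged)
```

Merging before training means the classifier only ever sees the coarse labels, which is what allows it to assign one of the four classes to previously unannotated terms.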

3. SemEval Data

For SemEval-2018 Task 7, a data set containing (specialised) entity and semantic relation annotations was created. The texts were taken from the ACL ARC; the domain is computational linguistics. The data can be downloaded here.

4. Other

I have also collected a (short) list of Russian non-breaking prefixes for the Perl Lingua::Sentence module.
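
For context, a non-breaking prefix is an abbreviation after which a full stop does not end the sentence. The Python sketch below illustrates the general idea; it is not the Lingua::Sentence implementation, and the three prefixes in `NONBREAKING_PREFIXES` are illustrative assumptions rather than entries quoted from the collected list.

```python
# Illustrative prefixes only (assumed, not taken from the actual list):
# "г" (город/год), "ул" (улица), "им" (имени).
NONBREAKING_PREFIXES = {"г", "ул", "им"}


def split_sentences(text):
    """Split on sentence-final punctuation unless the token is a known prefix."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")):
            stem = token.rstrip(".!?")
            # A full stop after a non-breaking prefix does not close the sentence.
            if token.endswith(".") and stem.lower() in NONBREAKING_PREFIXES:
                continue
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks final punctuation.
        sentences.append(" ".join(current))
    return sentences


example = "Он живёт на ул. Ленина. Это в г. Москве."
result = split_sentences(example)
```

Without the prefix list, this naive splitter would break the example after "ул." and "г.", which is the kind of error such a list is meant to prevent.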

Community

Workshops and Teaching

Reviews

I have reviewed for the following conferences and journals: