Test-Corpus and Method

This page describes the test corpora and metrics used to evaluate our automatic enjambment detection system. To see the results tables directly, visit [this page].

Test corpora

We evaluated the system on two test corpora that we manually annotated for enjambment:
  • SonnetEvol corpus: 100 sonnets from the 15th, 16th, 17th and 19th centuries, drawn from our large diachronic sonnet corpus. Roughly 70% of the sonnets in this test corpus date from the 15th to the 17th century, and 30% from the 19th century (i.e. the manually annotated test corpus does not prioritize the 19th century, unlike the large corpus).
  • Cantos20th corpus: ca. 1000 lines of 20th century poetry by Antonio Colinas (1983). 
In each of the corpora, we found approx. 275 enjambed line pairs. 

We included a 20th century corpus with natural syntax as a point of comparison for the results on sonnets from earlier periods: the sonnets can show archaic language and often use an elevated register, both of which are expected to be more difficult for an NLP pipeline to analyze correctly.

As regards inter-annotator agreement, half of the reference set was annotated by two annotators. The overlap between their annotations was high.

Evaluation methods

We performed both a typed and an untyped evaluation of enjambment detection:
  • untyped match (or span-match): an enjambment proposed by the system is considered correct if the manually annotated reference also lists an enjambment for the same pair of lines.
  • typed match (or typed span-match): an enjambment proposed by the system is considered correct only if the line pair is present in the manually annotated reference and the enjambment type proposed by the system matches the type in the reference for that line pair.
Precision, recall and F1 were calculated for the test corpora. See the results tables.
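The two matching modes and the three metrics can be illustrated with a minimal Python sketch. It is not the evaluation code used for the reported results; the data representation (a dictionary mapping a line pair, identified here by a hypothetical poem id and the number of its first line, to an enjambment type label), the function name `evaluate`, and the placeholder labels `type_a`, `type_b`, `type_c` are all assumptions made for the example.

```python
from typing import Dict, Tuple

# Assumed representation: (poem_id, first_line_number) -> enjambment type label.
Annotations = Dict[Tuple[str, int], str]


def evaluate(system: Annotations, reference: Annotations,
             typed: bool = False) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for untyped or typed matching."""
    if typed:
        # Typed match: the line pair is in the reference AND the type agrees.
        true_positives = sum(
            1 for pair, label in system.items()
            if reference.get(pair) == label
        )
    else:
        # Untyped match: the line pair only has to appear in the reference.
        true_positives = sum(1 for pair in system if pair in reference)

    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data with placeholder type labels (not taken from the corpora).
reference = {("s1", 1): "type_a", ("s1", 4): "type_b", ("s2", 2): "type_a"}
system = {("s1", 1): "type_a", ("s1", 4): "type_c", ("s2", 7): "type_a"}

untyped_scores = evaluate(system, reference, typed=False)
typed_scores = evaluate(system, reference, typed=True)
```

In this toy example, two of the three system proposals fall on line pairs listed in the reference, but only one of them also carries the same type, so the typed scores are lower than the untyped ones. A typed score can never exceed the corresponding untyped score, since every typed match is also an untyped match.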