This page discusses the format of annotations in the NLP toolkit and the result annotation formats produced by the enjambment detection system. A description of the enjambment detection system itself is available [here].
To illustrate the formats, we'll use the sonnet "Al Lector", by Juan de Castellanos (1522-1607).
The NLP pipeline (IXA Pipes) uses NAF, the NLP Annotation Format (Fokkens et al., 2014).
This format provides NLP annotations in different layers, e.g.
A term-id makes it possible to link information from one layer to another.
NAF output for our example poem is available [here].
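As a rough illustration of how identifiers tie NAF layers together, the sketch below parses a small hand-written fragment and resolves each term to the word forms it spans. The fragment is heavily simplified and is not the actual output for the example poem: the tokens, lemmas and part-of-speech values are illustrative assumptions, and the helper name `link_terms_to_words` is hypothetical. Only the general shape (a `text` layer of `wf` elements, and a `terms` layer whose `term` elements point to word-form ids through `span`/`target`) follows NAF.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified NAF-like fragment (illustrative values only).
NAF_FRAGMENT = """<NAF>
  <text>
    <wf id="w1" sent="1">buen</wf>
    <wf id="w2" sent="1">lector</wf>
  </text>
  <terms>
    <term id="t1" lemma="bueno" pos="G">
      <span><target id="w1"/></span>
    </term>
    <term id="t2" lemma="lector" pos="N">
      <span><target id="w2"/></span>
    </term>
  </terms>
</NAF>"""


def link_terms_to_words(naf_xml):
    """Map each term id to its lemma, POS and the word forms it spans."""
    root = ET.fromstring(naf_xml)
    # Index the text layer by word-form id.
    words = {wf.get("id"): wf.text for wf in root.iter("wf")}
    linked = {}
    for term in root.iter("term"):
        # Each <target> inside the term points back to a word-form id.
        target_ids = [t.get("id") for t in term.iter("target")]
        linked[term.get("id")] = {
            "lemma": term.get("lemma"),
            "pos": term.get("pos"),
            "words": [words[i] for i in target_ids],
        }
    return linked


terms = link_terms_to_words(NAF_FRAGMENT)
for term_id, info in terms.items():
    print(term_id, info["lemma"], info["pos"], info["words"])
```

Other layers can then refer to the same term ids, which is what lets information from different layers be joined.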
We output enjambment detection results both as in-line annotations and as standoff annotations, using a delimited (tab-separated) format.
In both cases, each line is identified by a unique poem ID and by a line number. This would make it possible to produce other formats in the future, such as TEI using the Verse module.
The columns (and their values where an explanation is needed) mean the following:
These annotations are in delimited format (tab-separated), and display the following information:
Unlike in-line results, standoff results only list a pair of lines if enjambment occurs between them (the absence of enjambment is not marked).
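The difference between the two layouts can be sketched as follows. This is only an illustration under assumed column layouts, not the system's actual schema: the in-line rows are taken to be `poem ID, line number, line text, enjambment tag` (tag empty when there is no enjambment), the standoff rows to be `poem ID, first line, second line, tag`, and the tag values `TYPE_A` and `TYPE_B` are placeholders. The function name `inline_to_standoff` is hypothetical.

```python
import csv
import io

# Assumed in-line layout: poem_id, line_no, text, tag (empty = no enjambment).
INLINE_TSV = (
    "poem_0001\t1\tfirst line text\tTYPE_A\n"
    "poem_0001\t2\tsecond line text\t\n"
    "poem_0001\t3\tthird line text\tTYPE_B\n"
    "poem_0001\t4\tfourth line text\t\n"
)


def inline_to_standoff(inline_tsv):
    """Keep only line pairs where an enjambment tag is present."""
    reader = csv.reader(
        io.StringIO(inline_tsv), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    standoff = []
    for poem_id, line_no, _text, tag in reader:
        if tag:  # untagged lines produce no standoff row
            n = int(line_no)
            standoff.append((poem_id, n, n + 1, tag))
    return standoff


pairs = inline_to_standoff(INLINE_TSV)
for row in pairs:
    print("\t".join(str(field) for field in row))
```

In this sketch every line appears in the in-line input, while the standoff output has one row per enjambed pair only.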
The standoff results for our example poem are below: