Annotation and Result Format

This page describes the annotation format used by the NLP toolkit and the result formats produced by the enjambment detection system. A description of the enjambment detection system itself is available [here].

To illustrate the formats, we'll use the sonnet "Al Lector", by Juan de Castellanos (1522-1607).

Annotation Format in the NLP Toolkit

The NLP pipeline (IXA Pipes) uses NAF, the NLP Annotation Format (Fokkens et al., 2014).

This format provides NLP annotations in different layers, e.g.

terms layer: for part of speech and lemmatization
constituency layer: for syntactic parsing (constituents like verb group or noun group)
deps layer: for syntactic dependency parsing (e.g. functions like subject or direct object).

A term ID makes it possible to link information from one layer to another.
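As a rough illustration of how a term ID can connect layers, the sketch below joins the terms layer with the deps layer using Python's standard XML parser. The NAF fragment is a hypothetical, heavily simplified sample, and the element and attribute names (term, dep, id, from, to, rfunc) are assumptions about the NAF output; check them against the actual file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NAF fragment (not taken from the example poem).
# Element and attribute names are assumptions about the NAF layout.
NAF_SAMPLE = """<NAF>
  <terms>
    <term id="t1" lemma="el" pos="D"/>
    <term id="t2" lemma="lector" pos="N"/>
    <term id="t3" lemma="leer" pos="V"/>
  </terms>
  <deps>
    <dep from="t3" to="t2" rfunc="suj"/>
    <dep from="t2" to="t1" rfunc="spec"/>
  </deps>
</NAF>"""


def link_terms_to_deps(naf_xml):
    """Return (head lemma, function, dependent lemma) triples.

    The term ID is the key that connects the two layers.
    """
    root = ET.fromstring(naf_xml)
    # Index the terms layer by term ID.
    terms = {
        term.get("id"): {"lemma": term.get("lemma"), "pos": term.get("pos")}
        for term in root.iter("term")
    }
    links = []
    for dep in root.iter("dep"):
        head = terms.get(dep.get("from"))
        dependent = terms.get(dep.get("to"))
        if head is None or dependent is None:
            continue  # skip dependencies that point at unknown term IDs
        links.append((head["lemma"], dep.get("rfunc"), dependent["lemma"]))
    return links


for head, function, dependent in link_terms_to_deps(NAF_SAMPLE):
    print(f"{dependent} --{function}--> {head}")
```

The same dictionary lookup would work for any other layer that refers to terms by ID.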

NAF output for our example poem is available [here].

Result Format

We output enjambment detection results both as in-line annotations and as standoff annotations, using a delimited (tab-separated) format.

In both cases, each line is identified by a unique poem ID and by a line number. This would make it possible to produce other formats in the future, such as TEI using the Verse module.

In-line annotations

The columns (and their values where an explanation is needed) mean the following:

Poem ID
Ln: Line Number
Tokens: The format is {token Part-of-Speech term-id}. The term ID makes it possible to look up syntactic information from the other modules of the NLP output (see the example). The tagset is based on the Ancora corpus, with some modifications.
Pt: for Position. This indicates whether the line takes part in an enjambment or not. If it does, it indicates the line's position within the enjambed line-pair:
- O: for Out. The line is not part of an enjambment.
- B: for Beginning. An enjambed line-pair starts here.
- I: for Inside. It is the second line in an enjambed line-pair.
- IB: The line is part of two enjambments.
  - It is the second line in an enjambed line-pair, involving this line and the preceding one.
  - It is also the beginning of a new enjambed line-pair involving this line and the one following it.
Enjambment types: A description of the types is available elsewhere on this site, [here].
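The column description above can be turned into a small reader. The following is a minimal sketch, assuming that the five columns appear in the order listed, that each token is wrapped in its own pair of braces, and that the three fields inside the braces are separated by single spaces. The sample row, its POS tags and the type label "typeA" are invented for illustration only.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    pos: str
    term_id: str


# Assumption: each token looks like "{form POS term-id}".
TOKEN_RE = re.compile(r"\{(\S+) (\S+) (\S+)\}")


def parse_inline_row(row):
    """Split one tab-separated in-line row into its five columns."""
    poem_id, line_no, tokens, position, etypes = row.rstrip("\n").split("\t")
    return {
        "poem_id": poem_id,
        "line": int(line_no),
        "tokens": [Token(*m.groups()) for m in TOKEN_RE.finditer(tokens)],
        "position": position,  # one of O, B, I, IB
        "etypes": etypes,
    }


# Hypothetical row, not taken from the example poem.
sample = "p1\t3\t{la DA0FS0 t10} {casa NCFS000 t11}\tB\ttypeA"
parsed = parse_inline_row(sample)
print(parsed["line"], parsed["position"], parsed["tokens"][1].term_id)
```

The term IDs recovered this way are the keys needed to look up further syntactic information in the NAF output.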

Standoff annotations

These annotations are in delimited format (tab-separated), and display the following information:

Poem ID
Ln1: Line number for first enjambed line
Ln2: Line number for second enjambed line
Etype: Enjambment type for the line pair in the two previous fields (the types are described [here])

Unlike in-line results, standoff results only list a pair of lines if enjambment occurs between them (the absence of enjambment is not marked).
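The two result formats carry related information: the position tags of the in-line format can be derived from the line pairs listed in the standoff format. The sketch below shows one way to do this, assuming the four standoff columns appear in the order listed above. The function names and the sample rows are hypothetical and are not part of the system.

```python
def parse_standoff(text):
    """Read tab-separated standoff rows into (poem_id, ln1, ln2, etype) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        poem_id, ln1, ln2, etype = line.split("\t")
        rows.append((poem_id, int(ln1), int(ln2), etype))
    return rows


def positions_from_pairs(pairs, n_lines):
    """Derive the in-line position tag (O, B, I, IB) for every line of a poem."""
    starts = {ln1 for ln1, _ in pairs}  # lines that open an enjambed pair
    ends = {ln2 for _, ln2 in pairs}    # lines that close an enjambed pair
    tags = {}
    for ln in range(1, n_lines + 1):
        if ln in starts and ln in ends:
            tags[ln] = "IB"
        elif ln in starts:
            tags[ln] = "B"
        elif ln in ends:
            tags[ln] = "I"
        else:
            tags[ln] = "O"  # absence from the standoff file means no enjambment
    return tags


# Hypothetical standoff rows for a six-line poem.
sample = "p1\t1\t2\ttypeA\np1\t2\t3\ttypeB\np1\t5\t6\ttypeA\n"
pairs = [(ln1, ln2) for _, ln1, ln2, _ in parse_standoff(sample)]
print(positions_from_pairs(pairs, n_lines=6))
```

Because the standoff format does not mark the absence of enjambment, the number of lines in the poem has to be supplied separately.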

The standoff results for our example poem are below:
