Annotation and Result Format

This page describes the annotation format used by the NLP toolkit and the formats in which the enjambment detection system outputs its results. A description of the enjambment detection system itself is available [here].

To illustrate the formats, we'll use the sonnet "Al Lector" by Juan de Castellanos (1522-1607).

Annotation Format in the NLP Toolkit

This format, NAF (NLP Annotation Format), provides NLP annotations in separate layers, for example:
  • terms layer: part-of-speech tags and lemmas
  • constituency layer: constituency parses (constituents such as verb groups or noun groups)
  • deps layer: dependency parses (syntactic functions such as subject or direct object)

Each term has a term ID, which links the information about that term across layers.
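To illustrate how term IDs tie the layers together, here is a minimal Python sketch that reads a hand-made, heavily simplified NAF-style fragment with the standard library's `xml.etree.ElementTree`. The element and attribute names (`wf`, `term`, `target`, `dep`, `from`, `to`, `rfunc`) follow the general NAF layout as we understand it, but the fragment, its IDs, the dependency label `spec` and the helper `link_layers` are invented for illustration and are not actual toolkit output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified NAF-style fragment: word forms, terms that point
# to word forms, and one dependency that points from term ID to term ID.
NAF_SNIPPET = """<NAF>
  <text>
    <wf id="w25">la</wf>
    <wf id="w26">verdad</wf>
  </text>
  <terms>
    <term id="t25" lemma="el" pos="D"><span><target id="w25"/></span></term>
    <term id="t26" lemma="verdad" pos="N"><span><target id="w26"/></span></term>
  </terms>
  <deps>
    <dep from="t26" to="t25" rfunc="spec"/>
  </deps>
</NAF>"""


def link_layers(naf_xml):
    """Join the deps layer with the terms and text layers via shared IDs."""
    root = ET.fromstring(naf_xml)
    # word-form ID -> surface string
    words = {wf.get("id"): wf.text for wf in root.iter("wf")}
    # term ID -> (surface form, lemma, POS)
    terms = {}
    for term in root.iter("term"):
        targets = [t.get("id") for t in term.iter("target")]
        surface = " ".join(words[w] for w in targets)
        terms[term.get("id")] = (surface, term.get("lemma"), term.get("pos"))
    # each dependency refers to its head and dependent by term ID
    return [
        {
            "head": terms[dep.get("from")],
            "dependent": terms[dep.get("to")],
            "function": dep.get("rfunc"),
        }
        for dep in root.iter("dep")
    ]


for link in link_layers(NAF_SNIPPET):
    print(link)
```

The same lookup by term ID is what lets the in-line results below point back to the syntactic layers.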

NAF output for our example poem is available [here].

Result Format

We output enjambment detection results both as in-line annotations and as standoff annotations, in a delimited (tab-separated) format.

In both cases, each line is identified by a unique poem ID and a line number. This would make it possible to produce other formats, such as TEI, using the Verse module in the future.

In-line annotations

The columns (and their values where an explanation is needed) mean the following:
  • Poem ID
  • Ln: Line Number
  • Tokens: each token is given as {token Part-of-Speech term-id}. The term ID makes it possible to retrieve syntactic information produced by the other modules of the NLP output (see the example). The tagset is based on the AnCora corpus, with some modifications.
  • Pt: Position. Indicates whether the line takes part in an enjambment and, if it does, the line's position within the enjambed line pair:
    • O: Out. The line is not part of an enjambment.
    • B: Beginning. An enjambed line pair starts here.
    • I: Inside. The line is the second line of an enjambed line pair.
    • IB: The line is part of two enjambments:
      • It is the second line of an enjambed line pair formed with the preceding line.
      • It is also the first line of a new enjambed line pair formed with the following line.
  • Enjambment types: a description of the types is available elsewhere on this site, [here].
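The in-line rows can be read with a short Python sketch. It assumes that the columns appear in the order listed above, separated by tabs, and that when a line carries more than one enjambment type the types are separated by semicolons (as the trailing `;` on some rows in the example suggests). The helper name `parse_inline_row` is hypothetical, and the sample row reuses line 10 of the example with the poem ID 20_36 that appears in the standoff results.

```python
import re

# One {token POS term-id} triple; tokens and tags contain no spaces.
TOKEN_RE = re.compile(r"\{(\S+) (\S+) (t\d+)\}")


def parse_inline_row(row):
    """Parse one tab-separated in-line result row (hypothetical helper).

    Assumed column order: poem ID, line number, tokens, position (Pt),
    enjambment types. The types column may be empty for O lines.
    """
    fields = row.rstrip("\n").split("\t")
    poem_id, ln, tokens, pt = fields[:4]
    types_field = fields[4] if len(fields) > 4 else ""
    return {
        "poem_id": poem_id,
        "ln": int(ln),
        "tokens": TOKEN_RE.findall(tokens),  # [(token, pos, term_id), ...]
        "pt": pt,
        # Assumed: multiple types are separated by ';'; empty pieces are dropped.
        "types": [t.strip() for t in types_field.split(";") if t.strip()],
    }


row = parse_inline_row(
    "20_36\t10\t{la D t52} {vida N t53} {breve G t54} {, O t55} "
    "{vena N t56} {mal A t57} {propicia G t58}\tB\tpb_adj_prep"
)
# Term IDs of the nouns, e.g. for looking up their dependencies in NAF
nouns = [(tok, term_id) for tok, pos, term_id in row["tokens"] if pos == "N"]
print(row["ln"], row["pt"], row["types"], nouns)
```

Keeping the term ID alongside each token means a caller can go from an enjambed line straight to the corresponding terms in the NAF layers.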

 Ln Tokens in Line with Parts-of-speech and Term-IDs Pt Enjambment
 01 {Lector N t1} {amigo N t2} {, O t3} {claramente A t4} {veo V t5} B ex_dobj_verb
 02 {salir V t6} {a P t7} {luz N t8} {aqueste D t9} {monumento N t10} IB ex_dobj_verb;
 03 {sin P t11} {aquellos D t12} {matices N t13} {y C t14} {ornamento N t15} IB pb_noun_prep; 
 04 {que Q t16} {por P t17} {ventura N t18} {tienes V t19} {en P t20} {deseo N t21} {. O t22} I cc_crossclause
 05 {Con P t23} {sólo A t24} {la D t25} {verdad N t26} {lo Q t27} {hermoseo N t28} {, O t29} O 
 06 {porque C t30} {no A t31} {pide V t32} {tanto A t33} {crecimiento N t34} B pb_noun_prep
 07 {de P t35} {variedades N t36} {, O t37} {mas A t38} {detenimiento N t39} IB pb_noun_prep;
 08 {del P t40} {que Q t41} {suele V t42} {llevar V t43} {veloz G t44} {correo N t45} {. O t46} I pb_noun_prep
 09 {La D t47} {peregrinación N t48} {es V t49} {inexhausta G t50} {, O t51} O
 10 {la D t52} {vida N t53} {breve G t54} {, O t55} {vena N t56} {mal A t57} {propicia G t58} B pb_adj_prep
 11 {para P t59} {me Q t60} {detener V t61} {en P t62} {las D t63} {jornadas N t64} {. O t65} I pb_adj_prep
 12 {Y C t66} {ansí A t67} {vamos V t68} {de P t69} {paso N t70} {, O t71} {porque C t72} {basta V t73} O 
 13 {en P t74} {aqueeste C t75} {compendio N t76} {dar V t77} {noticia N t78} B pb_noun_prep
 14 {de P t79} {las D t80} {cosas N t81} {que Q t82} {estaban V t83} {olvidadas G t84} {. O t85} I pb_noun_prep
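Because every B or IB line opens a pair with the line that follows it, the enjambed line pairs can be recovered from the Pt column alone. The following Python sketch (the function name `pairs_from_positions` is ours, not part of the system) does this for the Pt values in the table above and rejects sequences where a pair is opened but not continued.

```python
def pairs_from_positions(positions):
    """positions: list of (line_number, pt) in line order.

    Returns (ln1, ln2) for each enjambed line pair.
    """
    pairs = []
    for (ln, pt), (next_ln, next_pt) in zip(positions, positions[1:]):
        if pt in ("B", "IB"):
            # A pair that starts here must continue into the next line.
            if next_pt not in ("I", "IB"):
                raise ValueError(f"line {ln} is {pt} but line {next_ln} is {next_pt}")
            pairs.append((ln, next_ln))
    # The last line cannot open a pair.
    if positions and positions[-1][1] in ("B", "IB"):
        raise ValueError(f"last line {positions[-1][0]} opens an unfinished pair")
    return pairs


# Pt column of the example sonnet, lines 1-14
PT_COLUMN = list(enumerate("B IB IB I O B IB I O B I O B I".split(), start=1))
print(pairs_from_positions(PT_COLUMN))
```

For the example poem this yields the same seven line pairs that the standoff results list.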

Standoff annotations

These annotations are in delimited (tab-separated) format and contain the following fields:
  • Poem Id
  • Ln1: Line number for first enjambed line
  • Ln2: Line number for second enjambed line
  • Etype: Enjambment type for the line pair in the two previous fields (the types are described [here])
Unlike the in-line results, the standoff results list a pair of lines only if an enjambment occurs between them; the absence of enjambment is not marked.

The standoff results for our example poem are below:

 Poem Id Ln1 Ln2 Etype
 20_36 01 02 ex_dobj_verb
 20_36 02 03 ex_dobj_verb
 20_36 03 04 cc_crossclause
 20_36 06 07 pb_noun_prep
 20_36 07 08 pb_noun_prep
 20_36 10 11 pb_adj_prep
 20_36 13 14 pb_noun_prep
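Going the other way, a standoff file can be read with Python's `csv` module and turned back into the in-line Pt column. This sketch assumes the file has a header row with the field names shown above, separated by tabs like the data; the helper names `read_standoff` and `positions_from_pairs` are hypothetical. Since standoff results omit non-enjambed lines, the number of lines in the poem has to be supplied separately (14 for this sonnet).

```python
import csv
import io

# The example standoff results, tab-separated, with an assumed header row.
STANDOFF_TSV = (
    "Poem Id\tLn1\tLn2\tEtype\n"
    "20_36\t01\t02\tex_dobj_verb\n"
    "20_36\t02\t03\tex_dobj_verb\n"
    "20_36\t03\t04\tcc_crossclause\n"
    "20_36\t06\t07\tpb_noun_prep\n"
    "20_36\t07\t08\tpb_noun_prep\n"
    "20_36\t10\t11\tpb_adj_prep\n"
    "20_36\t13\t14\tpb_noun_prep\n"
)


def read_standoff(text):
    """Return {poem_id: [(ln1, ln2, etype), ...]} from tab-separated standoff text."""
    pairs = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        pairs.setdefault(row["Poem Id"], []).append(
            (int(row["Ln1"]), int(row["Ln2"]), row["Etype"])
        )
    return pairs


def positions_from_pairs(n_lines, pairs):
    """Return {line_number: Pt} for lines 1..n_lines from enjambed pairs."""
    starts = {ln1 for ln1, _, _ in pairs}
    ends = {ln2 for _, ln2, _ in pairs}
    labels = {}
    for ln in range(1, n_lines + 1):
        if ln in starts and ln in ends:
            labels[ln] = "IB"
        elif ln in starts:
            labels[ln] = "B"
        elif ln in ends:
            labels[ln] = "I"
        else:
            labels[ln] = "O"  # not listed in standoff: no enjambment
    return labels


standoff = read_standoff(STANDOFF_TSV)
labels = positions_from_pairs(14, standoff["20_36"])
print(" ".join(labels[ln] for ln in range(1, 15)))
```

The resulting sequence matches the Pt column of the in-line example, which is a quick consistency check between the two output formats.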