Tools

I've developed some small-scale tools for corpus analysis and management. All are available under an open-source licence via SourceForge:

  • Syllabic Verse Analysis (https://sourceforge.net/projects/syllabic-verse-analysis/):

    • Script designed to assist in generating metrical annotation for Romance syllabic verse, essential for the creation of the Old Gallo-Romance Corpus. The first stage splits orthographic forms into syllables; the second stage scans the result, assigning each syllable to a metrical position in the line of verse. Exports to PAULA-XML, suitable for use with ANNIS. See Rainsford (2022).

    • Status: fully functional, development still ongoing.

  • Tokenized Text Aligner (https://sourceforge.net/projects/tokenized-text-aligner/):

    • Automatically aligns two similar versions of the same text token by token. Useful when combining annotation from corpora with different tokenization policies, or when comparing different editions of the same manuscript or different manuscripts of the same text. The quality of the result depends on how similar the source texts are, but I've found it to be surprisingly robust.

    • Status: complete (2020).

  • KNIC Concordances (https://sourceforge.net/projects/knicconcordances/):

    • Backend for TIGERSearch/TIGER-XML to generate concordance-style tabular results from treebank queries. See also Rainsford and Heiden (2014).

    • Status: complete (2014).
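
To give a rough idea of what token-by-token alignment involves, as described for the Tokenized Text Aligner above, here is a minimal sketch in Python. It is not the code of the tool itself: it assumes that a generic sequence comparison from the standard library (difflib.SequenceMatcher) is good enough for illustration, and the function name align_tokens and the two sample token lists are invented for this example.

```python
from difflib import SequenceMatcher


def align_tokens(a, b):
    """Pair up tokens from two tokenizations of (roughly) the same text.

    Returns a list of (tokens_from_a, tokens_from_b) tuples. Identical
    tokens are paired one to one; wherever the versions differ, the
    differing stretches are grouped together as a single pair.
    NOTE: illustrative stand-in only, not the aligner's own algorithm.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same tokens on both sides: align them individually.
            pairs.extend(([x], [y]) for x, y in zip(a[i1:i2], b[j1:j2]))
        else:
            # Replacement, insertion or deletion: keep the two spans together.
            pairs.append((a[i1:i2], b[j1:j2]))
    return pairs


# Hypothetical sample data: two versions with a spelling variant and
# different tokenization policies for the elided article.
version_a = ["li", "reis", "est", "en", "l'", "ost"]
version_b = ["li", "rois", "est", "en", "l'ost"]

for left, right in align_tokens(version_a, version_b):
    print(left, "<->", right)
```

In this sketch the spelling variant ends up as a one-to-one pair and the two tokens "l'" and "ost" are grouped against the single token "l'ost", which is the kind of many-to-one link needed when transferring annotation between corpora with different tokenization policies.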

IMPORTANT CAVEAT: These tools are provided as-is, without warranty or guarantees of any kind. In particular, they are developed on Linux and I have no plans to test them on other operating systems.