Tools

I've developed some small-scale tools for corpus analysis and management. All are available under an open-source licence via SourceForge:

Syllabic Verse Analysis (https://sourceforge.net/projects/syllabic-verse-analysis/):
- Script designed to assist in generating metrical annotation for Romance syllabic verse, essential for the creation of the Old Gallo-Romance Corpus. The first stage splits orthographic forms into syllables; the second stage scans the result, assigning each syllable to a metrical position in the line of verse. Exports to PAULA-XML suitable for use with ANNIS. See Rainsford (2022).
- Status: fully functional, development still ongoing.
Tokenized Text Aligner (https://sourceforge.net/projects/tokenized-text-aligner/):
- Automatically aligns two similar versions of the same text token by token. Useful when combining annotation from corpora with different tokenization policies, or when comparing different editions of the same manuscript or different manuscripts of the same text. The quality of the result depends on the similarity of the source texts, but I've found it to be surprisingly robust.
- Status: complete 2020
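
To give a rough idea of what token-by-token alignment involves, here is a minimal sketch using Python's standard `difflib` module. It is not the Tokenized Text Aligner's actual code or algorithm: the function name `align_tokens` and the sample tokens are made up for illustration, and the real tool may work quite differently.

```python
from difflib import SequenceMatcher


def align_tokens(a, b):
    """Align two token lists; return a list of (tokens_from_a, tokens_from_b) pairs.

    Illustrative sketch only (hypothetical helper, not the tool's own method).
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent items,
    # which would otherwise discard common tokens in long texts.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Identical runs are paired one to one.
            pairs.extend(([x], [y]) for x, y in zip(a[i1:i2], b[j1:j2]))
        else:
            # Differing runs (replace, insert, delete) are kept together as one group,
            # so a token split differently in the two versions stays aligned.
            pairs.append((a[i1:i2], b[j1:j2]))
    return pairs


# Two invented tokenizations of the same short phrase.
version_a = ["l'", "amour", "est", "grant"]
version_b = ["l'amour", "est", "grans"]
for left, right in align_tokens(version_a, version_b):
    print(left, "<->", right)
```

In this sketch the two tokens `l'` and `amour` end up grouped against the single token `l'amour`, and the variant spellings `grant` and `grans` are paired because they sit between matching material.
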
KNIC Concordances (https://sourceforge.net/projects/knicconcordances/):
- Backend for TIGERSearch/TIGER-XML to generate concordance-style tabular results from treebank queries. See also Rainsford and Heiden (2014).
- Status: complete 2014

IMPORTANT CAVEAT: These tools are provided as-is, without warranty or guarantees of any kind. In particular, they are developed on Linux and I have no plans to test them on other operating systems.