Syntactic Reference Corpus of Medieval French (SRCMF)

Nouveau Corpus d'Amsterdam (NCA)

Software (for academic, non-commercial use only):

  • For Helmut Schmid's TreeTagger: parameter files for Old French.
  • For TIGERSearch (IMS, Universität Stuttgart): a patched tiger.jar file for Mac OS X, compiled by Cyprian Gerstenberger. To install it, replace the original file in the lib directory with this version.
  • Dependency parsing: Old French models for two mate tools parsers, trained on the SRCMF 0.9 texts as described in Stein 2016 (LREC paper). Each zip archive contains the paper and a readme file:
    • for the mate tools joint transition-based parser (zip, 362 MB): joint analysis of part of speech, morphological features, and dependencies
    • for the mate tools graph-based parser (zip, 67 MB): unlike the approach described in the LREC paper, this package contains models for a full mate tools pipeline, including lemmatisation, part of speech, morphological features, and dependencies. Results can be slightly improved if lemmatisation and tagging are done with TreeTagger and Marmot (see my other resources).
  • Dependency parsing: Modern French models for the mate tools graph-based parser (zip, 15 MB), trained on a small training corpus (30k+ words) using the original SRCMF categories (LAS 79.93%).
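
The LAS figure quoted above is the labeled attachment score: the percentage of tokens whose predicted head and dependency label both match the gold annotation (the unlabeled score, UAS, checks the head only). The following is a minimal Python sketch of how such scores can be computed from CoNLL-2009-style files, the tab-separated format used by mate tools. It assumes the CoNLL-2009 column layout (HEAD in column 9, DEPREL in column 11); the function names are hypothetical, and this is not the evaluation script behind the reported result. Evaluations differ in whether punctuation is scored; this sketch counts every token.

```python
from typing import List, Tuple

# One scored token: (head index, dependency label); head 0 means root.
Arc = Tuple[int, str]


def read_arcs(lines: List[str], head_col: int = 8, rel_col: int = 10) -> List[Arc]:
    """Read (HEAD, DEPREL) pairs from tab-separated CoNLL-2009-style lines.

    Assumption: HEAD and DEPREL are the 9th and 11th columns (0-based
    indexes 8 and 10), as in the CoNLL-2009 layout. Pass head_col=9,
    rel_col=11 to read the PHEAD/PDEPREL columns instead.
    Blank lines (sentence boundaries) are skipped.
    """
    arcs: List[Arc] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        cols = line.split("\t")
        arcs.append((int(cols[head_col]), cols[rel_col]))
    return arcs


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over all tokens, punctuation included."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted data must cover the same tokens")
    if not gold:
        raise ValueError("no tokens to score")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head matches
    both_ok = sum(g == p for g, p in zip(gold, pred))  # head and label match
    n = len(gold)
    return 100.0 * head_ok / n, 100.0 * both_ok / n
```

For example, with four tokens (labels are illustrative, not SRCMF categories) where one token has the right head but the wrong label and another has the wrong head, `attachment_scores([(2, "subj"), (0, "root"), (2, "obj"), (2, "mod")], [(2, "subj"), (0, "root"), (2, "mod"), (3, "mod")])` returns `(75.0, 50.0)`.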

Lexical resources

  • Tobler-Lommatzsch Altfranzösisches Wörterbuch.
    • Demo version of, and additions to, the DVD edition of the Old French dictionary Tobler-Lommatzsch Altfranzösisches Wörterbuch, Steiner Verlag (Stuttgart). This version shows the first pages of the commercial digital edition of the dictionary; the full DVD version can be ordered from the publisher.
    • A plain-text version produced by optical character recognition (OCR) can also be downloaded (courtesy of the editor). OCR was performed in 2001 on high-resolution image files, not on the low-resolution files included in the DVD/CD version. The text is uncorrected and contains numerous errors. Download the version updated in 2018 here (zip, 30 MB), or consult a work-in-progress version that was created to extract word senses (see below).
    • Old French word senses (glossed in German) were extracted from the OCR output and partly matched with GermaNet and the corresponding synsets in WordNet. The March 2020 version of the files (presented at the LREC 2020 conference) may be downloaded here.
  • Resources for French verbs, created in project B5 of DFG Sonderforschungsbereich 732.

Tutorials (in German or English)