
Introduction

Morfette is a tool for supervised learning of inflectional morphology. Given a corpus of sentences annotated with lemmas and morphological labels, and optionally a lexicon, Morfette learns to morphologically analyse new sentences.

In the learning stage, Morfette fits two separate logistic regression models: one for morphological tagging and one for lemmatization. The predictions of the two models are combined dynamically to produce a globally plausible sequence of (morphological tag, lemma) pairs for each sentence.
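The following Python sketch illustrates one way such a dynamic combination can work: a beam search over (tag, lemma) pairs in which each partial analysis is scored by the sum of the log probabilities from both models. The beam-search decoder, the function names, the dependency on the previous tag and the toy probability tables are all assumptions made for illustration, not Morfette's actual code; for brevity the lemma model returns lemma strings directly rather than lemmatization classes.

```python
import math


def decode(words, tag_model, lemma_model, beam_size=3):
    """Beam search over (tag, lemma) pairs (illustrative sketch).

    tag_model(words, i, prev_tag) -> {tag: prob}
    lemma_model(word, tag)        -> {lemma: prob}
    A hypothesis is scored by summing log P(tag) + log P(lemma | tag) over the sentence.
    """
    beam = [(0.0, [])]  # list of (log score, [(tag, lemma), ...])
    for i, word in enumerate(words):
        candidates = []
        for score, seq in beam:
            prev_tag = seq[-1][0] if seq else "<s>"
            for tag, p_tag in tag_model(words, i, prev_tag).items():
                # The lemma model is queried for every candidate tag, so tag and
                # lemma are chosen jointly rather than one after the other.
                for lemma, p_lemma in lemma_model(word, tag).items():
                    new_score = score + math.log(p_tag) + math.log(p_lemma)
                    candidates.append((new_score, seq + [(tag, lemma)]))
        # Keep only the beam_size best partial analyses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][1]


# Toy, made-up probabilities for the ambiguous French phrase "la ferme".
def toy_tag_model(words, i, prev_tag):
    if i == 0:
        return {"DET": 0.6, "PRO": 0.4}
    if prev_tag == "DET":
        return {"NC": 0.9, "V": 0.1}
    return {"NC": 0.2, "V": 0.8}


def toy_lemma_model(word, tag):
    if word == "la":
        return {"le": 1.0}
    if tag == "V":
        return {"fermer": 0.95, "ferme": 0.05}
    return {"ferme": 1.0}


print(decode(["la", "ferme"], toy_tag_model, toy_lemma_model))
# → [('DET', 'le'), ('NC', 'ferme')]
```

Because each hypothesis carries both a tag and a lemma, a lemma that is only plausible under a less likely tag can still be selected when the combined score is highest, which a strictly greedy tag-then-lemma pipeline would not allow.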

In Morfette, lemmatization is cast as a classification task in which a lemmatization class specifies the edit operations needed to transform an inflected word form into its lemma.
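To make the idea of a lemmatization class concrete, here is a deliberately simplified Python sketch in which a class is just a pair (number of trailing characters to delete, suffix to append). This suffix-only encoding and the function names `lemma_class` and `apply_class` are hypothetical stand-ins chosen for illustration; Morfette's actual classes are richer (see the edit trees mentioned below) and can also capture changes at the beginning or in the middle of a word.

```python
def lemma_class(form, lemma):
    """Simplified lemma class: (chars to delete from the end of form, suffix to append)."""
    # Length of the longest common prefix of form and lemma.
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (len(form) - i, lemma[i:])


def apply_class(form, cls):
    """Apply a (delete, append) class to a word form to obtain a candidate lemma."""
    cut, suffix = cls
    if cut > len(form):
        raise ValueError("class deletes more characters than the form has")
    return form[:len(form) - cut] + suffix


print(lemma_class("walked", "walk"))       # → (2, '')
print(lemma_class("chantons", "chanter"))  # → (3, 'er')
print(apply_class("parlons", (3, "er")))   # → parler
```

Because a class does not mention the word itself, walked → walk and talked → talk fall into the same class, and a class learned from chantons → chanter can be applied to an unseen form such as parlons. This is what lets a classifier generalize to words never seen in training.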

The basic approach is described in Chrupała et al. (2008) and Chrupała (2008). The current version of Morfette fits the models with an averaged perceptron rather than with Maximum Entropy training, and its lemmatization classes are based on the edit trees described in Chrupała (2008).
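For readers unfamiliar with the averaged perceptron, here is a generic multiclass version in Python over sparse binary features. It is a textbook sketch, not Morfette's implementation: the class name `AveragedPerceptron`, the feature strings and the number of epochs are hypothetical. The averaging uses the common trick of keeping a second, counter-weighted accumulator `u`, so that averaged weights can be recovered as `w - u / c` at the end instead of summing the weight vector after every example.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass averaged perceptron over sparse binary features (generic sketch)."""

    def __init__(self):
        self.w = defaultdict(float)  # current weights, keyed by (feature, label)
        self.u = defaultdict(float)  # updates weighted by the example counter
        self.c = 1                   # example counter
        self.labels = set()

    def score(self, feats, label, weights):
        return sum(weights.get((f, label), 0.0) for f in feats)

    def predict(self, feats, weights=None):
        weights = self.w if weights is None else weights
        # sorted() makes tie-breaking deterministic (first label wins).
        return max(sorted(self.labels), key=lambda y: self.score(feats, y, weights))

    def train(self, data, epochs=5):
        """data: list of (feature set, gold label). Returns averaged weights."""
        for _, y in data:
            self.labels.add(y)
        for _ in range(epochs):
            for feats, y in data:
                y_hat = self.predict(feats)
                if y_hat != y:
                    # Promote the gold label, demote the wrongly predicted one.
                    for f in feats:
                        self.w[(f, y)] += 1.0
                        self.w[(f, y_hat)] -= 1.0
                        self.u[(f, y)] += self.c
                        self.u[(f, y_hat)] -= self.c
                self.c += 1
        # Equal, up to a positive scale factor, to averaging w over all steps,
        # so the argmax used by predict() is unaffected by the scaling.
        return {k: self.w[k] - self.u[k] / self.c for k in self.w}


data = [({"suf=ed"}, "VBD"), ({"suf=ing"}, "VBG"), ({"suf=ed", "cap"}, "VBD")]
model = AveragedPerceptron()
avg = model.train(data)
print(model.predict({"suf=ing"}, avg))  # → VBG
```

Averaging the weights over all training steps damps the influence of the last few updates, which usually makes the perceptron generalize better than its final weight vector alone.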

Morfette has recently been used for part-of-speech tagging and lemmatization of the Est-Républicain French corpus (160M words).


References


Contributors

  • Grzegorz Chrupała (now at Tilburg University) is the main author.
  • Djamé Seddah (Université Paris 4 La Sorbonne, Alpage Project) uses and crash-tests Morfette and is currently working on infrastructure aspects of the project.
  • Georgiana Dinu (now at University of Trento) and Josef van Genabith (Dublin City University) contributed ideas and co-authored a paper on Morfette (see above).