Morfette is a tool for supervised learning of inflectional
morphology. Given a corpus of sentences annotated with lemmas
and morphological labels, and optionally a lexicon, Morfette
learns how to morphologically analyse new sentences.
In the learning stage Morfette fits two separate logistic regression
models: one for morphological tagging and one for lemmatization. The
predictions of the two models are combined dynamically to produce a
globally plausible sequence of (morphological tag, lemma) pairs for
a sentence.
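
The combination step can be illustrated with a small beam search. This is only a sketch under assumptions: the text above does not specify the search procedure, and `beam_decode`, `toy_tag_scores`, `toy_lemma_scores` and the hand-written scores are hypothetical names and values invented for the illustration, not Morfette's API or its trained models.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # (morphological tag, lemma)


def beam_decode(
    words: Sequence[str],
    tag_scores: Callable[[Sequence[str], int, List[Pair]], Dict[str, float]],
    lemma_scores: Callable[[str, str], Dict[str, float]],
    beam_size: int = 3,
) -> List[Pair]:
    """Return the best-scoring sequence of (tag, lemma) pairs for a sentence."""
    beam: List[Tuple[float, List[Pair]]] = [(0.0, [])]
    for i, word in enumerate(words):
        candidates = []
        for score, history in beam:
            # The tagger may look at the pairs already chosen for earlier words.
            for tag, t_score in tag_scores(words, i, history).items():
                # The lemmatizer is conditioned on the candidate tag.
                for lemma, l_score in lemma_scores(word, tag).items():
                    candidates.append(
                        (score + t_score + l_score, history + [(tag, lemma)])
                    )
        # Keep only the highest-scoring partial analyses.
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]


def toy_tag_scores(words, i, history):
    """Hand-written stand-in for a trained tagging model (assumption)."""
    prev_tag = history[-1][0] if history else "<s>"
    table = {
        "the": {"DET": -0.1},
        "dogs": {"NOUN.pl": -0.4, "VERB.3sg": -1.2},
        "bark": {"VERB": -0.5, "NOUN.sg": -0.9},
    }
    scores = dict(table.get(words[i], {"X": -1.0}))
    # Toy context feature: a verb is more plausible right after a noun.
    if prev_tag.startswith("NOUN") and "VERB" in scores:
        scores["VERB"] += 0.3
    return scores


def toy_lemma_scores(word, tag):
    """Hand-written stand-in for a trained lemmatization model (assumption)."""
    if tag in ("NOUN.pl", "VERB.3sg") and word.endswith("s"):
        return {word[:-1]: -0.1, word: -2.0}
    return {word: -0.1}
```

With these toy scorers, `beam_decode(["the", "dogs", "bark"], toy_tag_scores, toy_lemma_scores)` selects `DET/the`, `NOUN.pl/dog`, `VERB/bark`, because the scores of both models are summed along each candidate sequence rather than chosen independently per word.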
In Morfette, lemmatization is cast as a classification task in which
a lemmatization class specifies the edit operations needed to
transform the inflected word form into the corresponding lemma.
The basic approach is described in Chrupała et al. (2008) and Chrupała (2008).
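
To make the idea of a lemmatization class concrete, here is a minimal sketch of one possible representation: an edit tree built around the longest common substring of form and lemma. The names `Replace`, `Split`, `build_tree` and `apply_tree` are chosen for this illustration, and details such as tie-breaking may differ from what Morfette actually does.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Replace:
    """Leaf: rewrite the exact string `old` as `new`."""
    old: str
    new: str


@dataclass(frozen=True)
class Split:
    """Inner node: keep the middle of the form, edit its prefix and suffix."""
    prefix_len: int  # length of the form's prefix before the shared substring
    suffix_len: int  # length of the form's suffix after the shared substring
    left: "EditTree"
    right: "EditTree"


EditTree = Union[Replace, Split]


def longest_common_substring(a: str, b: str):
    """Return (start_in_a, start_in_b, length) of the longest common substring."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def build_tree(form: str, lemma: str) -> EditTree:
    """Derive the edit tree (the class label) that maps `form` to `lemma`."""
    i, j, n = longest_common_substring(form, lemma)
    if n == 0:
        return Replace(form, lemma)
    return Split(
        i,
        len(form) - (i + n),
        build_tree(form[:i], lemma[:j]),
        build_tree(form[i + n:], lemma[j + n:]),
    )


def apply_tree(tree: EditTree, form: str) -> Optional[str]:
    """Apply an edit tree to a word form; return None if it does not fit."""
    if isinstance(tree, Replace):
        return tree.new if form == tree.old else None
    if tree.prefix_len + tree.suffix_len > len(form):
        return None
    end = len(form) - tree.suffix_len
    left = apply_tree(tree.left, form[:tree.prefix_len])
    right = apply_tree(tree.right, form[end:])
    if left is None or right is None:
        return None
    return left + form[tree.prefix_len:end] + right
```

In this sketch, `build_tree("walked", "walk")` and `build_tree("talked", "talk")` yield the same tree, so both training pairs share one class label. Applying the tree learned from `("gesagt", "sagen")` to the unseen form `"gemacht"` gives `"machen"`, while applying the `walked` tree to `"running"` returns `None` because the suffix does not match.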
The current version of Morfette uses an averaged perceptron to
fit the models, rather than Maximum Entropy training. The lemmatization
classes are based on edit trees, as described in Chrupała (2008).
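
For readers unfamiliar with the training method, the following is a generic multiclass averaged perceptron, given as a sketch rather than as Morfette's implementation. The class name, the feature strings and the toy data are assumptions made for the illustration, and the averaging is done naively (summing the full weight vector after every example) for clarity rather than speed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Example = Tuple[Sequence[str], str]  # (active binary features, gold label)


class AveragedPerceptron:
    def __init__(self, labels: Iterable[str]):
        self.labels: List[str] = sorted(labels)
        self.weights: Dict[Tuple[str, str], float] = defaultdict(float)
        self.averaged: Dict[Tuple[str, str], float] = {}

    def _best(self, features: Sequence[str], weights) -> str:
        # Score each label as the sum of its (feature, label) weights.
        def score(label: str) -> float:
            return sum(weights.get((f, label), 0.0) for f in features)
        return max(self.labels, key=score)

    def fit(self, data: Sequence[Example], epochs: int = 5) -> None:
        sums: Dict[Tuple[str, str], float] = defaultdict(float)
        steps = 0
        for _ in range(epochs):
            for features, gold in data:
                guess = self._best(features, self.weights)
                if guess != gold:
                    # Standard perceptron update on a mistake.
                    for f in features:
                        self.weights[(f, gold)] += 1.0
                        self.weights[(f, guess)] -= 1.0
                # Naive averaging: accumulate the current weights every step.
                for key, value in self.weights.items():
                    sums[key] += value
                steps += 1
        self.averaged = {k: v / steps for k, v in sums.items()} if steps else {}

    def predict(self, features: Sequence[str]) -> str:
        """Predict with the averaged weights obtained from `fit`."""
        return self._best(features, self.averaged)


# Toy tagging data with hypothetical suffix features.
toy_data: List[Example] = [
    (["suf=ed", "len>4"], "VERB.past"),
    (["suf=s", "len>4"], "NOUN.pl"),
    (["suf=ed"], "VERB.past"),
    (["suf=s"], "NOUN.pl"),
]
model = AveragedPerceptron({"VERB.past", "NOUN.pl"})
model.fit(toy_data)
```

The same trainer could be used for both models: in the lemmatization model the labels would be lemmatization classes instead of morphological tags. Averaging the weights over all training steps tends to reduce the influence of late, noisy updates compared with using the final weight vector.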
References
- Grzegorz Chrupała, Georgiana Dinu and Josef van Genabith. 2008. Learning Morphology with Morfette. In Proceedings of LREC 2008. http://www.lrec-conf.org/proceedings/lrec2008/pdf/594_paper.pdf
- Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical Functional Grammar Parsing. Chapter 6. PhD dissertation, Dublin City University. http://grzegorz.chrupala.me/papers/phd-single.pdf
- Djamé Seddah, Grzegorz Chrupała, Özlem Çetinoğlu, Josef van Genabith and Marie Candito. 2010. Lemmatization and lexicalized statistical parsing of morphologically rich languages: the case of French. In Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages, pages 85-93. http://www.aclweb.org/anthology/W/W10/W10-14.pdf#page=95
- Grzegorz Chrupała. 2011. Efficient induction of probabilistic word classes with LDA. In Proceedings of IJCNLP. http://www.aclweb.org/anthology/I/I11/I11-1041.pdf
Contributors
- Grzegorz Chrupała (now at Tilburg University) is the main author.
- Djamé Seddah (Université Paris 4 La Sorbonne, Alpage Project) uses and crash-tests Morfette, and is currently working on infrastructure aspects of the project.
- Georgiana Dinu (now at University of Trento) and Josef van Genabith (Dublin City University) contributed ideas and coauthored a paper on Morfette (see References above).