Moman

Description

This is will eventually be a suite of tools to be used by an orthographic/grammatical checker and the checker itself. The tools are currently coded in Python, but I began to rewrite it in Lisp. Moman, the suite itself, will be consisted of the following tools:
  • FineNight is the FSA library.
  • A FST library. (Not yet implemented)
  • ZSpell is the orthographic checker.

FineNight

The FineNight library contains many algorithms for Finite State Automatons. That includes:
  • Union of two FSAs
  • Intersection of two FSAs
  • Complement of a FSAs
  • Difference of two FSAs
  • Reversal of a FSA
  • Closure of a FSA
  • Concatenation of two FSAs
  • Determination of a NFA
  • Equivalence test
  • Minimization algorithm
  • Construction of an IADFA from a sorted dictionary
  • Graphviz support
  • Error-Tolerant IADFA
Almost all algorithms were taken from the book Introduction to Automata Theory, Languages, and Computation [1]. The minimization algorithm is an implementation of Brzozowski's method [2]. In this method, the (possibly non-deterministic) automaton is reversed, determinized, reversed and determinized. I'll eventually add the Hopcroft's nlog(n) minimization algorithm [3].

ZSpell

ZSpell is meant to be a concurrent of aspell, made by Kevin Atkinson. At this time, ZSpell can suggest words with a Levenshtein-distance of one. Before we were using Kemal Oflazer's algorithm [4] . This algorithm is very slow, but now we use a faster algorithm (Schulz's and Mihov's algoritm [5]). However, only substitution, removal and insertion are used for the faster algorithm. It means that transpositions errors, like "ehllo" -> "hello", are considered as two operations.

Since version 0.2 we can specifiy a Levenshtein-distance greater than 1.

TODOs includes:
  • Add transposition errors for Levenshtein-distance algorithm.
  • Add phonetic errors (spelling by sound).
  • Add derivation errors.

Downloads

You can download Moman source here: (Mercurial repository):

References

  1. John E. Hopcroft , Rajeev Motwani and Jefferey D. Ullman, Introduction to Automata Theory, Languages and Computation , 2nd edition, Adison-Wesley, 2001. [bibtex]
  2. J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events, in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series, pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y., 1962.[bibtex]
  3. John E. Hopcroft , An n log n algorithm for minimizing the states in a finite automaton , in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196, Academic Press, 1971.[bibtex]
  4. Kemal Oflazer, Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction , Computational Linguistics, 22(1), pp. 73--89, March, 1996. [bibtex]
  5. Klaus U. Schulz and Stoyan Mihov, Fast String Correction with Levenshtein-Automata, International Journal of Document Analysis and Recognition, 5(1):67--85, 2002. [bibtex]
  6. Zbigniew J. Czech , George Havas and Bohdan S. Majewski, An Optimal Algorithm for Generating Minimal Perfect Hash Functions , Information Processing Letters, 43(5):257--264, 1992. [bibtex]
Comments