This is will eventually be a suite of tools to be used by an
orthographic/grammatical checker and the checker itself.
The tools are currently coded in Python, but I began to rewrite it in Lisp.
Moman, the suite itself, will be consisted of the following tools:
- FineNight is the FSA library.
- A FST library. (Not yet implemented)
- ZSpell is the orthographic checker.
The FineNight library contains many algorithms for Finite State Automatons.
That includes:
- Union of two FSAs
- Intersection of two FSAs
- Complement of a FSAs
- Difference of two FSAs
- Reversal of a FSA
- Closure of a FSA
- Concatenation of two FSAs
- Determination of a NFA
- Equivalence test
- Minimization algorithm
- Construction of an IADFA from a sorted dictionary
- Graphviz support
- Error-Tolerant IADFA
Almost all algorithms were taken from the book
Introduction to Automata Theory, Languages, and Computation
[ 1]. The minimization
algorithm is an implementation of Brzozowski's method
[ 2].
In this method, the (possibly non-deterministic) automaton is reversed,
determinized, reversed and determinized. I'll eventually add the
Hopcroft's nlog(n) minimization algorithm
[ 3].
ZSpell is meant to be a concurrent of
aspell, made by Kevin Atkinson.
At this time, ZSpell can suggest words with a Levenshtein-distance of one.
Before we were using Kemal Oflazer's algorithm
[4] . This algorithm is very
slow, but now we use a faster algorithm (Schulz's and Mihov's algoritm
[5]). However, only substitution, removal and
insertion are used for the faster algorithm. It means that transpositions
errors, like "ehllo" -> "hello", are considered as two operations.
Since version 0.2 we can specifiy a Levenshtein-distance greater than 1.
TODOs includes:
- Add transposition errors for Levenshtein-distance algorithm.
- Add phonetic errors (spelling by sound).
- Add derivation errors.
You can download Moman source here: (Mercurial repository):
-
John E. Hopcroft
,
Rajeev Motwani and
Jefferey D. Ullman,
Introduction to Automata Theory, Languages and Computation
, 2nd edition, Adison-Wesley, 2001.
[bibtex]
-
J. A. Brzozowski,
Canonical regular expressions and minimal state graphs for definite events,
in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series,
pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y.,
1962.[bibtex]
-
John E. Hopcroft
,
An n log n algorithm for minimizing the states in a finite automaton
,
in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196,
Academic Press, 1971.[bibtex]
-
Kemal Oflazer,
Error-tolerant Finite State Recognition with Applications to
Morphological Analysis and Spelling Correction
,
Computational Linguistics, 22(1), pp. 73--89, March, 1996.
[bibtex]
-
Klaus U. Schulz
and
Stoyan Mihov,
Fast String Correction with Levenshtein-Automata,
International Journal of Document Analysis and Recognition, 5(1):67--85, 2002.
[bibtex]
-
Zbigniew J. Czech
,
George Havas
and
Bohdan S. Majewski,
An Optimal Algorithm for Generating Minimal Perfect Hash Functions
, Information Processing Letters, 43(5):257--264, 1992.
[bibtex]
|
|