Old OCRopus Wiki

Upcoming Releases

OCRopus 0.4 (alpha4)

We're busy finishing up version 0.4 of OCRopus. It has taken this long because we needed to do quite a bit of refactoring and build some software infrastructure. This will probably be the last alpha release before the beta.

Here are a number of changes:
  • We're moving the main development to Mercurial; this should make it easier for people to contribute and to maintain ports.  It also means that we will have broken versions in the public repositories less often than we did with Subversion.
  • We're refactoring the distribution to make installation and maintenance easier. In particular, the OpenFST support and ocroscript will become separate sub-projects.
There is quite a bit of new functionality:
  • A new segmenting line recognizer that replaces bpnet. 
  • A new set of classifiers that can be configured for a variety of classification tasks and should scale from fairly small training sets to very large training sets. 
  • The neural network classifier uses multi-core training and automatic parameter selection.
  • A new set of decoders, implementing both A* and beam search algorithms.
  • A set of statistical language models.
  • New debugging and visualization tools that make diagnosing problems easier.
  • A new set of command line programs and a directory tree representation of a complete, scanned book.
  • The ability to perform training and adaptation on books.
  • More operations on narrays.
  • Trainable page rotation detection.
  • A new lightweight component model that makes it easier to combine different OCRopus components into new recognizers.
  • Lots of other smaller changes, bug fixes, and additions.
We'll announce the location of the new repositories when we release OCRopus 0.4.