OCR on Mac OS X made easy

Tesseract OCR is now available, ready to run, for Mac OS X Tiger on Intel Macs. It has not been tested on PPC Macs or on other versions of OS X. If you have such a system, please mail your results to cff2doc at yahoo dot com so we can update this page.

Just drop your bitmap or grayscale TIFF image on the Tiff2Text icon, and a new text file will be created in the same folder. Use the free Bean text editor to edit the text, which is now encoded in UTF-8.

Download the program Tiff2Text (zipped) and a sample Tiff to try it out.

This program is free and in the public domain. It uses the open source OCR engine Tesseract, originally developed at HP, and the libtiff TIFF library.

If you are familiar with AppleScript and have different requirements, you can even change the script that drives this command line program.
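If you would rather drive the command line program from your own script, the idea can be sketched in Python as shown here. This is not the AppleScript shipped with Tiff2Text, only an illustration. It assumes a `tesseract` binary is on your PATH and that it follows the usual `tesseract image outputbase` convention, where Tesseract appends `.txt` to the output base itself. The function names `ocr_paths` and `ocr_tiff` are made up for this example.

```python
import subprocess
from pathlib import Path


def ocr_paths(tiff_path):
    """Return (command, text_file) for running Tesseract on one TIFF.

    The text file lands next to the image, mirroring the drag-and-drop
    behaviour described above. Hypothetical helper, not part of Tiff2Text.
    """
    tiff = Path(tiff_path)
    # Tesseract takes an output base name and adds ".txt" on its own.
    output_base = tiff.with_suffix("")
    command = ["tesseract", str(tiff), str(output_base)]
    return command, Path(str(output_base) + ".txt")


def ocr_tiff(tiff_path):
    """Run Tesseract (assumed to be on PATH) and return the recognized text."""
    command, text_file = ocr_paths(tiff_path)
    subprocess.run(command, check=True)  # raises if Tesseract fails
    # The output is read as UTF-8, matching the encoding mentioned above.
    return text_file.read_text(encoding="utf-8")
```

Calling `ocr_tiff("sample.tif")` would then create `sample.txt` beside the image and return its contents, provided Tesseract is installed.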

Tool: Tesseract uses two dictionaries in dawg (directed acyclic word graph) format, which were probably not well understood or reverse engineered before June 2007. The following source, linked with the Tesseract library, can show all words in freq-dawg and word-dawg. Download it here if you know how to build software for your OS and want to explore further how Tesseract really works.
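To give a feel for what "showing all words" in a dawg means, here is a toy sketch in Python. It does not read Tesseract's freq-dawg or word-dawg files and does not use the Tesseract library. The in-memory layout (`TOY_DAWG`) and the function `iter_words` are assumptions invented for this illustration. It only shows the general idea: words that share letters share nodes, and walking every path lists every word.

```python
# Toy dawg: node id -> list of (letter, next node id, ends_word) edges.
# This layout is hypothetical and is NOT Tesseract's on-disk format.
# It stores "car", "cars", "cat" and "cats"; the final "s" is one shared edge.
TOY_DAWG = {
    0: [("c", 1, False)],
    1: [("a", 2, False)],
    2: [("r", 3, True), ("t", 3, True)],
    3: [("s", 4, True)],
    4: [],
}


def iter_words(dawg, node=0, prefix=""):
    """Yield every word reachable from `node` by a depth-first walk."""
    for letter, next_node, ends_word in dawg[node]:
        word = prefix + letter
        if ends_word:
            yield word
        # Keep walking: longer words may continue from this edge.
        yield from iter_words(dawg, next_node, word)


print(list(iter_words(TOY_DAWG)))  # ['car', 'cars', 'cat', 'cats']
```

The real tool has to decode Tesseract's packed edge records first, which is the part that needs the Tesseract library.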

We hope this will aid further development of this open source software for other languages.