Research Interests

With mass digitization under way at Google, the Internet Archive, and HathiTrust, to name a few, the need for clean optical character recognition (OCR) output grows. There isn’t much point in having an image of a page available over the web if you can’t search that page, and millions more like it, for what you want.

My research involves using natural language processing methods to improve existing OCR output. The ultimate goal is to recover the true underlying text of the pages from which the OCR output was generated.

Academic Honors and Awards