With mass digitization under way at Google, the Internet Archive, and HathiTrust, to name a few, the need for clean optical character recognition (OCR) output grows. There isn’t much point in making an image of a page available over the web if you can’t search that page, and millions more like it, for what you want.
My research uses natural language processing methods to improve existing OCR output. The ultimate goal is to recover the true underlying text from which the OCR output was generated.
Academic Honors and Awards
- 2009. Best Student Paper at the Joint Conference on Digital Libraries. Nominee for Best Conference Paper.
- 2006. Beta Phi Mu, national library honor society
- 2003. LITA/Christian Larew Memorial Scholarship in Library and Information Technology
- 1981. Hughes Aircraft Fellow for study at Stanford
- 1978. Pi Mu Epsilon, national mathematics honor society