
Decapod is a project focused on building a low-cost digitization solution that will allow rare materials, materials held by collections without large budgets, and other scholarly content to be digitized into high-quality PDF. The project will bring together the hardware and software needed to accomplish this goal.

Project Goals

Scholarly content needs to be online, and for much mass-produced content that migration has already happened. Unfortunately, the online presence of scholarly content is far more sporadic for long-tail material such as small journals, original source materials in the humanities and social sciences, non-journal periodicals, and more. A major barrier to making this content available is the cost and complexity of setting up a digitization project for small and scattered collections, coupled with a lack of revenue opportunities to recoup those costs. Collections with limited audiences, and hence limited revenue opportunities, are nonetheless often of considerable scholarly importance within their domains. The expense and difficulty of digitization therefore present a significant obstacle to making paper archives available online.

To meet this need we are building Decapod. Decapod will be an inexpensive, attaché-case-sized hardware/software solution that can be readily procured, assembled, and taken into the stacks or out into the field by local staff or volunteers to capture material quickly and unobtrusively and deliver it in a usable format. It will be open source, easy to use, and will provide an out-of-the-box method of digitizing small to medium archives of scholarly material. Archives of documentary material currently face five barriers to digitization: the cost of equipment, the cost of labour, a lack of digitization expertise, a lack of suitable distribution formats, and a lack of acceptable remediation workflows. By addressing all of them, Decapod will provide a paper-to-digital document solution that is highly effective and highly automated, and that requires little operator interaction apart from page turning.

The solution will address these problem areas:
  1. Allow camera-based capture of bound material by using computer vision techniques to produce flat, clean page images equivalent to those produced by a flatbed scanner.
  2. Remove the need for extensive operator intervention in the capture process by detecting scan problems and allowing the operator to rectify the scan immediately.
  3. Minimize user intervention in the conversion process by using advanced document understanding techniques to eliminate almost all manual steps, and by reducing the remainder to very simple "1-click" operations.
  4. Produce PDF/A output that is visually faithful to the original, searchable, and widely usable.
  5. Allow the output to be viewed on mobile devices that support PDF reflow.
  6. Remove the need for deep software, hardware, or digitization skills by integrating all software components into a turnkey, end-to-end solution.
  7. Remove capital cost barriers by using consumer-grade cameras.

Project Partners

IUPR Research Group
University of Kaiserslautern
Prof. Dr. Thomas Breuel
Trippstadter Str. 122
67663 Kaiserslautern, Germany

Adaptive Technology Resource Centre
Jutta Treviranus
University of Toronto
First Floor, 130 St. George St.
Toronto, Ontario, Canada

JSTOR Ann Arbor
John Burns
301 East Liberty, Suite 300
Ann Arbor, MI 48104-2262, USA

Project Funding

The Andrew W. Mellon Foundation
140 East 62nd Street
New York, NY 10065
Tel: (212) 838-8400
Fax: (212) 888-4172