Decapod is a project focused on building a low-cost digitization
solution that will allow for rare materials, materials held in
collections without large budgets, and other scholarly content to be
digitized into a high-quality PDF format. This project will work to
incorporate the hardware and software necessary to accomplish this.
Scholarly content needs to be online, and for much mass-produced content, that migration has happened. Unfortunately, the online presence of scholarly content is much more sporadic for long-tail material such as small journals, original source materials in the humanities and social sciences, non-journal periodicals, and more. A large barrier to making this content available is the cost and complexity of setting up a digitization project for small and scattered collections, coupled with a lack of revenue opportunities to recoup those costs. Collections with limited audiences, and hence limited revenue opportunities, are nonetheless often of considerable scholarly importance within their domains. The expense and difficulty of digitization present a significant obstacle to making paper archives available online.
To meet this need we are building Decapod. Decapod will be an inexpensive, attaché-case-sized hardware/software solution that can be readily procured and assembled, then taken into the stacks or out into the field by local staff or volunteers to capture material quickly and unobtrusively and deliver it in a usable format. It will be open-source, easy to use, and will provide an out-of-the-box method of digitizing small to medium archives of scholarly material. Decapod will remove the barriers to digitization now encountered by archives of documentary material: cost of equipment, cost of labour, lack of digitization expertise, lack of suitable distribution formats, and lack of acceptable remediation workflows. Decapod will address them all to produce a paper-to-digital document solution that is highly effective, highly automated, and low in operator interaction (apart from page turning).
The solution will address these problem areas:
- Allow the camera-based capture of bound material by using computer vision techniques to produce flat, clean page images equivalent to those produced by a flatbed scanner.
- Remove the need for extensive operator intervention in the capture process by detecting scan problems and allowing the operator to rectify the scan immediately.
- Reduce user intervention in the conversion process by using advanced document understanding techniques to remove almost all intervention, and by reducing the remainder to very simple "1-click" operations.
- Produce PDF/A outputs that are visually faithful to the original, searchable, and widely usable.
- Allow the output to be viewable on mobile devices that support PDF reflow.
- Remove the need for deep software, hardware or digitization skills by integrating all software components into a turnkey end-to-end solution.
- Remove capital cost barriers by using consumer-grade cameras.
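
One of the goals listed above is detecting scan problems during capture so the operator can fix them immediately. As a rough illustration only, and not a description of Decapod's actual method, the sketch below flags a possibly out-of-focus page by measuring the variance of a simple Laplacian filter over a grayscale image. The function names, the plain list-of-lists image representation, and the threshold value are all assumptions made for this example; a real system would tune the threshold against sample captures.

```python
from typing import List

# Assumption: a grayscale image as rows of 0-255 integer pixel values.
Image = List[List[int]]


def laplacian_variance(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    Sharp pages have strong edges, so the response varies widely;
    blurred pages give a low variance.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def needs_rescan(img: Image, threshold: float = 100.0) -> bool:
    """Hypothetical check: True if the page looks too soft to keep.

    The default threshold is an arbitrary placeholder.
    """
    return laplacian_variance(img) < threshold
```

In a capture workflow, a check like this could run on each photographed page so that a soft image prompts an immediate retake rather than being discovered during later processing.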
The Andrew W. Mellon Foundation
140 East 62nd Street
New York, NY 10065
Tel: (212) 838-8400
Fax: (212) 888-4172