Step 4. Scanning

The process for scanning documents is:

Create a master image, saved as a TIFF (Tagged Image File Format) file

Generate derivative or access images from this master image

Save the master images onto long-term, redundant storage and mount the access images on a web server for public access

For manuscripts and printed materials use the following guidelines:

300 dpi minimum

24-bit RGB color

Sharpen lightly if needed using unsharp mask filter (be careful not to use too much!)

Save as TIFF

* Note that the descreening functionality provided by your scanner's image capture software may need to be used for materials containing halftone screens made up of visible dots (as found in newspapers and magazines) or detailed engravings or other art.


For photographs use the following guidelines:

300 dpi minimum

B&W: 8-bit grayscale

Color: 24-bit RGB color

Cabinet Cards/Cartes de visite: 24-bit RGB color

Sharpen lightly if needed using unsharp mask filter (be careful not to use too much!)

Save as TIFF
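The settings above can be captured in a small lookup table, which is handy for checking scan settings and for estimating how much storage uncompressed master TIFFs will need. The following is a minimal Python sketch, not part of any scanner software. The names GUIDELINES, meets_guidelines, and uncompressed_megabytes are hypothetical, and the table assumes the second list of settings applies to photographs.

```python
# Hypothetical lookup table built from the guidelines above.
# Keys are material types; values are (minimum dpi, bits per pixel).
GUIDELINES = {
    "manuscript": (300, 24),        # 24-bit RGB color
    "bw_photograph": (300, 8),      # 8-bit grayscale
    "color_photograph": (300, 24),  # 24-bit RGB color
    "cabinet_card": (300, 24),      # also cartes de visite
}


def meets_guidelines(material, dpi, bits_per_pixel):
    """Return True if the scan settings satisfy the guideline for this material."""
    min_dpi, required_bits = GUIDELINES[material]
    return dpi >= min_dpi and bits_per_pixel == required_bits


def uncompressed_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed image size in megabytes (10**6 bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bytes = width_px * height_px * bits_per_pixel // 8
    return total_bytes / 1_000_000


print(meets_guidelines("manuscript", 300, 24))    # True
print(meets_guidelines("bw_photograph", 200, 8))  # False
# A letter-size page (8.5 x 11 inches) at 300 dpi in 24-bit color:
print(uncompressed_megabytes(8.5, 11, 300, 24))   # 25.245
```

A single letter-size page at the minimum settings already takes about 25 MB before any TIFF overhead, which is worth keeping in mind when planning master image storage.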

You may need to use a background made of white or black paper or cardboard to ensure that colors are correctly reproduced and that ink does not show through from the other side of multi-page documents.

Color. When installing your scanner and computer, be sure that the monitor and scanner are calibrated according to the manufacturer's instructions, so that colors are consistent from one image to another and match a standardized color bar. Most scanners come with a color target and instructions on calibrating the color settings. After calibrating, be sure not to change the settings of your monitor. It is a good idea to check the calibration of the scanner and monitor before each scanning session.

Scanners usually read and adjust for color automatically during the scanning process. Keep in mind that the final image on the screen should look like an exact duplicate of the original object. If scans come out too yellow, too dark, or too light, adjust the scan settings and try again.

Filenaming. If the photographs or manuscripts have an identifier or number associated with them, use this as the filename. For the sake of consistency, use only lowercase letters in filenames. Avoid underscores and periods as well (other than the period before the file extension). Always use three-character file extensions (.tif, .jpg, .gif). Generally, the shorter the filename the better.

For example, if a photograph collection uses the identification number 55-JBC-2 for an individual photograph, the filename for that image can be 55-jbc-2.tif.
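As a minimal sketch, the conversion from an existing identifier to a filename could be automated in Python as shown here. The function name identifier_to_filename is hypothetical, and replacing underscores, periods, and spaces with hyphens is an assumption of this sketch rather than a rule stated in these guidelines.

```python
def identifier_to_filename(identifier, extension="tif"):
    """Turn an existing item identifier into a lowercase filename."""
    name = identifier.strip().lower()
    # Assumption: underscores, periods, and spaces become hyphens,
    # since the guidelines say to avoid underscores and periods.
    for char in "_. ":
        name = name.replace(char, "-")
    return f"{name}.{extension}"


print(identifier_to_filename("55-JBC-2"))  # 55-jbc-2.tif
```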

If you need to make up a file naming system, a good method is to use a lowercase letter or two followed by at least 3 digits (if you expect more than 999 images, use 4 digits). So for the fictional "Georgia Railroad Photograph Collection" we might use:

gr001.tif gr002.tif etc.

For multi-page documents like letters with 26 pages or fewer, adding a letter at the end is a good option:

br035a.tif br035b.tif br035c.tif

For books or long manuscripts, follow the file naming scheme shown above for photographs.

Always keep the same number of characters in the filename and do not vary the length. This makes automatic processing and manipulation of the images by programs much easier. Overall, be consistent! It will make things much more manageable.
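A made-up naming system like the one described above is easy to generate programmatically, which also guarantees a constant filename length. This is a minimal Python sketch; the function names item_filename and page_filenames are hypothetical, and the prefixes are the fictional ones from the examples.

```python
import string


def item_filename(prefix, number, digits=3, extension="tif"):
    """Build a name like gr001.tif with a zero-padded number."""
    if number >= 10 ** digits:
        raise ValueError("number does not fit in the chosen digit count")
    return f"{prefix.lower()}{number:0{digits}d}.{extension}"


def page_filenames(prefix, number, page_count, digits=3, extension="tif"):
    """Build names like br035a.tif, br035b.tif for a multi-page item."""
    if page_count > len(string.ascii_lowercase):
        raise ValueError("letter suffixes only cover 26 pages")
    if number >= 10 ** digits:
        raise ValueError("number does not fit in the chosen digit count")
    base = f"{prefix.lower()}{number:0{digits}d}"
    letters = string.ascii_lowercase[:page_count]
    return [f"{base}{letter}.{extension}" for letter in letters]


print(item_filename("gr", 1))       # gr001.tif
print(item_filename("gr", 2))       # gr002.tif
print(page_filenames("br", 35, 3))  # ['br035a.tif', 'br035b.tif', 'br035c.tif']
```

Raising an error when a number outgrows the digit count, instead of silently producing a longer name, keeps every filename in a collection the same length.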

Cropping. Cropping depends on the material being scanned. For most images, any background is cropped out leaving just the object. Photographs can be cropped to the edges of the photograph, and printed books can be cropped to the edge of the page. It is desirable for items such as diaries to crop outside the edge of the object, so that your image looks like a picture of a book rather than disembodied pages.

Image Derivatives:

Several common types of derivative images may be created from master images.

GIF (Graphics Interchange Format) This format is currently used only for creating thumbnails and 1-bit bitonal (black & white) images.

JPEG (Joint Photographic Experts Group) This format is widely used for creating medium and high resolution images for Web delivery.

PDF (Portable Document Format) From Adobe, this compressed format requires users to have the Adobe Acrobat Reader software installed on their machine (a common default on newer machines and browsers). It offers the benefit of re-sizing on screen and easy printing of documents. This format is most commonly used for printed documents.

MrSID From LizardTech, the MrSID format is most commonly used for large oversized items like maps and posters. Server software may be used with this image format to deliver JPEG images to the user's browser and allow users to zoom into and resize images while maintaining high levels of quality. Some repositories are beginning to adopt the open JPEG2000 format instead.

The most common practice for creating derivatives is to lower the resolution of the master TIFF image to 150 dpi and 72 dpi, letting the height and width of the image adjust to this resolution automatically in Photoshop (the image on the screen should become smaller than the master image). This creates roughly "2x" and "1x" magnifications of the original, respectively.
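The arithmetic behind this resampling is simple: pixel dimensions scale by the ratio of the target resolution to the master resolution. The following Python sketch only computes the resulting dimensions and does not resample any image; the function name derivative_size and the 8 x 10 inch example are illustrative assumptions.

```python
def derivative_size(master_width_px, master_height_px, master_dpi, target_dpi):
    """Return (width, height) in pixels after lowering the resolution."""
    width = round(master_width_px * target_dpi / master_dpi)
    height = round(master_height_px * target_dpi / master_dpi)
    return width, height


# An 8 x 10 inch photograph scanned at 300 dpi is 2400 x 3000 pixels.
print(derivative_size(2400, 3000, 300, 150))  # (1200, 1500)
print(derivative_size(2400, 3000, 300, 72))   # (576, 720)
```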

For Digital Library of Georgia projects, it is standard practice to offer a 72 dpi JPEG and sometimes a PDF for manuscript materials. The MrSID format is used for large format and photographic materials.

To create thumbnails, the width of the master image is reduced to 100-200 pixels before saving in GIF or JPEG format (JPEG recommended). The most common reduction is 150 pixels wide.
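For a fixed thumbnail width, the height follows from the aspect ratio of the master image. This short Python sketch computes the target dimensions only; the function name thumbnail_size is hypothetical, and the 150-pixel default reflects the most common width mentioned above.

```python
def thumbnail_size(width_px, height_px, target_width=150):
    """Return (width, height) for a thumbnail that keeps the aspect ratio."""
    height = max(1, round(height_px * target_width / width_px))
    return target_width, height


# A 2400 x 3600 pixel master reduced to a 150-pixel-wide thumbnail.
print(thumbnail_size(2400, 3600))  # (150, 225)
```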

The JPEG format allows for various levels of compression. For most purposes, save JPEG derivatives at "High Quality" in Photoshop and examine them closely for artifacts of the compression process. These usually show up as "squiggles" around letters or sharp edges in the image. If these appear, decrease the level of compression (by increasing the quality value in Photoshop) when saving.



Digital Reformatting and File Management (Public Library Partnerships Program, DPLA)
