Amazon Glacier is a "secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup" (see Amazon Glacier home page). Hosted on multiple redundant Amazon cloud servers, it provides a cost-effective backup for digital files that can keep our master archival files safe in the case of local systems failures or disasters. Points to note about Glacier include:
It is "cold" storage. We upload master files there for long-term backup, not for normal retrieval of access copies.
Uploading and storing files on Glacier is relatively cheap. Retrieving files, however, is expensive and should be done with care. If we do need to retrieve large amounts of data from our Glacier backup, the retrieval should be done in consultation with the Systems Librarian so that we pull the files down slowly and minimize transfer costs. As Glacier notes: "Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time. You can retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. If you choose to retrieve more than this amount of data in a month, you are charged a retrieval fee starting at $0.01 per gigabyte."
Files are uploaded to and downloaded from Glacier in sets, not as individual files. For example, we would upload a set of files labelled "PUA_OH_1-100" rather than each individual file. See more detailed directions below for examples on how this works.
FastGlacier, a software interface, mediates our file management in Glacier. The Systems Librarian retains documentation about obtaining access to our accounts via this interface.
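To give a sense of scale for the free retrieval allowance quoted above, the following is a rough sketch of the arithmetic, not an official pricing calculator. It applies only the 5% rule as quoted. The function name is hypothetical, and the 6000 GB figure assumes our average monthly storage is the estimated 6 TB (counted as 6000 GB) and that the month has 30 days.

```python
def free_retrieval_allowance_gb(stored_gb, days_in_month=30, free_fraction=0.05):
    """Return (monthly_free_gb, daily_free_gb) under the quoted 5% rule.

    Assumptions (not from Amazon's documentation): stored_gb stands in for
    "average monthly storage", and the daily pro-rating is a simple division
    of the monthly allowance by the number of days in the month.
    """
    monthly_free_gb = stored_gb * free_fraction
    daily_free_gb = monthly_free_gb / days_in_month
    return monthly_free_gb, daily_free_gb


# Illustrative figures only: about 6 TB stored, 30-day month.
monthly, daily = free_retrieval_allowance_gb(6000)
print(f"Free per month: about {monthly:.0f} GB; per day: about {daily:.0f} GB")
```

Under these assumptions, only a small fraction of the archive could be pulled down per day without a fee, which is why large retrievals should be spread out over time.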
Files should be uploaded in "sets," so the first task is to create and name a set of files:
Individual files should first be named according to our File Naming Conventions.
Files should be bundled together with other files of the same type (e.g. PDFs, MOVs, etc.). The sets should contain consecutively numbered files, e.g. PUA_MS51_1.pdf, PUA_MS51_2.pdf, PUA_MS51_3.pdf, etc.
The set should be named to reflect the individual files within the set. Examples:
A set named PUA_MS51_1-500 within Glacier's PDF folder would be expected to contain PDFs numbered PUA_MS51_1 through PUA_MS51_500.
A set named PUA_RG1-RG2 within Glacier's Images folder would be expected to contain image files whose names begin with the stems PUA_RG1 through PUA_RG2.
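The numbered-range naming rule can be checked with a short script before uploading. This is a minimal sketch, assuming file names of the form stem, underscore, number, extension (as in PUA_MS51_1.pdf). The function name `set_name` and the regular expression are illustrative, not part of our File Naming Conventions, and the sketch does not cover stem-range sets such as PUA_RG1-RG2.

```python
import re
from pathlib import Path

# Assumed pattern: a stem such as "PUA_MS51", an underscore, then a number.
_NAME = re.compile(r"^(?P<stem>.+)_(?P<num>\d+)$")


def set_name(filenames):
    """Build a set name such as 'PUA_MS51_1-500' from consecutively numbered files.

    Raises ValueError if the files mix stems or file types, or if the
    numbering has gaps or duplicates.
    """
    if not filenames:
        raise ValueError("no files given")
    stems, extensions, numbers = set(), set(), []
    for name in filenames:
        path = Path(name)
        match = _NAME.match(path.stem)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        stems.add(match.group("stem"))
        extensions.add(path.suffix.lower())
        numbers.append(int(match.group("num")))
    if len(stems) != 1:
        raise ValueError(f"files mix several stems: {sorted(stems)}")
    if len(extensions) != 1:
        raise ValueError(f"files mix several types: {sorted(extensions)}")
    numbers.sort()
    # Consecutive means first, first+1, ... with no gaps and no repeats.
    if numbers != list(range(numbers[0], numbers[0] + len(numbers))):
        raise ValueError("file numbers are not consecutive")
    return f"{stems.pop()}_{numbers[0]}-{numbers[-1]}"


print(set_name(["PUA_MS51_1.pdf", "PUA_MS51_2.pdf", "PUA_MS51_3.pdf"]))
```

A script like this could be run against the folder listing on the external hard drive to catch gaps in numbering before the set is named and uploaded.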
Using FastGlacier, navigate to the Pacific University Archives vault.
Within that vault, there is a folder for each media type: audio, video, pdf, and image files:
Audio files:
On FastGlacier, navigate to the Pacific University Archives audio folder and double-click on it.
Make sure the compression rule for the Pacific University Archives vault audio folder is enabled. It should be enabled automatically, but to confirm, go to Tools-Compression and Encryption, double-click the rule, and make sure the “Enabled” box at the bottom is checked.
Click “Upload” and navigate to the folder for the set (for example, PUA_OH_1-53) on the external hard drive to compress and upload that folder onto Glacier.
Video files:
Videos should NOT be compressed. There is no compression rule set for video files, so non-compression is the default. The rest of the process is the same as for audio.
Text and image files:
The process is the same as above. Compression can be used if the files are not already in a compressed format (such as zip or other archives, or jpeg, gif, or png images). Set the compression rule manually for each upload that contains no already-compressed files.
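The compression guidance for the different media types can be summarized as a simple check on file extensions. This is only a sketch of the rule as described above. The function name `should_compress` and the extension lists are assumptions for illustration (for example, treating both .jpg and .jpeg as JPEG and .mov as the only video type), and FastGlacier itself is still configured by hand through its compression rules.

```python
from pathlib import Path

# Formats named above as already compressed (assumed extension spellings).
ALREADY_COMPRESSED = {".zip", ".jpg", ".jpeg", ".gif", ".png"}
# Video is never compressed; .mov is the video format we store (assumption:
# other video extensions would be added here if they come into use).
VIDEO = {".mov"}


def should_compress(filenames):
    """Return True if a set of files is a candidate for a compression rule.

    A set qualifies only if it contains no video files and no files that
    are already in a compressed format.
    """
    suffixes = {Path(name).suffix.lower() for name in filenames}
    if suffixes & VIDEO:
        return False
    return not (suffixes & ALREADY_COMPRESSED)


for example in (["PUA_OH_1.wav"], ["PUA_OH_1.mov"], ["scan_1.tif"], ["photo_1.JPG"]):
    print(example, "->", "compress" if should_compress(example) else "do not compress")
```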
What type of data/files will be stored?
TIFF, JPEG, MOV and WAV files. These are digital preservation master copies of unique original material such as oral history recordings and historic photographs. The storage is intended to be a backup of last resort and is a critical part of ensuring the long-term viability of digital files, particularly of vulnerable audio-visual material where the file sizes can be very large, and where the original media (such as audiocassettes and VHS tapes) have a short life span. The digital file sets are large enough that we cannot ensure their security using technology like external hard drives; technology like DVDs is not considered sufficiently secure for digital preservation purposes. The estimated space we will be using is around 6 TB.
Who will have access?
The University Archivist, the Project Archivist (when applicable), the Systems Librarian, and optionally a contact in the university’s IT department.
Is Pacific's shared file server (Box) not sufficient to handle this storage?
Box is primarily a secure file-sharing platform, rather than a digital preservation system. Though it is “secure”, the kind of security that it provides has to do with legal compliance and making sure that files are kept secure within an organization. It is not oriented towards the issue of keeping digital files secure from a long-term preservation standpoint. For our needs, we need a platform that can help us be sure that the digital files we make today will be verifiably unaltered and un-degraded even after decades of storage. Box’s drawbacks are that it can allow accidental overwriting or deletion of files; it opens read/write access to a much larger number of people than is desirable; it has no file fixity verification process that we can use; it cannot take efficient uploads, because it requires either mirroring a hard drive (which won’t work because of the file sizes involved) or using a browser-based upload (which would be extremely slow with the file sizes we have); it makes no promises regarding long-term bit preservation; and it is more expensive for storing 6 TB than Amazon Glacier, which has a very low per-TB cost. Best practices for digital preservation, including those mandated by federal grants for library digitization projects, require a platform like Glacier, which is designed as a stable and inexpensive storage platform for big sets of archival data.
Why is it worth spending money on Glacier?
Glacier is like an insurance policy against the loss of large investments Pacific has put into creating digital master copies of archival holdings. We have devoted thousands of hours of staff/student worker time and hundreds of thousands of dollars in grant funding for related projects. If we were to lose the master digital files from our LSTA-funded oral history digitization project, for example, that would wipe out much of the production of one full-time staff member over a year, plus additional time from other workers and money spent on specialized outsourced services. Grant agencies including LSTA, NEH and NHPRC (the three main sources of library/archives grant funding) have been implementing more stringent requirements for grant proposals to document their compliance with digital preservation standards. If we wish to continue to be successful in winning grants for the Pacific Archives, we need to keep up with these standards, which platforms like Box do not meet.
National Digital Stewardship Alliance (NDSA): “The NDSA Levels of Digital Preservation: An Explanation and Uses.” The matrix in this document is the national standard for judging archival digital preservation plans. Without Glacier, we will not be up to Level 1 (minimum) requirements.
Yan Han (2015), "Cloud storage for digital preservation: optimal uses of Amazon S3 and Glacier", Library Hi Tech, Vol. 33 Iss. 2, pp. 261-271. This article provides a brief overview of some of the requirements and trends in digital preservation, and then goes into more detail about Glacier’s costs and feasibility as a digital preservation storage platform.
Pacific University Archives Digital Preservation Plan – outlines the Pacific Archives’ conformance to NDSA guidelines and local implementation details.