Preservation storage is more than having backup copies. It requires appropriate technology, active management, policies, staffing, and predictable financial resources to ensure that digital content continues to be authentic and usable over the long term.
The following features can all contribute to digital preservation. An institution’s resources, collections, and user needs will determine which are essential and which can be added or improved later.
Copies: There should be a minimum of three copies, and the LOCKSS (Lots of Copies Keep Stuff Safe) strategy recommends seven. Copies should be independent, so that an error in one copy is not automatically synced to the others. If one copy has an error, it can then be replaced with an intact copy from one of the others.
Documentation: The effectiveness of digital preservation storage relies on its appropriate use. Policies and procedures must be written down and kept up to date, with a record of the changes over time (versioning), so we understand how digital content has been treated.
Exit Strategy: No technology lasts forever. There needs to be a method to migrate the digital content and metadata, including logs and any other relevant system information, out of a digital preservation storage system when it is time for it to be replaced.
File Monitoring: Files need to be monitored regularly for errors so that damaged files can be replaced with authentic copies before all copies are affected. This process, often called fixity checking, typically involves storing a checksum for each file and recalculating it on a regular schedule to confirm that the file is unchanged.
Geographic Distance: Copies should be in separate locations, so that if damage is localized (e.g., the server room floods), the other copies are not affected. Ideally, one copy is in another geographic region with a different natural disaster risk; this is typically achieved with cloud storage.
Maintenance: Storage technology (software, firmware, and hardware) should be actively monitored for errors and kept up to date to limit security risks and maintain performance. Hardware should be regularly replaced to limit the risk of data loss due to the higher failure rate of aging media.
Risk Management: A concerted effort is made to understand the risks to the system, mitigate them as much as possible, and plan how to recover if there is data loss. This includes having a disaster plan, reviewing the system regularly, providing fire suppression and backup power for locations with hardware, and cross-training staff so that someone is always available to maintain the system.
Security: To prevent accidental or malicious alteration of files, the number of authorized users is limited, and they have the minimum permissions required to do their jobs. Few, if any, users can edit or delete files in the system. Both digital (e.g., firewalls) and physical (e.g., locked doors) security measures are used to prevent unauthorized access to the storage.
Technology Diversity: Use at least two storage technologies, so that if a flaw is discovered in one or its vendor goes out of business, copies on the other technology are not affected. Keeping a copy offline also benefits security, because it protects that copy from hacking and ransomware attacks.
Transparency: To demonstrate that a digital object is authentic, we need a record of every action taken on that object, typically through system logs. This can be more difficult with vendor-controlled storage services. Transparency may also involve sharing documentation, checksums, and log information with users upon request.
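The Copies and File Monitoring features work together: checksums recorded when files are stored are recalculated later, and any file that fails the check is restored from an independent copy that still matches. The following is a minimal sketch of that loop, assuming each copy is a directory with the same layout, that SHA-256 is the checksum algorithm, and that the function names (`build_manifest`, `check_and_repair`) are hypothetical; it does not describe how any particular storage system implements this.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_and_repair(primary: Path, replicas: list[Path],
                     manifest: dict[str, str]) -> dict[str, str]:
    """Recalculate checksums for the primary copy (hypothetical layout).

    Any file that is missing or changed is restored from the first replica
    whose file still matches the stored checksum. Returns a report of
    {relative path: action taken}; unchanged files are not reported."""
    report = {}
    for rel, expected in manifest.items():
        target = primary / rel
        if target.is_file() and sha256_of(target) == expected:
            continue  # file is unchanged, nothing to do
        for replica in replicas:
            candidate = replica / rel
            if candidate.is_file() and sha256_of(candidate) == expected:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(candidate, target)
                report[rel] = f"repaired from {replica.name}"
                break
        else:
            # No replica matched: every copy of this file is damaged or missing.
            report[rel] = "no intact copy found"
    return report
```

Because a damaged replica is skipped rather than trusted, the repair only works while at least one copy is still intact, which is why copies need to be independent and checked regularly. In practice the manifest would be stored separately from the files it describes.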
The UGA Libraries uses two storage systems for digital preservation. ARCHive is for our highest-priority digital content, and the Digital Production Hub (DP Hub) is for lower-priority and in-process digital content. In our Levels of Preservation Policy (Levels of Preservation_2021-04.docx), ARCHive supports Full Preservation and the Digital Production Hub supports Limited Preservation. Additional policies and workflows are needed to achieve these levels of preservation; storing files in ARCHive or DP Hub is not enough on its own.
ARCHive is for our most important digital content (Use Policy_2017-07-10.docx) and has the most features for digital preservation. It consists of on-site storage hardware and a web application for copying files into and out of storage and viewing metadata.
There are three copies across two kinds of storage media: one on a NAS (network-attached storage made up of networked disk drives) and two on LTO tapes. The NAS has triple parity, meaning it can automatically rebuild the data after as many as three drive failures without retrieving copies from tape. The NAS is in Special Collections and the tapes are in the Main Library, which provides a small amount of geographic distance. We are adding a copy in Amazon Glacier to increase that distance.
Fixity is automatically verified for files when they are saved to the NAS or LTO tape and whenever they are copied out of the system. We do maintenance every six months to update firmware, make improvements, and check the fixity of the copy on the NAS and of a very small sample of the tapes. Libraries IT receives automatic notifications of errors and addresses them as they arise, for example by replacing failing drives. Hardware is in secured rooms with fire suppression and backup power, and the system is not connected to the public internet. Every interaction with a file is logged, and files cannot be deleted once they are stored.
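ARCHive's own implementation is not described here, but the idea of verifying fixity when a file is copied out of storage can be illustrated with a hypothetical sketch: the stored file is checked against its recorded checksum before copying, and the new copy is checked again afterward. SHA-256 and the function name `verified_copy` are assumptions for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_copy(src: Path, dest: Path, expected: str) -> None:
    """Copy src to dest, confirming fixity before and after the copy.

    `expected` is the checksum recorded when the file entered storage
    (hypothetical parameter). Raises ValueError if either check fails."""
    if sha256_of(src) != expected:
        raise ValueError(f"stored file failed fixity check: {src}")
    shutil.copy2(src, dest)
    if sha256_of(dest) != expected:
        dest.unlink()  # don't leave a damaged copy behind
        raise ValueError(f"copied file failed fixity check: {dest}")
```

Checking the source first distinguishes a problem in storage from a problem introduced during the copy, which points staff to the right place to investigate.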
ARCHive is covered by our Digital Disaster Plan and documentation about its use is maintained in Teams. The Head of Digital Stewardship meets with Libraries IT monthly and the ARCHive application developer every two weeks to discuss its status and plan improvements.
To support the preservability of the files, we require minimum metadata (based on the PREMIS preservation metadata standard) and require files to be bagged (packaged in the BagIt format). These requirements are validated when files are copied to storage to make sure the rules are followed (AIP Definition_2020-10-19.docx).
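A bag, as defined in the BagIt specification (RFC 8493), holds the files in a `data/` directory alongside a `bagit.txt` declaration and a manifest listing a checksum for each file. The sketch below shows, in simplified form, what validating a bag involves: checking that it is complete (every file listed, every listed file present) and valid (checksums still match). It assumes a SHA-256 manifest only, skips percent-decoding of paths, and `validate_bag` is a hypothetical name; a tested tool such as the Library of Congress bagit-python library would be used in practice, and this does not reproduce ARCHive's own validation or its metadata rules.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_bag(bag: Path) -> list[str]:
    """Check a bag for completeness and validity; return a list of problems.

    An empty list means the bag passed. Simplified: only bagit.txt and
    manifest-sha256.txt are examined, and percent-encoded paths are not decoded."""
    problems = []
    if not (bag / "bagit.txt").is_file():
        problems.append("missing bagit.txt")
    manifest = bag / "manifest-sha256.txt"
    if not manifest.is_file():
        return problems + ["missing manifest-sha256.txt"]

    # Each manifest line is "<checksum> <path relative to the bag>".
    listed = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel = line.split(maxsplit=1)
            listed[rel.strip()] = checksum.lower()

    payload = {
        p.relative_to(bag).as_posix()
        for p in (bag / "data").rglob("*")
        if p.is_file()
    }
    # Completeness: every payload file is listed and every listed file exists.
    for rel in sorted(payload - set(listed)):
        problems.append(f"not in manifest: {rel}")
    for rel in sorted(set(listed) - payload):
        problems.append(f"listed but missing: {rel}")
    # Validity: files present in both still match their recorded checksums.
    for rel in sorted(payload & set(listed)):
        if sha256_of(bag / rel) != listed[rel]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Returning every problem, rather than stopping at the first, gives staff a complete list to fix before trying the copy to storage again.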
ARCHive documentation (https://sites.google.com/view/uga-dcwg/home/archive-documentation)
DP Hub is most commonly used for ARCHive backlogs, file processing, and storing access derivatives (Digital Production Hub Use Policy_2022.docx). It consists of a server that is accessed via the local campus network.
There are two copies on two different servers. Both are currently in Special Collections, but one will be moved to the Main Library soon.
We do maintenance every month to update firmware and software. Libraries IT receives automatic notifications of errors and addresses them as they arise, for example by replacing failing drives. Hardware is in secured rooms with fire suppression and backup power, and the system is not connected to the public internet.
DP Hub is covered by our Digital Disaster Plan and documentation about its use is maintained in Teams (Digital Production Hub). The Head of Digital Stewardship meets with Libraries IT monthly to discuss its status and plan improvements.
Digital Production Hub Documentation (must be a UGA employee to access)
A risk driven approach to Bitstream Preservation (DPC Technology Watch, 2022)
http://doi.org/10.7207/twgn22-02
Guidance about the risks associated with preservation storage, a process for evaluating risk, how to mitigate specific risks, and evaluating the risk of cloud storage.
Back-Up and Storage (Digital Preservation Coalition)
A short guide with a definition of digital storage, a summary of the most common risks and how to address them, expertise needed to manage digital storage, and the best and worst types of storage.
Digital Preservation Storage Criteria (2018)
A list of 61 attributes of storage systems that contribute to digital preservation, intended to be used for evaluating and improving storage options. It also includes a usage guide which provides context about using the criteria for risk management, categories of risk for preservation storage, factors for making copies independent, factors for establishing content integrity, and how to evaluate the cost of storage. The next version of these criteria is under development.
Digital Preservation Handbook: Storage
https://www.dpconline.org/handbook/organisational-activities/storage
The storage article covers characteristics of resilient digital preservation storage, understanding measures of reliability (failure rate), and managing risk through having multiple copies, as well as a list of resources to learn more. The Handbook is an introduction to good practice for all areas of digital preservation created by the Digital Preservation Coalition.
NDSA Levels of Preservation (2019)
https://ndsa.org//publications/levels-of-digital-preservation/
Simple guidelines for evaluating a complete digital preservation program, which includes storage as one of the five functional areas.