This glossary includes practical definitions and examples of how these terms are used at UGA.
AIP: Acronym that stands for “Archival Information Package,” as established by the OAIS reference model. Refers to the set of digital archival materials and the metadata required to keep them usable and understandable. It is the data that is ingested into the digital preservation system (ARCHive).
Archival storage: The digital preservation system where AIPs are stored and monitored for format obsolescence and changes in fixity. At UGA Libraries, ARCHive is our archival storage.
Authenticity: In relation to digital records, the quality of being genuine and free from unintended or unauthorized changes. Comparing checksum information over time provides evidence of this.
Backlog: Archival materials that the library has not yet processed and made accessible to patrons.
Bag/BagIt: BagIt is a file packaging format designed for moving or storing digital files. It allows for easy validation of checksum data to ensure that files remain unchanged. It is structured as a bundle of folders (called a “bag”) that contains the files themselves and an inventory listing each file with its checksum value (called a “manifest”). From the Library of Congress definition: A bag includes clear delineations between the digital content itself (stored in a subdirectory called “data”) and the metadata quantifying it. It also allows for optional basic descriptive elements that are stored within the bag (in a file called bag-info.txt) to provide recipients or custodians of the content with enough information to identify the provenance, contact information, and context for the file delivery or storage package.
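To make the bag structure concrete, here is a minimal Python sketch that lays out a bag by hand: it copies files into a `data` subdirectory, writes the `bagit.txt` declaration file, and records a SHA-256 payload manifest. It is a simplified illustration based on the BagIt specification (RFC 8493), not UGA's workflow. It skips `bag-info.txt`, tag manifests, and the special handling the specification requires for unusual file names, and the function names are hypothetical. In practice a maintained tool such as the Library of Congress's bagit-python library would normally be used instead.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_dir: Path, bag_dir: Path) -> None:
    """Hypothetical helper: copy source_dir into bag_dir/data and write tag files.

    Simplified sketch: no bag-info.txt, no tag manifest, no path escaping.
    """
    data_dir = bag_dir / "data"
    # copytree creates bag_dir and bag_dir/data; bag_dir/data must not exist yet.
    shutil.copytree(source_dir, data_dir)
    # Bag declaration file defined by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: one "<checksum>  <path relative to the bag>" line per file.
    lines = [
        f"{sha256_of(path)}  {path.relative_to(bag_dir).as_posix()}"
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    ]
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "accession"
        source.mkdir()
        (source / "letter.txt").write_text("Dear colleague,\n", encoding="utf-8")
        bag = Path(tmp) / "bag"
        make_minimal_bag(source, bag)
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Because the manifest pairs every payload file with its checksum, a recipient can later recompute the checksums and compare them line by line to confirm that nothing changed in transit.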
Bit preservation: A preservation strategy that involves managing the original version of a digital record by monitoring its authenticity and maintaining a secure storage environment with appropriate back-ups (University of Edinburgh definition). ARCHive does this by routinely checking the fixity of the files in the system.
Bit rot: The corruption of digital data over time caused by gradual physical degradation of storage media. Corrupted data cannot be repaired in place; the affected files must instead be replaced with an intact copy from a backup.
Born-digital: Originating in a computer environment (SAA definition), e.g. word processing files, digital photographs, or digital audio/video. Distinct from “digitized,” where the content has been scanned or photographed from an analog artifact.
Content preservation: A preservation strategy focused on ensuring that digital content can be rendered and understood over the long term (DPC definition). When we choose to reformat files, we are adopting a content preservation strategy, prioritizing the accessibility of a file's contents over its exact sequence of bits. For example, an old Microsoft Works Database file might be reformatted as a CSV because the original software is obsolete and unsupported.
Checksum: A unique alphanumeric value that represents the bitstream of an individual computer file or set of files (SAA definition). This value is created by an algorithm, such as MD5 or SHA-256. Checksums are generated at several points in the preservation lifecycle, including whenever files are moved or copied and at regular intervals while they are stored in the digital preservation system, and are compared to verify a file's fixity and, by extension, its authenticity. If even a single bit in a file changes, the checksum changes completely.
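The claim that a one-bit change alters the checksum completely can be checked with Python's standard `hashlib` module. This sketch computes MD5 and SHA-256 digests for a short byte string, flips a single bit, and prints both sets of results; the sample text is invented for illustration.

```python
import hashlib


def checksums(data: bytes) -> dict[str, str]:
    """Return MD5 and SHA-256 hex digests for a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


original = b"Minutes of the board meeting, 1962"
# Flip the lowest bit of the first byte: "M" (0x4D) becomes "L" (0x4C).
altered = bytes([original[0] ^ 0x01]) + original[1:]

before = checksums(original)
after = checksums(altered)
for algorithm in before:
    print(algorithm)
    print("  original:", before[algorithm])
    print("  altered: ", after[algorithm])
```

The digests for each algorithm share no visible pattern even though the inputs differ by one bit. MD5 remains common for detecting accidental corruption, but SHA-256 is generally preferred where deliberate tampering is a concern, because practical MD5 collisions are known.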
Digital curation/stewardship: Often used interchangeably at UGA Libraries. Refers to the range of actions involved in documenting, managing, and preparing digital files for ingest, and preserving and providing access to them into the future. Digital stewardship begins when materials arrive or are created at the Libraries and extends through preservation and researchers’ ongoing access to the files. The Libraries has a Digital Curation Working Group that is charged with promoting the long-term integrity of and access to the locally held digital collections of the University of Georgia Libraries.
Digital preservation: The management and protection of digital information to ensure authenticity, integrity, reliability, and long-term accessibility (SAA definition).
Digitize: To transform analog information into digital form. Digitization may transform information stored in analog physical formats (such as paper and parchment) or in analog but electronic formats (such as magnetic audiotape or phonograph discs). Records can be digitized via scanning or photography or via the conversion of analog audiovisual information into bits (SAA definition). Several departments at UGA Libraries digitize archival materials for direct researcher use and/or for access via the Digital Library of Georgia.
DIP: Acronym that stands for “Dissemination Information Package,” as established by the OAIS reference model. The set of digital archival materials that is delivered to the library user. This may differ from the AIP if patrons receive altered “access copies” of the materials, e.g. a version in a different format or containing redactions.
Disaster plan: An actively maintained document containing procedures and information needed to prevent, mitigate, prepare for, respond to, and recover from emergencies (SAA definition). The UGA Libraries has a Digital Disaster Plan for digital materials that is updated annually and maintained by the Digital Curation Working Group.
Fixity: Regarding a digital file, the property of being unchanged at the bit level. Fixity is often determined by checksum data and is checked and validated several times over the digital preservation lifecycle.
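A fixity check amounts to recomputing checksums and comparing them with values recorded earlier. The Python sketch below assumes the earlier values are available as a dictionary mapping relative paths to SHA-256 digests; that storage format and the function names are hypothetical, and this is not a description of how ARCHive itself performs its checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(base_dir: Path, expected: dict[str, str]) -> dict[str, list[str]]:
    """Compare recorded SHA-256 values with freshly computed ones.

    `expected` maps paths (relative to base_dir) to previously recorded
    checksums. Returns which paths passed, changed, or are missing.
    """
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for rel_path, recorded in sorted(expected.items()):
        path = base_dir / rel_path
        if not path.is_file():
            report["missing"].append(rel_path)
        elif sha256_of(path) == recorded.lower():
            report["ok"].append(rel_path)
        else:
            report["changed"].append(rel_path)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "photo.tif").write_bytes(b"\x49\x49\x2a\x00 image data")
        # Record checksums, e.g. at accession.
        recorded = {"photo.tif": sha256_of(base / "photo.tif")}
        print(check_fixity(base, recorded))  # photo.tif listed under "ok"
        # Simulate silent corruption by overwriting one byte.
        with (base / "photo.tif").open("r+b") as f:
            f.seek(4)
            f.write(b"X")
        print(check_fixity(base, recorded))  # photo.tif listed under "changed"
```

A "changed" or "missing" result is the signal to restore the file from a backup copy whose checksum still matches.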
Format migration: The process of converting a digital file to a different file type to prevent it from becoming inaccessible due to software obsolescence (SAA definition). For example, a Microsoft Works spreadsheet file is at imminent risk of becoming unopenable because the software is no longer supported. Converting these files to a more universally accessible format, such as CSV, helps to preserve the content by keeping it openable and usable for longer. A related process is “normalization,” a broader policy under which digital files are converted by default to one or more formats considered best for preservation.
Ingest: (Noun) The process of adding AIPs into a digital preservation system like ARCHive. Can also refer to the server where AIPs are added when they are queued to be ingested. (Verb) To accept electronic content or metadata into a repository or database (SAA definitions).
Machine-readable: Understandable/readable only with the intervention of specialized equipment or a computer. Commonly used to refer to electronic records, which may be stored on magnetic media or punch cards. However, phonograph records, audiocassettes, and CDs and DVDs are examples of analog machine-readable formats (SAA definition). In a digital curation context, “machine-readable” can also refer to data used by a program or piece of code, and whether a computer is able to understand and act on that data. UGA Libraries uses several scripts (short computer programs) written in-house to process digital files for preservation. In some cases, the data used by these scripts must first be structured in a machine-readable format (such as CSV or JSON) so that a computer can interpret it.
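As an illustration of data structured so that a script can act on it, this Python sketch parses a small CSV table with the standard `csv` module and re-serializes it as JSON. The column names and values are hypothetical and do not reflect the input format of any UGA script.

```python
import csv
import io
import json

# Hypothetical inventory of two AIPs; column names are illustrative only.
CSV_TEXT = """aip_id,title,file_count
aip-001,Board minutes,12
aip-002,Oral history interview,3
"""


def rows_from_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = rows_from_csv(CSV_TEXT)
# Every field is addressable by name, so the script can compute with it.
total_files = sum(int(row["file_count"]) for row in rows)
# The same records re-serialized as JSON for another program.
as_json = json.dumps(rows, indent=2)

print(total_files)
print(as_json)
```

The same information written as free-form prose (for example, "the board minutes contain twelve files") is readable by a person but not directly usable by the script.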
OAIS reference model: OAIS is an acronym for Open Archival Information System. The OAIS reference model is a standard (ISO 14721) and framework of relationships that describes the components and processes necessary for a digital archives, including six distinct functional areas: ingest, archival storage, data management, administration, preservation planning, and access (SAA definition). The design of the UGA Libraries’ digital preservation system, ARCHive, was informed by OAIS.
Obsolescence: The state of being outdated to the point of being almost or completely unusable. In a digital stewardship context, this is used in reference to file formats and storage media that cannot be accessed using modern hardware/software. The associated risk is data loss.
PII: Acronym for “personally identifiable information.” This is personal data found in collections that poses a privacy risk and should be removed or redacted before the materials are made public. Common examples include Social Security numbers, private login information, and credit card numbers.
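Screening for PII is often partly automated with pattern matching. The Python sketch below flags strings shaped like U.S. Social Security numbers and credit card numbers, using the Luhn checksum to discard digit runs that cannot be valid card numbers. The patterns are simplified assumptions: they will miss some PII and flag some harmless numbers, so every match still needs human review, and dedicated forensic tools are generally more thorough.

```python
import re

# Simplified, hypothetical patterns: "123-45-6789" style SSNs, and runs of
# 13 to 16 digits optionally separated by single spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_possible_pii(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched text) pairs for review by a person."""
    hits = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    hits += [
        ("card", m.group())
        for m in CARD_PATTERN.finditer(text)
        if luhn_valid(m.group())
    ]
    return hits


sample = "SSN 123-45-6789; card 4111 1111 1111 1111; order no. 1234567890123"
print(find_possible_pii(sample))
```

In this sample the order number is a 13-digit run that fails the Luhn check, so only the SSN-shaped string and the card-shaped string are reported.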
PREMIS: Acronym for “PREservation Metadata: Implementation Strategies,” a metadata standard to support the preservation of digital objects and ensure their long-term usability (Library of Congress definition). The metadata that is included in the AIPs ingested into ARCHive conforms to PREMIS standards.
Redaction: The process of concealing sensitive information in a document before the file is made publicly accessible.
Reformat: Changing the format of a digital file for preservation or access purposes. For example, changing a Microsoft Word document to a PDF.
Removable media: Material used to store data that can be taken out of a machine (SAA definition), e.g. optical discs, floppy disks, and USB drives.
SIP: Acronym that stands for “Submission Information Package,” as established by the OAIS reference model. This refers to a set of digital archival materials received from a donor or created through a digitization workflow, prior to its ingest into the digital preservation system (ARCHive).
Technical appraisal: In the context of UGA’s born-digital archives, this refers to appraisal based on specific technical criteria, e.g. file format.
Technical metadata: Metadata that is captured automatically by the hardware or software used to create a digital record (DPC definition). Examples include creation dates, file formats, video codecs, and information about the device or software that created the digital file.
Trusted digital repository: A preservation repository that can prove its reliability over time (SAA definition). There is a set of standards that an institution can meet in order to be certified as a trusted digital repository (TDR).
URI: Acronym for “Uniform Resource Identifier,” a standard (RFC 1630) string of characters that identifies an object or service using registered protocols and name spaces (SAA definition). URIs are used as identifiers for objects in ARCHive to ensure that each identifier is unique.
References:
Digital Curation Centre Glossary: https://www.dcc.ac.uk/about/digital-curation/glossary
OAIS Section 1.6 Definitions: https://public.ccsds.org/Pubs/650x0m3.pdf
Ohio State University (OSU): https://library.osu.edu/documents/SDIWG/Digital_Preservation_Policy_Framework.pdf
Society of American Archivists Dictionary: https://dictionary.archivists.org/
University of Edinburgh (UE): https://library.ed.ac.uk/heritage-collections/collections-and-search/archives/digital-archives-and-preservation/digital-preservation-policy