Verifying a file’s integrity means confirming that the file is unchanged, down to the bit level. Tools do this with checksums, which are algorithmically generated strings of characters that serve as a kind of “fingerprint” for a digital file. If anything within the file changes, even a single bit, the file’s checksum changes completely.
As a baseline preservation practice, it’s a good idea to verify a file’s integrity whenever you copy or move preservation versions of digital files. A copying tool with verification can confirm that nothing in the file changed during the transfer, or alert you when something transferred incorrectly and needs to be re-copied.
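To make the “fingerprint” idea concrete, here is a minimal Python sketch using the standard `hashlib` module. The sample byte strings and the choice of SHA-256 are assumptions made for illustration; any hash algorithm supported by `hashlib` would behave the same way.

```python
import hashlib

def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of data using the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

# Two hypothetical byte strings that differ in a single character.
original = b"Preservation master, version 1"
altered = b"Preservation master, version 2"

original_sum = checksum(original)
altered_sum = checksum(altered)

print(original_sum)
print(altered_sum)
print("Match:", original_sum == altered_sum)
```

The two printed checksums have the same length but share no obvious resemblance, even though the inputs differ by one character. That is what makes a checksum useful for spotting small changes.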
Fixity: the property of being unchanged (SAA Dictionary of Archives Terminology)
Checksum: a unique alphanumeric value that represents the bitstream of an individual computer file or set of files (SAA Dictionary of Archives Terminology)
Hash algorithm/function: a function that converts a data string into a unique numeric string output of fixed length. Two of the most common hash algorithms are MD5 (Message-Digest Algorithm 5) and SHA-1 (Secure Hash Algorithm 1) (adapted from FADGI Glossary)
Integrity: Internal consistency or lack of corruption of digital objects, which can be compromised by hardware errors even when digital objects are not touched, or by software or human errors when they are transferred or processed (Core Trust Seal Glossary)
Verification: The process of checking a copy of a data file to make sure that it is exactly equal to the original data file, or that a file remains unchanged over time (NDSA Glossary)
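The definitions above describe verification as confirming that a file remains unchanged over time. One way to sketch that, assuming you keep a simple manifest of checksums, is shown below. The function names (`file_checksum`, `build_manifest`, `find_changes`), the sample file names, and the use of SHA-256 are all hypothetical choices for this example, not part of any particular tool.

```python
import hashlib
import tempfile
from pathlib import Path

def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under folder, keyed by relative path."""
    return {
        str(p.relative_to(folder)): file_checksum(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }

def find_changes(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose checksum no longer matches."""
    changed = []
    for name, recorded in manifest.items():
        target = folder / name
        if not target.is_file() or file_checksum(target) != recorded:
            changed.append(name)
    return changed

# Demonstration with throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "letter.txt").write_bytes(b"Dear archivist")
    (folder / "photo.bin").write_bytes(bytes(range(256)))

    manifest = build_manifest(folder)                  # fixity information recorded today
    unchanged_report = find_changes(folder, manifest)  # checked again later

    # Simulate corruption by altering the end of one file.
    (folder / "photo.bin").write_bytes(bytes(range(255)) + b"\x00")
    changed_report = find_changes(folder, manifest)

print("Before corruption:", unchanged_report)
print("After corruption:", changed_report)
```

In practice the manifest would be saved alongside the files (for example as a text file) and the check repeated on a schedule, so that silent corruption is caught while a good copy still exists.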
TeraCopy (Mac, Windows): TeraCopy is free software from Code Sector that runs through a GUI (graphical user interface) or from the command line. It allows users to quickly move or copy large numbers of files, pause and restart transfers as needed, and verify the integrity of copied files. The “verification” setting in TeraCopy generates a hash for the original file, copies or transfers that file, then generates a second hash from the copy and compares the two.
Rsync (Mac, Windows, Linux): Rsync is an open-source file copying tool that is run on the command line. It is primarily designed to sync versions of remote files, but it also verifies file transfers by generating a checksum for each copied file and comparing it to the source file’s checksum.
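The hash, copy, hash again, and compare pattern that these tools follow can be sketched in a few lines of Python. This is only an illustration of the general idea under stated assumptions, not how TeraCopy or Rsync is actually implemented. The function name `copy_with_verification`, the sample file names, and the choice of SHA-256 are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_with_verification(source: Path, destination: Path) -> str:
    """Copy source to destination and raise an error if the checksums differ."""
    before = file_checksum(source)       # hash of the original
    shutil.copy2(source, destination)    # copy contents and basic metadata
    after = file_checksum(destination)   # hash of the copy
    if before != after:
        raise OSError(f"Checksum mismatch for {destination}; re-copy the file")
    return after

# Demonstration with a throwaway file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "master.tif"
    destination = Path(tmp) / "master_copy.tif"
    source.write_bytes(b"pretend this is image data" * 1000)

    verified_sum = copy_with_verification(source, destination)
    copy_matches = destination.read_bytes() == source.read_bytes()

print("Verified checksum:", verified_sum)
```

If the two checksums do not match, the function raises an error instead of returning, which is the signal to re-copy the file.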