If possible, create two access versions of the files: a nearline copy and an online copy. The nearline copy is a complete or nearly complete copy of the files as we received them; it has been minimally processed and is not suitable for posting online. Access to the nearline copy is granted only upon patron request, on a case-by-case basis. The online copy serves as the primary access version of the records. Generally speaking, it is a representative sample of the records and is of a manageable size (1 gigabyte or less is ideal, though this depends on the nature of the records and applies especially to born-digital personal archives). It consists of selected content that is free of copyright or other IP concerns and is arranged and described in a way that is useful to researchers.
Make sure you weed duplicates and remove or redact all confidential, sensitive, and proprietary files (e.g., proprietary software programs, PDFs of journal articles) from the online access copy. Although Firefly/Identity Finder will help you identify files containing Social Security and account numbers, you may need to skim files for other sensitive information (such as confidential student information or medical records). Use the folder and file names to help you identify personal or sensitive content, or files with IP issues. You should also weed the nearline copy of duplicates and redact or remove Social Security and bank account numbers.
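As a rough aid to the name-based review described above, the following Python sketch lists files and folders whose names contain a keyword of interest. The keyword list and the function name flag_sensitive_paths are illustrative assumptions, not part of any established workflow. The script only reports matches; it does not move or delete anything, and it is no substitute for Firefly/Identity Finder or for skimming the files yourself.

```python
from pathlib import Path

# Hypothetical keyword list; adjust it to the collection being processed.
SENSITIVE_KEYWORDS = (
    "ssn", "social security", "grades", "transcript",
    "medical", "bank", "tax", "payroll",
)


def flag_sensitive_paths(root, keywords=SENSITIVE_KEYWORDS):
    """Return files and folders under root whose relative path contains a keyword.

    Matching is case-insensitive and looks only at names, never at file contents.
    """
    root = Path(root)
    flagged = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root).as_posix().lower()
        if any(keyword in relative for keyword in keywords):
            flagged.append(path)
    return flagged
```

Calling flag_sensitive_paths on the top-level folder of an access copy returns a list of paths to review by hand. Expect false positives, because a keyword in a folder name flags everything inside that folder.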
If you are comfortable using the command line, you can identify, delete, move, and rearrange files relatively quickly. See the instructions on using the command line to automate processing actions.
It may be helpful to quickly scan or preview files without having to open them manually in applications. Previewing also matters because certain applications will change file metadata or modify the contents of a file when it is opened, even if you do not make any changes. If you use a Mac, you can use Quick Look or the Cover Flow view option. If you have a PC, you can use the Preview Pane in Windows. See the instructions on how to preview files on a Mac or PC.
If the electronic content doesn’t appear to be in any order and is part of a hybrid collection with an analog component, you may want to look at how the analog material is arranged to give you some ideas about how to proceed. As you process the files, you may also want to refer to the Practical E-Records blog.
Other tools that may be useful as you work with the materials are Duplicate Cleaner and Renamer. More information can be found here.
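If you prefer a script to a dedicated tool, exact duplicates can also be found by comparing checksums. The sketch below is a minimal illustration and is not how Duplicate Cleaner works internally; the function names sha256_of and find_duplicates are assumptions made for this example. It groups files with identical SHA-256 digests and leaves the decision about which copy to keep to you.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of files under root that share identical content."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Keep only the digests that occur more than once.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each list returned by find_duplicates holds files that are byte-for-byte identical. Files that differ only in embedded metadata will not be grouped, so some near-duplicates still need a manual check.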
File extensions are often missing, which makes it difficult to provide access to the electronic content: without an extension, the computer does not know which program to use to open the file. One utility that may be useful is TrID, which can analyze file types and/or append file extensions. (Note: only append file extensions to access copy files.) See this page for instructions on how to use TrID.
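To illustrate the general idea behind signature-based identification, the Python sketch below checks the first bytes of a file against a handful of well-known file signatures. This is not TrID and does not use its definitions; the signature table covers only a few common formats, and the function name guess_extension is an assumption made for this example. It only reports a likely extension and never renames a file.

```python
# A few widely documented file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF", ".pdf"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"II*\x00", ".tif"),   # little-endian TIFF
    (b"MM\x00*", ".tif"),   # big-endian TIFF
    (b"PK\x03\x04", ".zip"),  # also the container for .docx, .xlsx, etc.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", ".ole"),  # legacy Office files; ".ole" is a placeholder label
]


def guess_extension(path):
    """Return a likely extension based on the file's leading bytes, or None."""
    with open(path, "rb") as handle:
        header = handle.read(8)  # the longest signature above is 8 bytes
    for signature, extension in SIGNATURES:
        if header.startswith(signature):
            return extension
    return None
```

A result of None means only that the file matched none of the signatures listed here, which is common for plain text. For real processing work, rely on TrID or a comparable tool with a full signature database.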
Some files may require migration to another format in order to be made accessible. Decisions to migrate content should be made on a case-by-case basis or per researcher request. Consult with Tracy if you would like assistance migrating files.
Note: If derivatives (migrated copies) have been generated, they will be given the suffix “~d3riv” before the extension and stored in the same folder as the original.
All migrations will be documented in a processing note on the Archon digital object record, as specifically as possible.
Example: Filename_asdf.tif --> Filename_asdf~d3riv.jpg
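The naming rule above can be sketched as a small Python helper. The function name derivative_path is hypothetical; the helper only computes the derivative's path in the same folder as the original, and the actual format migration is assumed to be done with whatever tool suits the file.

```python
from pathlib import Path

DERIVATIVE_SUFFIX = "~d3riv"


def derivative_path(original, new_extension):
    """Return the path for a derivative stored next to the original file.

    The suffix is placed between the original base name and the new extension.
    """
    original = Path(original)
    # Accept the extension with or without a leading dot.
    if not new_extension.startswith("."):
        new_extension = "." + new_extension
    return original.with_name(original.stem + DERIVATIVE_SUFFIX + new_extension)
```

For the example above, derivative_path("Filename_asdf.tif", "jpg") yields a path named Filename_asdf~d3riv.jpg.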