Continuing the Archive
Here is a set of guidelines and pieces of advice for the technical aspects of curating, creating, and continuing a digital archive such as this one. Anyone may use this as a template for their own playbill digital collection, although it is formatted with digital tools like ContentDM in mind.
A few key items are required to begin a digital archive:
A digital scanner, such as an Epson scanner
Accompanying software for the scanner
An app or program that allows for converting file types, especially from JPEGs to PDFs
A computer or laptop that is compatible with the scanner and all of the necessary applications
An external hard drive for storage and easy transfer of data between computers
Lastly, follow the best practices of the Federal Agencies Digital Guidelines Initiative (FADGI) when naming and saving files.
Use a digital scanner, such as an Epson scanner, that can connect to the computer in use. Also use an external hard drive for storage: scan files are large, and you will quickly run out of space on a computer or a thumb drive.
Also confirm with your organization which file types are required for documentation and storage. For example, the University of Akron archives require two files to be made from each scan: a TIFF and a JPEG. The TIFF is kept for future reference within the archives, and the JPEG is used for public sharing and modification. Once created, the TIFFs are saved to their own drive, where the archives can access them if needed, but they will largely go untouched during the digitization process.
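If your organization requires both a TIFF and a JPEG for each scan, a short script can check that no scan is missing one of its two files. The following is a minimal Python sketch, assuming the two files share the same base name and differ only in extension (.tif/.tiff and .jpg/.jpeg); the function name find_unpaired_scans is hypothetical and not part of any scanning software.

```python
from pathlib import Path

# Extensions assumed for the archival master (TIFF) and the public copy (JPEG).
TIFF_EXTS = {".tif", ".tiff"}
JPEG_EXTS = {".jpg", ".jpeg"}


def find_unpaired_scans(filenames):
    """Return (missing_jpeg, missing_tiff): base names that lack one of the two files."""
    tiffs, jpegs = set(), set()
    for name in filenames:
        path = Path(name)
        ext = path.suffix.lower()
        if ext in TIFF_EXTS:
            tiffs.add(path.stem)
        elif ext in JPEG_EXTS:
            jpegs.add(path.stem)
    # A base name that appears in only one set is missing its counterpart.
    return sorted(tiffs - jpegs), sorted(jpegs - tiffs)


files = [
    "playbills_box5_file2_hamlet_p01.tif",
    "playbills_box5_file2_hamlet_p01.jpg",
    "playbills_box5_file2_hamlet_p02.tif",
]
print(find_unpaired_scans(files))
# → (['playbills_box5_file2_hamlet_p02'], [])
```

Since the TIFFs are kept on their own drive, you could gather the file names from each location (for example with Path.iterdir()) and pass the combined list in.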
The steps you take will also depend on what you want to achieve with the digitization process. If you only want to digitize the playbills and upload them online, ignore the steps in red. If you also want to create a separate, text-searchable file of the names and information in the playbills, such as the Weathervane’s search page, follow the steps in red, adapting them as necessary.
Organize the playbills by date and season. This makes it far easier to keep track of which plays have been scanned and processed. To help with this, you may create a spreadsheet or other external document that lists the plays in order and includes any other categories you want to make accessible, such as the director, the actors, and the tech crew. In this secondary document, you may use a marking system, such as ‘*’ beside the play title to signify that it has been scanned and ‘**’ to signify that it has been digitized and is awaiting upload.
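With the ‘*’ and ‘**’ marking system, a quick tally shows how far along the project is. This minimal Python sketch assumes each entry is the play title with the marker written directly after it (for example, Hamlet**); the status labels and function names are hypothetical.

```python
from collections import Counter

# Hypothetical labels for the marking system: no marker, '*', or '**' after the title.
STATUS_BY_MARKER = {"": "not scanned", "*": "scanned", "**": "awaiting upload"}


def parse_status(entry):
    """Split an entry like 'Hamlet**' into ('Hamlet', 'awaiting upload')."""
    entry = entry.strip()
    without_marker = entry.rstrip("*")
    marker = entry[len(without_marker):]  # the trailing run of asterisks, if any
    status = STATUS_BY_MARKER.get(marker, "unknown marker")
    return without_marker.strip(), status


def tally(entries):
    """Count how many plays are at each stage."""
    return Counter(status for _, status in map(parse_status, entries))


print(tally(["Hamlet**", "Our Town*", "Annie"]))
```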
Create a naming system for the files, and document it so that it is easy to remember and apply. Build the names from details that make the physical playbill easier to relocate, for example: playbills_box5_file2_title. Include the page number as well, so the pages can be put in order later.
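Generating names with a small function helps every file follow the pattern exactly. The Python sketch below assumes the pattern playbills_box{box}_file{folder}_{title}_p{page}; the _p page suffix, the two-digit zero padding, and the function name playbill_filename are illustrative choices, not a standard. Zero-padding keeps page 10 after page 9 in a plain alphabetical sort, and replacing spaces and punctuation with underscores keeps the names consistent.

```python
import re


def playbill_filename(box, folder, title, page, ext="tif"):
    """Build a name like playbills_box5_file2_hamlet_p01.tif (hypothetical scheme)."""
    # Lowercase the title and replace each run of characters other than
    # letters and digits with a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    # Zero-pad the page number so pages sort correctly in an alphabetical listing.
    return f"playbills_box{box}_file{folder}_{slug}_p{page:02d}.{ext}"


print(playbill_filename(5, 2, "Hamlet", 1))
# → playbills_box5_file2_hamlet_p01.tif
print(playbill_filename(5, 2, "You Can't Take It with You", 12, ext="jpg"))
```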
Follow the scanner’s directions to configure its settings. Generally, it is recommended to set the DPI to 600, the document setting to non-reflective document, and the color to 24-bit. Use the measurement guides to position the playbills and make the scans as uniform as possible; this reduces the number of rescans and the time spent resetting the document window. Note: between playbills, and whenever the physical dimensions of the document being scanned change significantly, you will need to reset the document window to fit the new dimensions so that no information is cropped out.
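The settings above also explain why scan files are so large. The pixel dimensions are the page size in inches multiplied by the DPI, and 24-bit color uses 3 bytes per pixel, so the raw image size can be estimated as in this Python sketch. The letter-size page (8.5 × 11 inches) is an assumption for illustration only, and actual file sizes will differ because of file headers and any compression the file format applies.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi=600, bits_per_pixel=24):
    """Estimate the raw image size of one scanned page, ignoring headers and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


size = uncompressed_scan_bytes(8.5, 11)  # letter-size page, assumed for illustration
print(f"{size / 1_000_000:.0f} MB per page")
# → 101 MB per page
```

At this rate, a 20-page playbill comes to roughly 2 GB of uncompressed image data.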
Scan in the correct file format (PDF, TIFF, JPEG, etc.). Each scan takes several minutes, so use that time for other parts of the project: for example, while the scanner is running, enter the data from the playbill (ideally using a second physical copy) into the spreadsheet or master document.
Continue the scans and keep track of the metadata that will be relevant later, such as the original date(s) the play was performed or the playbill was printed, the playhouse that performed it, the collection it belongs to, and the location of the performance. Metadata is covered more thoroughly later, but having a guide for entering metadata when publishing the digitized playbill is essential to streamlining the process, especially when using a tool like ContentDM.
Once scanning is complete, use a tool such as Adobe Acrobat to convert and combine the files. A 20-page playbill produces 20 individual files, which can be combined into a single PDF using the ‘create multiple’ option in Adobe Acrobat’s PDF creator. Other programs that convert files to PDF will likely have similar options. Either way, combining all of a playbill’s files into a single file is recommended for ease of upload.
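Before combining, it can help to confirm that each playbill’s page files are grouped together and in page order. This Python sketch assumes file names end in _p followed by the page number and an image extension (for example, playbills_box5_file2_hamlet_p01.jpg); the pattern and the function name group_pages are hypothetical. It only prepares ordered lists of files; the combining itself would still be done in Adobe Acrobat or a similar program.

```python
import re
from collections import defaultdict

# Hypothetical pattern: <base name>_p<page number>.<jpg|jpeg|tif|tiff>
PAGE_PATTERN = re.compile(r"(?P<base>.+)_p(?P<page>\d+)\.(?:jpe?g|tiff?)", re.IGNORECASE)


def group_pages(filenames):
    """Map each playbill's base name to its page files, ordered by page number."""
    groups = defaultdict(list)
    for name in filenames:
        match = PAGE_PATTERN.fullmatch(name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        groups[match["base"]].append((int(match["page"]), name))
    # Sort numerically so page 10 comes after page 9 even without zero padding.
    return {base: [name for _, name in sorted(pages)] for base, pages in groups.items()}


print(group_pages(["hamlet_p10.jpg", "hamlet_p2.jpg", "hamlet_p1.jpg"]))
# → {'hamlet': ['hamlet_p1.jpg', 'hamlet_p2.jpg', 'hamlet_p10.jpg']}
```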
After combining the files, you can run optical character recognition (OCR), available in Adobe Acrobat as well as in dedicated OCR programs, to make the PDF text-searchable. Once that process is complete, you can use ctrl+f to search the text. For example, to find every instance of the name John Smith, press ctrl+f and type ‘John Smith’, and any results will be highlighted. This step is not vital to the digitization process unless you want your files to be searchable. Make sure to save afterwards, as the file will not automatically save the OCR’d text.
Upload the file to the relevant digital collection, following that platform’s guidelines. There are countless file-sharing platforms, from Google Drive and OneDrive to ContentDM, and each requires different steps. A tool like ContentDM, which is built specifically for sharing digital files in a library or archival format, requires you to enter metadata for each file.
Metadata is simply ‘data about data’: information about the object in question. Tools such as ContentDM require metadata to be entered before an object can be uploaded. Your sharing platform’s search function uses this data to retrieve items that match the search criteria, so if a date is entered incorrectly, a search for that date will not return the play performed on it. Consistent spelling and formatting are therefore crucial to getting the best search results.
In ContentDM, the metadata generally consists of the following fields, which can also serve as a template for metadata in other tools:
Title: The title of the object, which can be either simply the play title (Hamlet) or a phrase (Playbill for Hamlet). For photographs, the title should be distinct from that of the playbill (Photograph of Hamlet); use either ‘photograph’ or ‘photo’ consistently rather than switching between them. Avoid quotation marks, as these can interfere with some digital search engines and programs.
Creator: The playhouse that created the playbill or that staged the play being photographed.
Photographer: Often not relevant, but if there are photographs and a credited photographer, this is where you would input their name.
Date original: Enter this in year-month-day format. If only part of the date is known, such as the year, enter just that part.
Description: A simple description of the item being uploaded, for example: A playbill created by Weathervane Community Playhouse in Akron, Ohio for Hamlet. Keep this simple and use an easily repeated layout. For photographs, generally state whether the picture is in color or black and white, and briefly explain what it depicts, for example: A black and white photograph by Weathervane Community Playhouse in Akron, Ohio for Hamlet. Some photographs also have information on the back, such as the names of the people pictured; you can enter that information here as well, although it is not essential.
Subject terms: Individual terms, each capitalized and separated by a semicolon. For a playbill, your subject terms might be the play title, the name of the company, the word ‘playbill’ itself, and any other information you think someone might use to look up the item.
Location: Not essential, especially when all items come from the same playhouse, but it can be as specific or as broad as you think is necessary for people to find the playbills.
Type: For playbills, the type will always be Text. For photographs, the type depends on the format, but Image or Photograph will generally be available and either will work fine, as long as it is used consistently.
Publisher: Not essential, but if there is a credited publisher or publishing company, you may credit them here.
Digital Publisher: The name of the company or specific branch responsible for the digital publication of this material.
Date digitized: Again in a year-month-day format.
Copyright statement: Paste in the copyright statement of the organization for which you are creating the archive.
Source Collection: You may wish to group items under a single collection, which essentially serves as a larger folder where everything is stored. If so, enter that collection here.
Identifier: The name of the file.
Medium: Whatever medium the item is, such as a playbill, a letter, or a photograph.
Size: Not essential, but you may include the measurements of the physical item.
Metadata Creator/Item uploader: The name of the person uploading the items.
Website: A link to where someone would find the full collection.
Contact information: Any information someone might use to contact an individual or the organization in question if they have questions or concerns about the material.
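Because searches depend on consistent entries, it can help to check each record before uploading. This minimal Python sketch assumes each record is prepared as a dictionary keyed by the field names above. It checks that the date fields use year-month-day format (also accepting just the year, or the year and month, when only part of the date is known), flags quotation marks in the title, and joins subject terms with semicolons. The function names are hypothetical, and this is not ContentDM’s own validation.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD, with zero-padded month and day.
DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")
QUOTE_CHARS = '"“”'


def is_valid_date(value):
    """Return True for YYYY, YYYY-MM, or YYYY-MM-DD values that are real dates."""
    match = DATE_PATTERN.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Unknown parts are filled with 1 only to check that the known parts are valid.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


def format_subject_terms(terms):
    """Capitalize the first letter of each term, drop repeats, and join with '; '."""
    # The space after each semicolon is a readability choice; check what your platform expects.
    cleaned, seen = [], set()
    for term in terms:
        term = term.strip()
        if not term or term.lower() in seen:
            continue  # skip blanks and terms already listed
        seen.add(term.lower())
        cleaned.append(term[0].upper() + term[1:])
    return "; ".join(cleaned)


def check_record(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field in ("Date original", "Date digitized"):
        value = record.get(field, "")
        if value and not is_valid_date(value):
            problems.append(f"{field}: {value!r} is not a valid year-month-day date")
    if any(q in record.get("Title", "") for q in QUOTE_CHARS):
        problems.append("Title: avoid quotation marks")
    return problems


# Made-up record for illustration only.
record = {"Title": "Playbill for Hamlet", "Date original": "1975-13-02", "Date digitized": "2023-04-18"}
print(check_record(record))
# → ["Date original: '1975-13-02' is not a valid year-month-day date"]
print(format_subject_terms(["hamlet", "Weathervane Community Playhouse", "playbill", "Hamlet"]))
# → Hamlet; Weathervane Community Playhouse; Playbill
```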
When creating metadata, it is essential to keep a master copy of the key terms you use, so that anyone can upload material without any noticeable difference in how the information is entered. Discrepancies can cause objects to be missed by relevant searches, effectively making the files invisible until they are edited. You may keep a separate document on hand listing each category and which terms to use, in what order.
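The master copy of key terms can also be checked automatically. This Python sketch assumes the master list is a set of approved values for each field; it flags any value not on the list and uses difflib.get_close_matches from the Python standard library to suggest the nearest approved term. The fields and terms shown are illustrative, not a required vocabulary.

```python
import difflib

# Illustrative master list of approved terms for a few fields.
APPROVED_TERMS = {
    "Type": {"Text", "Image"},
    "Medium": {"Playbill", "Photograph", "Letter"},
}


def check_terms(record, approved=APPROVED_TERMS):
    """Flag values that are not on the master list, with a suggested replacement."""
    warnings = []
    for field, allowed in approved.items():
        value = record.get(field)
        if value is None or value in allowed:
            continue
        # get_close_matches returns up to n approved terms that are similar to the value.
        suggestion = difflib.get_close_matches(value, sorted(allowed), n=1)
        hint = f" (did you mean {suggestion[0]}?)" if suggestion else ""
        warnings.append(f"{field}: {value!r} is not an approved term{hint}")
    return warnings


print(check_terms({"Type": "Text", "Medium": "Photo"}))
```

For example, an entry of Photo in the Medium field is flagged with Photograph as the suggestion, catching the kind of photo/photograph inconsistency mentioned under Title.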
One aspect of the original Weathervane playbill project was the creation of a master spreadsheet in which metadata and information about the people involved could be easily found. A spreadsheet may seem intimidating, but it is straightforward and useful for organizing large amounts of information that fits into repeated categories, and it can later be used to create a more streamlined search page. The categories used in the original project, which can serve as a template for later additions, are:
Title of the play
Date of original opening show
Writer(s)
Director(s)
Actors
Set Designer(s)
Tech Crew
Musicians
Acknowledgements
These categories can be changed or added to as needed, but as they stand they encompass every name that would appear in a playbill. This way, someone can simply search for a person’s name and find every play they appeared in, rather than manually searching each individual playbill.
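If the spreadsheet is saved as a CSV file, the name search described above can also be done with a short script. This Python sketch assumes a header row using the category names listed above and semicolon-separated names within a cell; the sample rows are made up for illustration, and the function name find_person is hypothetical. Matching ignores capitalization and extra spaces but otherwise requires the name to be spelled exactly as entered.

```python
import csv
import io

# Spreadsheet columns that hold names; a cell may list several names separated by ';'.
NAME_COLUMNS = ["Writer(s)", "Director(s)", "Actors", "Set Designer(s)",
                "Tech Crew", "Musicians", "Acknowledgements"]


def find_person(csv_text, person):
    """Return (play title, category) pairs for every cell that lists the person."""
    target = person.strip().lower()
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in NAME_COLUMNS:
            names = [n.strip().lower() for n in (row.get(column) or "").split(";")]
            if target in names:
                hits.append((row["Title of the play"], column))
    return hits


# Made-up sample rows for illustration only.
SAMPLE_CSV = (
    "Title of the play,Date of original opening show,Director(s),Actors\n"
    "Hamlet,1975-10-03,Jane Doe,John Smith; Mary Jones\n"
    "Our Town,1976-02-14,John Smith,Jane Doe\n"
)
print(find_person(SAMPLE_CSV, "John Smith"))
# → [('Hamlet', 'Actors'), ('Our Town', 'Director(s)')]
```

With a real file, you would read its contents first, for example with open(path, newline="", encoding="utf-8").read().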
When entering information into the spreadsheet, you must establish rules for naming conventions. Names are often misspelled, especially in older documents, so you will have to decide whether to keep the original incorrect spellings or enter the correct spelling (assuming it is known). Generally, archival standard is to keep the misspelling and not alter the information in the document. In the original project, a * is placed next to names that seem to be spelled inconsistently across playbills.
Also, when entering names, do not use a comma to separate them; use a semicolon as the delimiter instead. This avoids confusion with names that end in ‘, Jr.’ and makes the data easier to use in other programs that do not accept a comma as a delimiter.
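The benefit of the semicolon delimiter shows up as soon as a cell is split into individual names. This minimal Python sketch assumes names in a cell are separated by semicolons and that the ‘*’ spelling flag, when used, is written directly after a name; the function name split_names is hypothetical. It returns each name along with whether it was flagged.

```python
def split_names(cell):
    """Split a spreadsheet cell on ';' and report which names carry the '*' flag."""
    entries = []
    for raw in cell.split(";"):
        name = raw.strip()
        if not name:
            continue  # ignore empty pieces, e.g. from a trailing semicolon
        flagged = name.endswith("*")  # '*' marks a possibly inconsistent spelling
        entries.append((name.rstrip("*").strip(), flagged))
    return entries


cell = "John Smith, Jr.; Mary Jones*; Jane Doe"
print(split_names(cell))
# → [('John Smith, Jr.', False), ('Mary Jones', True), ('Jane Doe', False)]
```

Splitting the same cell on commas instead would break ‘John Smith, Jr.’ into two pieces.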
If the playbills have high-quality print and have been OCR’d well, you may wish to copy and paste names from the finished PDF into the spreadsheet. However, this process is slower and may introduce typos and formatting errors, so you may prefer to type the names yourself if you are confident you can do so consistently and without errors.