Web Collection: The Frick Collection
Seed URLs for monthly crawl:
Quality Assurance (QA) is a web archiving term for checking the quality of captured material. During QA, an archived site is checked for missing documents, broken links, uncaptured material, and style conformance. A captured site should appear and function as closely as possible to the live version of the website.
There are three modes of viewing a website in the web archiving workflow at the FARL. Using three browsers, one compares the live site that has been captured, the archived site as it appears in the Wayback Machine, and the proxy version. The proxy view renders the WARC files directly from the NYARC collection of captured documents. Because the Wayback view of an archived site may include material sourced from the live web, the proxy view is required to confirm that what you see is what you have.
Much of my first semester was spent reading and viewing the requisite Archive-It documentation and familiarizing myself with the history of web archives. I began learning the QA process by working on smaller sites from the New York City Galleries and Artists’ Websites collections. I subsequently began work on The Frick Collection, which continued through the spring semester. The NYARC museums’ websites (frick.org, BrooklynMuseum.org, and MoMA.org) are the largest sites requiring QA, with many nested pages, images, embedded media from sources like YouTube or Vimeo, and technically complex virtual museum tours.
I kept track of my work using Google Sheets, fastidiously recording the details of the QA process, including links requiring patching, links to patch crawls, dates of crawls, descriptive notes on the condition of the archived site, and notes about contacting Archive-It for support.
When something was amiss in the monthly capture of a seed in The Frick Collection, it was my job to run a patch crawl, using targeted URLs from the live site to replace missing documents from the Archive-It capture.
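As a rough illustration of how a list of targeted URLs for a patch crawl might be assembled, the sketch below compares URLs seen on a live site with URLs found in a capture and returns the ones that appear to be missing. It is not part of Archive-It or the FARL workflow. The function names, the normalization rules, and the example.org URLs are all hypothetical, and a real QA pass would still rely on viewing the archived pages by eye.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparable form (an assumed, simplified rule set):
    lowercase the scheme and host, drop the fragment, strip a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def patch_candidates(live_urls, captured_urls):
    """Return normalized live URLs that have no match among the captured URLs.

    Order follows the live list; duplicates are reported only once.
    """
    captured = {normalize(u) for u in captured_urls}
    seen = set()
    missing = []
    for url in live_urls:
        key = normalize(url)
        if key not in captured and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Hypothetical data standing in for links noted during QA.
    live = [
        "https://Example.org/exhibitions/",
        "https://example.org/visit",
        "https://example.org/tour#room1",
        "https://example.org/visit/",
    ]
    captured = [
        "https://example.org/exhibitions",
        "https://example.org/",
    ]
    for url in patch_candidates(live, captured):
        print(url)
```

The resulting list could be pasted into a tracking spreadsheet alongside the crawl date and notes, then used as the set of targeted URLs for a patch crawl.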