2. Services Used for Website Harvesting and Playback

A). Archive-It:

The Archive-It service is primarily used to harvest the websites selected for inclusion in the NYARC collections. Archive-It is a subscription-based web archiving service of the Internet Archive that organizations use to harvest and manage born-digital content. Archive-It currently works with over 400 partner organizations, including university libraries, state archives and libraries, museum and art libraries, historical societies, and public libraries.


B). Wayback Machine:

The Internet Archive’s Wayback Machine allows users to view archived websites as they existed on the live web over time. The Internet Archive began archiving cached pages of websites in 1996 and continues to crawl the web at regular intervals. As of late 2015, the Wayback Machine had archived over 439 billion web pages.


C). Hanzo Archives:

Hanzo Archives, a commercial web harvesting service, is used by NYARC to capture dynamic content that we have found difficult to capture ourselves through crawls initiated with the Archive-It tool. Thus far NYARC has primarily used Hanzo’s service to capture MoMA exhibition websites that rely heavily on technologies such as JavaScript and Flash. We have also had Hanzo capture several artists’ websites and exhibition catalogs published online by New York City galleries. As part of the two-year Andrew W. Mellon Foundation grant, “Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art Resources,” we worked with both vendors, Archive-It and Hanzo Archives, to integrate WARC files from Hanzo into NYARC’s Archive-It collections. Most of the sites that have been successfully ingested into the Archive-It pipeline render properly in playback alongside the other sites in the collections. However, after ingest and integration, several sites captured with Hanzo Archives’ service have failed to render in playback because of their heavy use of Flash files.