This small resource library provides access to websites, documents, videos, slide decks, and other materials that art librarians may find helpful in designing or maintaining a web archiving program. Need more guidance than you can find here? Have a go-to resource that should be added? Let us know by posting to our discussion forum.

Just getting started?

Art librarians considering or still in the early planning stages of their own web archiving program may benefit from the following resources, which outline the field of web archiving and their many possible places within it.

The Cobweb: Can the Internet be Archived?

This article by historian Jill Lepore, published in the January 26, 2015 edition of The New Yorker, introduces a general audience to the concepts, technologies, and personalities behind web archiving. While short on the details of web archiving standards or technologies that art librarians need to get started, it is a great resource for sharing the scope and compelling value of the practice with stakeholders and resource allocators.

Web Archiving: An Overview

SIG moderators Sumitra Duncan and Karl-Rainer Blumenthal hosted this introductory webinar for members of the Metropolitan New York Library Council (METRO) in March 2016. It briefly covers the use cases that bring librarians to web archiving, the technology that makes it possible, and a detailed description of one established web archiving program in an art library context, that of the New York Art Resources Consortium (NYARC).

After VVORK: How (and why) we archived a contemporary art blog

In this blog post, Rhizome Artistic Director Michael Connor explains how and why his organization used web archiving to preserve access to an important venue of contemporary art conversation. The case in point makes a compelling argument for the value of art conservators and information professionals in the development of web archiving tools that preserve interactive experiences.

NDSA Surveys: Web Archiving in the United States

The National Digital Stewardship Alliance (NDSA) periodically surveys web archiving programs in the United States in order to summarize their policies, resources, challenges, and chosen technologies, and to identify fertile ground for program and technology development in the near future. While art libraries constitute a small minority of the organizations surveyed, these documents can nonetheless provide quick and effective wayfinding for art librarians who wish to understand the state of the practice in information and cultural heritage organizations more generally.

A collaborative model for web archiving ephemeral art resources at the New York Art Resources Consortium (NYARC)

This is the pre-print edition of an article by moderators Sumitra Duncan and Karl-Rainer Blumenthal that was published by Art Libraries Journal in April 2016 (http://dx.doi.org/10.1017/alj.2016.12). It describes the comprehensive web archiving program of the New York Art Resources Consortium (NYARC) as a model for art libraries that desire to archive web materials collaboratively with their peer institutions, and provides details on the policy, management, and technology resources that such an effort entails. 

Peers and precedents

Art library and related web archiving programs

  • ARLIS/NA Artist Files SIG - A pilot project to archive the websites of contemporary art galleries with the Archive-It web archiving service.
  • Asia Art Archive - Collections of art projects, exhibitions, and institutional websites.
  • New Museum - Collections of institutional publications, projects, and sites.
  • Webenact - Access portal to high-fidelity net art web archives created by digital conservators at Rhizome.

Peer documentation and policy guidance

Defining the boundaries, practices, and standards of one's web archiving program requires thoughtful policy-making. Art librarians can learn a great deal about what has worked in these regards from their web archiving peers both inside and outside of art information.

  • Collection development policies - Index of International Internet Preservation Consortium (IIPC) member institutions' and other programs' web archive collection development policies.
  • NYARC Documentation - Extensive documentation of each stage in the web archiving process as undertaken by the New York Art Resources Consortium (NYARC).
  • Web Archiving - Helpful FAQs and use cases, guidance on website "archivability," and collection development policy at Stanford University Libraries.
  • Web archiving at Columbia University - Brief summary information on the selection, permission, and description policies used by the Web Resources Collection Program at Columbia University.

Other professional groups

For more information of general importance to web archivists beyond the art information realm:

  • International Internet Preservation Consortium (IIPC) - Founded in 2003, the IIPC is the membership organization among libraries and archives charged with "improving the tools, standards and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage."
  • METRO Web Archiving Special Interest Group - Art librarians in the New York City area can join the METRO SIG dedicated to web archiving, which periodically meets in person and online to share educational and program resources.

Tools and services

There are many web archiving tools suited to different collecting scopes and specific stages in the web archiving process. Those of particular utility to art librarians include:

  • Archive-It - The "end-to-end" web archiving service and partnership organization of the Internet Archive; a resource for art librarians looking for a technical and organizational partner to help them through the entire web archiving process.
  • Heritrix - The widely adopted web crawler project upon which many web archiving curator tools and programs are based; for art librarians comfortable working at the command line who wish to acquire materials from the web "in-house."
  • OpenWayback - The open source technology for in-browser web archive replay, used by organizations that provide local access to acquired web archives; a tool for art librarians who acquire their WARC files with a tool like Heritrix and wish to provide browser-based access to those resources online.
  • Save Page Now - A tool for quickly (and freely) adding one web page at a time to the Internet Archive's web-spanning "Wayback Machine"; an easy solution for art librarians interested in cataloging or otherwise pointing to single-page items in an archival, rather than live, form.
  • Pywb - A Python-based successor project to the Java-based OpenWayback for in-browser web archive replay; used for the Rhizome-hosted Webenact repository, among others.
  • Social Feed Manager - Software developed by the George Washington University Libraries to collect web content from social media services like Twitter and Instagram; useful to technically adept art librarians who wish to collect select social media content in formats conducive to large-scale data analysis.
  • Tools and software - An index of web archive acquisition, curation, and access tools compiled by the International Internet Preservation Consortium; a helpful menu of tools that art librarians can choose from if they prefer not to use a single end-to-end web archiving service.
  • twarc - An open source tool for collecting tweets from Twitter; an option for art librarians comfortable with the command line who wish to enable customized access to and/or large-scale analysis of Twitter content, including hashtag feeds.
  • WebArchivePlayer - A Pywb-based desktop application for local offline replay of WARC files; a good tool for art librarians who acquire their WARC files through Webrecorder and wish to provide only restricted, on-site access to them.
  • Webrecorder - A free-to-use web archiving platform built from open source acquisition and access technologies; enables art librarians to "record" online material in real-time rather than by using a web crawler, and to download the resultant WARC files.
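
For art librarians who want to point catalog records at archival rather than live pages, as described under Save Page Now above, the following is a minimal sketch of how the relevant addresses might be assembled. It assumes the publicly visible Wayback Machine address patterns (a "/save/" prefix to request a capture and a "/web/<timestamp>/" prefix to replay one); the function names are hypothetical and chosen only for illustration. The sketch builds addresses only and does not contact the Internet Archive.

```python
# Base address of the Internet Archive's Wayback Machine.
WAYBACK = "https://web.archive.org"


def save_page_now_url(page_url: str) -> str:
    """Return the address that, when visited, asks Save Page Now to capture page_url.

    Assumes the "/save/<url>" pattern; the function name is hypothetical.
    """
    return f"{WAYBACK}/save/{page_url}"


def wayback_replay_url(page_url: str, timestamp: str) -> str:
    """Return the replay address for the capture of page_url nearest to timestamp.

    timestamp is 4 to 14 digits in YYYYMMDDhhmmss order; a partial value such
    as "2016" or "20160301" is assumed to resolve to the closest capture.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits (YYYYMMDDhhmmss)")
    return f"{WAYBACK}/web/{timestamp}/{page_url}"


if __name__ == "__main__":
    gallery_page = "http://example.com/exhibitions/"
    print(save_page_now_url(gallery_page))
    print(wayback_replay_url(gallery_page, "20160301"))
```

A replay address built this way can be recorded in a catalog entry in place of the live link, so that the citation continues to resolve if the original page changes or disappears.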