Web Archive Search as Research

Post date: Sep 17, 2013 12:26:8 PM

Web Archive Search as Research: Methods for Studying Searchable Web Archives

A day of Web Archive analysis and critique by the WebART project and the Digital Methods Initiative, New Media & Digital Culture, University of Amsterdam

September 11, 2013 (9:30-16:00)

The field of Web Archiving is at a turning point. In the early years of Web Archiving, the single URL was the dominant unit for preservation and access. Access tools such as the Internet Archive’s Wayback Machine reflect this notion, as they allow consultation, or browsing, of one URL at a time. The single URL access point to Web archives also shaped early Web archive research methods.

In recent years, however, the single URL approach to Web archiving has gradually been giving way to a big data approach. Several web archives and research initiatives are already developing advanced search interfaces, access to aggregated metadata, visualisation tools, and annotation and enrichment features for future web archives.


On September 11, 2013 [1], researchers from the Digital Methods Initiative and from the WebART project at the University of Amsterdam convened to discuss the theoretical and methodological implications of searching, mining and visualizing the archived Web.

The first part of the day included several talks that addressed web archive research from various perspectives.

Anat Ben-David (WebART) discussed the past and future of Web archive research and “search as research” methods for social research of archived web data.

Hugo Huurdeman (WebART) introduced WebARTist as a “(Re)Search Engine” and the idea of building a web archive search system that supports scholarly research as a “tool-maker’s tool”.

Tjarda de Haan (Amsterdam Museum) described the preservation project of the Amsterdam Digital City: “Re:DDS”.

And Jules Mataly (MA graduate, University of Amsterdam) presented his MA thesis, “The Three Truths of Margaret Thatcher: Creating and Analysing Archival Artefacts”, as an example of cross-archival search and collection critique.

The second part of the day included a hands-on session with WebARTist, the Web Archives Temporal Information Search System, which the WebART project developed during its pilot year (2012-2013) to offer new possibilities for exploration, extraction, analysis and visualization of archived web data.

The day concluded with an evaluation of WebARTist and recommendations for its future development to fit researchers’ needs. The general response to WebARTist was positive, and researchers indicated that the system provided new ways to explore web archives, substantially augmenting the existing “Wayback Machine” interface to the Dutch Web archive. WebARTist “supports the shift to studying web archives through queries”, as one participant noted. Another participant indicated that the system allows one to “look at ‘data’ rather than single sites”, and that it allows one to be “reflexive about collection policies”. The system could also advance the types of research questions that can be answered using Web archives: “It made it possible to build new research questions that go beyond the web site history approach. It also offers hope that web archiving is evolving in a more creative field of research.”

[1] The event was held on September 11 as a tribute to the September 11 Web Archive collection, curated by Kirsten Foot and Steve Schneider, in collaboration with the Internet Archive and the Library of Congress. The September 11 Web Archive has pioneered social research of archived Web collections.