HCIR Challenge

Aims

The aims of the challenge were to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:

    • Not only deliver relevant documents, but also provide facilities for making meaning with those documents.
    • Increase user responsibility as well as control; that is, the systems require and reward human effort.
    • Offer the flexibility to adapt to user knowledge / sophistication / information need.
    • Are engaging and fun to use.

There was no need to build a new system for the challenge; participants could also report on the use of existing systems. All entries had to use the challenge data set. Interface sketches, mockups, wireframes, etc. were not permitted.

Data

The data set used was the New York Times (NYT) Annotated Corpus, a collection of over 1.8 million articles published by The NYT between January 1, 1987 and June 19, 2007 and annotated with rich metadata. Use of the NYT corpus for the HCIR challenge was appealing for several reasons:

    • The content is broadly accessible without any special domain expertise.
    • The annotations are detailed enough to support rich interactive approaches without requiring sophisticated information extraction techniques.
    • The collection is large enough to be interesting without being so large as to cause scaling challenges.

The focus of the challenge was on the development or use of interactive techniques, not on data wrangling. As such, we indexed the collection and provided a baseline retrieval system. The NYT corpus was available to challenge participants free of charge thanks to the generosity of the Linguistic Data Consortium (LDC). We are very grateful to the LDC for covering the cost of shipping The NYT corpus to challenge participants.

Baseline

A baseline search system for The NYT corpus can be built using Solr. Solr scripts for building a searchable index of The NYT corpus were made available to participants.

Task Scenarios

A pilot evaluation of each system was optional. Participants were asked to consider some or all of the following task scenarios, a set of historical exploration tasks based on the NYT corpus:

  • Learn about a topic that has a long history:
      • Draw a rough chart of how subway crime in New York has varied over the past two decades.
      • Draw a rough chart of how the price of a slice of pizza in New York has varied over the past two decades.
  • Understand the competing perspectives on a controversial topic:
      • Enumerate the main arguments that have been made for and against rent control in New York.
      • Enumerate the main arguments that have been made for and against the impeachment of U.S. President Bill Clinton.
  • Answer a question that requires looking at more than one document:
      • Enumerate the major venues in New York City that offer free concerts.
      • Determine whether a member of the Communist Party has ever held a legislative or executive post in New York State.

Challenge Reports

Each participant in the HCIR challenge submitted a four-page challenge report describing their work, and all accepted reports were included in the proceedings. At the workshop, participants presented their systems so that attendees could evaluate them against the following HCIR evaluation criteria:

    • Effectiveness: Is a user able to complete the task?
    • Efficiency: How efficiently does the user complete the task?
    • Control: To what extent does the system give the user control over the information seeking process?
    • Transparency: Does the user understand what the system is doing?
    • Guidance: How much direction does the system provide to help the user refine their search strategy or reach their search goal?
    • Fun: Is the system engaging and fun to use?

Participants

Search for Journalists: New York Times Challenge Report

Corrado Boscarino, Arjen P. de Vries, and Wouter Alink (Centrum Wiskunde and Informatica)

Exploring the New York Times Corpus with NewsClub

Christian Kohlschütter (Leibniz Universität Hannover)

Searching Through Time in the New York Times

Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)

News Sync: Three Reasons to Visualize News Better

V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)

Custom Dimensions for Text Corpus Navigation

Vladimir Zelevinsky (Endeca Technologies)

A Retrieval System Based on Sentiment Analysis

Wei Zheng and Hui Fang (University of Delaware)

Winner

The winner of the HCIR 2010 Challenge was Searching Through Time in the New York Times, by Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs).

Acknowledgements

The organizers greatly appreciate the help of Evan Sandhaus and Tommy Chheng, who volunteered their efforts to make the data available to HCIR Challenge participants.