Aims
The aims of the challenge were to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:
There was no need to build a new system for the challenge; participants could also report on the use of existing systems. All entries had to use the challenge data set. Interface sketches, mockups, wireframes, and similar non-working prototypes were not permitted.
Data
The data set used was the New York Times (NYT) Annotated Corpus. The corpus is a collection of over 1.8 million articles published by The NYT between January 1, 1987 and June 19, 2007, each annotated with rich metadata. Use of the NYT corpus for the HCIR challenge was appealing for several reasons:
The focus of the challenge was on the development or use of interactive techniques, not on data wrangling. We therefore indexed the collection and provided a baseline retrieval system. The NYT corpus was available to challenge participants free of charge thanks to the generosity of the Linguistic Data Consortium (LDC). We are very grateful to the LDC for covering the cost of shipping the NYT corpus to challenge participants.
Baseline
A baseline search system for the NYT corpus can be built using Solr. Solr scripts for building a searchable index of the NYT corpus were made available to participants.
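As a rough illustration of what such a Solr baseline involves, the Python sketch below parses one corpus article and prepares it for indexing. It is not the challenge's actual Solr scripts. It assumes the articles are NITF XML files using the element and attribute names in the code (`doc-id`/`id-string`, `pubdata`/`date.publication`, `hl1`, `block class="full_text"`), a hypothetical Solr core at `http://localhost:8983/solr/nyt`, and hypothetical schema fields `headline`, `body`, and a date-typed `pub_date`. It builds, without sending, a JSON update request and a `/select` URL that restricts a keyword search to a publication-date range, a filter that seems useful for historical exploration.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical Solr core URL; adjust to your own installation.
SOLR_CORE = "http://localhost:8983/solr/nyt"


def parse_nitf(xml_text: str) -> dict:
    """Extract a few fields from one NITF-formatted article.

    The element and attribute names are assumptions about the corpus layout;
    the output field names (headline, body, pub_date) are hypothetical.
    """
    root = ET.fromstring(xml_text)
    doc_id = root.find(".//doc-id")
    pubdata = root.find(".//pubdata")
    headline = root.find(".//hl1")
    paragraphs = root.findall(".//block[@class='full_text']/p")

    # Assumed format "19870101T000000"; convert to the ISO-8601 form Solr dates use.
    raw_date = pubdata.get("date.publication") if pubdata is not None else None
    pub_date = (
        datetime.strptime(raw_date, "%Y%m%dT%H%M%S").strftime("%Y-%m-%dT%H:%M:%SZ")
        if raw_date
        else None
    )
    return {
        "id": doc_id.get("id-string") if doc_id is not None else None,
        "headline": (headline.text or "").strip() if headline is not None else "",
        "pub_date": pub_date,
        "body": "\n".join(p.text or "" for p in paragraphs),
    }


def build_update_request(docs: list) -> urllib.request.Request:
    """Build (but do not send) a Solr JSON update request for a batch of docs."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{SOLR_CORE}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_date_range_query(text: str, start: str, end: str) -> str:
    """Build a /select URL for a keyword search limited to a date range."""
    params = {
        "q": text,
        # Range filter on the hypothetical date-typed pub_date field.
        "fq": f"pub_date:[{start} TO {end}]",
        "fl": "id,headline,pub_date",
        "sort": "pub_date asc",
        "wt": "json",
    }
    return f"{SOLR_CORE}/select?{urllib.parse.urlencode(params)}"
```

To index a batch, the request could be passed to `urllib.request.urlopen`; this assumes a Solr release whose `/update` handler accepts JSON documents. Keeping parsing, request building, and sending separate makes each step easy to check without a running server.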
Task Scenarios
A pilot evaluation of each system was optional. Participants were asked to consider some or all of the task scenarios from a set of historical exploration tasks based on the NYT corpus:
Challenge Reports
Each participant in the HCIR challenge submitted a four-page challenge report describing their work. All accepted challenge reports were included in the proceedings. At the workshop, participants presented their systems so that attendees could evaluate them against the following HCIR evaluation criteria:
Participants
Search for Journalists: New York Times Challenge Report
Corrado Boscarino, Arjen P. de Vries, and Wouter Alink (Centrum Wiskunde and Informatica)
Exploring the New York Times Corpus with NewsClub
Christian Kohlschütter (Leibniz Universität Hannover)
Searching Through Time in the New York Times
Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)
News Sync: Three Reasons to Visualize News Better
V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)
Custom Dimensions for Text Corpus Navigation
Vladimir Zelevinsky (Endeca Technologies)
A Retrieval System Based on Sentiment Analysis
Wei Zheng and Hui Fang (University of Delaware)
Winner
The winner of the HCIR 2010 Challenge was "Searching Through Time in the New York Times", by Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs).
Acknowledgements
The organizers are very grateful to Evan Sandhaus and Tommy Chheng for volunteering their efforts to make the data available to HCIR Challenge participants.