HCIR Challenge

INTRODUCTION

Information retrieval (IR) research today often emphasizes precision at the expense of recall. Precision is defined as the fraction of retrieved documents that are relevant, while recall is defined as the fraction of relevant documents that are retrieved.
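To make the definitions concrete, here is a minimal sketch in Python (the document identifiers and counts are invented for illustration):

    # Set-based precision and recall over hypothetical document-ID sets.
    def precision_recall(retrieved: set, relevant: set) -> tuple:
        """Return (precision, recall) for a set-retrieval result."""
        hits = retrieved & relevant                          # relevant documents that were retrieved
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    # Example: 4 of 10 retrieved documents are relevant, out of 20 relevant overall.
    retrieved = {f"doc{i}" for i in range(10)}
    relevant = {f"doc{i}" for i in range(6, 26)}
    print(precision_recall(retrieved, relevant))             # (0.4, 0.2)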

These measures were originally intended for set retrieval, but most current research assumes a ranked retrieval model, in which the search returns results in order of their estimated likelihood of relevance to a search query. Popular measures like average precision mostly reflect precision for the highest-ranked results.
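As an illustration of why such measures emphasize the top of the ranking, the sketch below computes average precision for a single ranked list under binary relevance judgements (the ranking and judgements are invented):

    # Average precision for one query: mean of precision@k over the ranks k at which
    # relevant documents appear, normalised by the total number of relevant documents.
    def average_precision(ranking: list, relevant: set) -> float:
        hits = 0
        precision_sum = 0.0
        for k, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                precision_sum += hits / k
        return precision_sum / len(relevant) if relevant else 0.0

    # Relevant documents at ranks 1 and 3; one relevant document is never retrieved.
    print(average_precision(["a", "b", "c", "d"], {"a", "c", "e"}))  # ~0.556

Because relevant documents that are never retrieved contribute nothing to the sum, a system can score well on such measures while missing much of the relevant material, which is exactly the concern for recall-oriented tasks.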

For some of the most difficult and valuable information-seeking problems, however, recall is at least as important as precision. In particular, for tasks that involve exploration or progressive elaboration of the user’s needs, a user’s progress depends on understanding the breadth and organization of available content related to those needs. Techniques designed for interactive retrieval, particularly those that support iterative query refinement, rely on communicating to the user the properties of large sets of documents and thus benefit from a retrieval approach with a high degree of recall.

The HCIR 2011 Challenge focuses on the case where recall is everything – namely, the problem of information availability. The information availability problem arises when the seeker faces uncertainty as to whether the information of interest is available at all. Instances of this problem include some of the highest-value information tasks, such as those facing national security and legal/patent professionals, who might spend hours or days searching to determine whether the desired information exists.

THE CiteSeer CORPUS

The corpus we used for the HCIR 2011 Challenge is the CiteSeer digital library of scientific literature. The CiteSeer corpus contains over 750,000 documents and provides rich metadata about documents, authors, and citations. More information about CiteSeer can be found on the CiteSeer website and in “CiteSeer: An Automatic Citation Indexing System” by C. Lee Giles, Kurt D. Bollacker, and Steve Lawrence.

The snapshot of the corpus is roughly 45 GB.
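Purely for illustration, a record in such a corpus might carry fields like the ones below; the field names are our own assumptions, not CiteSeer's actual schema:

    # Hypothetical record layout -- the field names are assumptions for illustration,
    # not CiteSeer's actual schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class PaperRecord:
        doc_id: str                                          # corpus-internal identifier
        title: str
        authors: List[str]
        year: int
        abstract: str
        cites: List[str] = field(default_factory=list)       # doc_ids this paper cites
        cited_by: List[str] = field(default_factory=list)    # doc_ids citing this paper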

CHALLENGE TASKS

We offered the following example task to give participants an idea of what we expected users to be able to do with their systems:

Latent Semantic Indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called Singular Value Decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. Deerwester et al. published seminal papers on LSI in 1988. Is there earlier work that anticipates some or all of this approach?
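As a rough sketch of the technique the task refers to (not of any participant's system), the following Python fragment builds a tiny term-document matrix, truncates its SVD, and compares documents in the reduced latent space; the toy corpus and the choice of two latent dimensions are our own assumptions:

    # Minimal LSI sketch: term-document counts -> truncated SVD -> latent document vectors.
    import numpy as np

    docs = [
        "human computer interaction",
        "user interface design for computer systems",
        "graph minors and tree width",
        "graph theory of trees",
    ]
    vocab = sorted({w for d in docs for w in d.split()})
    A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)  # terms x docs

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = 2                                          # number of latent dimensions to keep
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T         # documents in the k-dimensional latent space

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The two graph-related documents should end up closer to each other than to the HCI ones.
    print(cosine(doc_vecs[2], doc_vecs[3]), cosine(doc_vecs[0], doc_vecs[2]))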

Two weeks before the workshop, we offered the following tasks to participants:

1) The Great MapReduce Debate

In 2004, Google introduced MapReduce as a software framework to support distributed computing on large data sets on clusters of computers. In the "Map" step, the master node takes the input, partitions it into smaller sub-problems, and distributes those to worker nodes. Each worker node processes its sub-problem and passes the answer back to the master node. In the "Reduce" step, the master node takes the answers to all the sub-problems and combines them to obtain the final output.
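For readers unfamiliar with the pattern, here is a minimal single-process sketch of the map and reduce steps described above, using word count as the classic example; a real MapReduce system distributes these calls across worker nodes, whereas the "shuffle" here is just an in-memory dictionary:

    from collections import defaultdict

    def map_phase(document: str):
        """Map step: emit (word, 1) pairs for one input split."""
        for word in document.split():
            yield word.lower(), 1

    def reduce_phase(word: str, counts: list) -> tuple:
        """Reduce step: combine all intermediate values for one key."""
        return word, sum(counts)

    def mapreduce(documents):
        intermediate = defaultdict(list)           # shuffle: group intermediate values by key
        for doc in documents:
            for key, value in map_phase(doc):
                intermediate[key].append(value)
        return dict(reduce_phase(k, v) for k, v in intermediate.items())

    print(mapreduce(["map reduce map", "reduce the data"]))
    # {'map': 2, 'reduce': 2, 'the': 1, 'data': 1}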

In a blog post, David J. DeWitt and Michael Stonebraker asserted that MapReduce was not novel -- that the techniques employed by MapReduce are more than 20 years old. Use your system to either support DeWitt and Stonebraker's case or to argue that a thorough search of the literature does not yield examples that support their case.

2) Faceted Search in the 80s?

In 1990, Nicholas Belkin and Pier Giorgio Marchetti wrote a SIGIR conference paper that proposed and specified a faceted search interface for an information-seeking system. Were they the first to do so? Was there any work on faceted search in the 1980s?

Use your system to unearth such work, or to make the case that faceted search was not developed until the 1990s.

3) Graph Drawing on a Sphere

A drawing of a graph or network is a visual representation in which the vertices are represented as disks or boxes, and the edges are represented as line segments or curves. Conventionally, graphs are drawn in the Euclidean plane.
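Purely to illustrate the two settings (this is not a published layout algorithm), the sketch below places vertices evenly on a circle in the plane and, separately, spreads them over latitude bands of a unit sphere:

    import math

    def circular_layout(n: int):
        """Planar layout: vertex i at angle 2*pi*i/n on the unit circle."""
        return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
                for i in range(n)]

    def naive_spherical_layout(n: int, bands: int = 4):
        """Crude placement on a unit sphere: latitude bands, evenly spaced longitudes."""
        columns = max(1, math.ceil(n / bands))
        points = []
        for i in range(n):
            lat = math.pi * ((i % bands) + 1) / (bands + 1) - math.pi / 2   # stay off the poles
            lon = 2 * math.pi * (i // bands) / columns
            points.append((math.cos(lat) * math.cos(lon),
                           math.cos(lat) * math.sin(lon),
                           math.sin(lat)))
        return points

    print(circular_layout(4))
    print(naive_spherical_layout(8))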

Are there graph layout algorithms for drawing graphs on the surface of a sphere? Use your system to find out, or to find the closest alternatives.

4) People Who Like This Person Also Like…

Collaborative filtering is a popular method of predicting a user's interests by collecting preference or taste information from a system's other users. The most famous example of collaborative filtering is the recommendation system used by Amazon.com, instantiated as "people who bought this item also bought…"
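As a sketch of the underlying idea only (the users, items, and ratings below are invented, and real deployments such as Amazon's use item-to-item variants at far larger scale), a minimal user-based collaborative filter might look like this:

    import math

    ratings = {
        "alice": {"paper_a": 5, "paper_b": 3, "paper_c": 4},
        "bob":   {"paper_a": 4, "paper_b": 3, "paper_d": 5},
        "carol": {"paper_b": 1, "paper_c": 5, "paper_d": 2},
    }

    def cosine_similarity(u: dict, v: dict) -> float:
        shared = set(u) & set(v)
        if not shared:
            return 0.0
        dot = sum(u[i] * v[i] for i in shared)
        return dot / (math.sqrt(sum(x * x for x in u.values())) *
                      math.sqrt(sum(x * x for x in v.values())))

    def recommend(target: str, k_items: int = 2):
        """Score items the target has not rated by similarity-weighted ratings of other users."""
        scores = {}
        for other, their_ratings in ratings.items():
            if other == target:
                continue
            sim = cosine_similarity(ratings[target], their_ratings)
            for item, rating in their_ratings.items():
                if item not in ratings[target]:
                    scores[item] = scores.get(item, 0.0) + sim * rating
        return sorted(scores, key=scores.get, reverse=True)[:k_items]

    print(recommend("alice"))   # -> ['paper_d']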

Has anyone applied collaborative filtering to people search? Use your system to find out if anyone has done so and written about it.

5) HCIR Meta Challenge

The HCIR workshops have been held annually since 2007. The HCIR Wikipedia entry lists related workshops that preceded it, namely:

- 2006 SIGIR Workshop on Faceted Search

- 2005 ESF Exploratory Workshop on Information Retrieval in Context

- 1998 Workshop on Information Retrieval and Human Computer Interaction

- 1998 Workshop on Innovation and Evaluation in Information Exploration Interfaces

Are any relevant workshops or conferences missing from this list? Use your system to find out, or to argue convincingly that this list is complete.

The goal of the challenge was not to test participants’ domain knowledge, nor did we expect participants to make new discoveries of obscure materials, though we would have welcomed such discoveries! Rather, we wanted to see how a system supports a user whose information-seeking task is an information availability problem. We provided participants with a small set of example tasks to guide the development of their systems.

Given the nature of the corpus and the composition of the HCIR community, we focused on tasks from computer science and information science.

Ideally, three criteria could be used to evaluate a solution to an information availability problem:

  • Correctness of the outcome. Does the user correctly conclude whether the information of interest is available?
  • Efficiency. How much time or effort does the user invest, regardless of outcome?
  • User confidence in the outcome. Does the user believe the result, particularly if it is negative?

Of course, these criteria are affected by a user’s domain knowledge; mitigating user effects requires extensive user studies. We recognize that an ideal evaluation would be impractical for most challenge participants.

EVALUATION

We evaluated submissions as follows:

  1. Participants froze their systems and submitted brief descriptions about a month before the workshop.
  2. When we received the description from a participant, we provided the participant with the set of evaluation tasks. Participants had two weeks to submit videos or screenshots demonstrating their systems on the evaluation tasks. Participants used their systems to perform the evaluation tasks in good faith: they could use their own domain knowledge, but were instructed not to consult outside subject-matter experts or other systems.
  3. At the workshop, we gave each participant 15 minutes to demonstrate their system live to workshop attendees by performing two tasks of their choosing.
  4. The attending participants voted to determine the system that, in their view, best supports the user’s ability to resolve an information availability problem.

RESULTS

Four teams competed in the HCIR Challenge. The winner was Session-based search with Querium, presented by Gene Golovchinsky.