2013 Track

Federated search is the approach of querying multiple search engines simultaneously and combining their results into one coherent search engine result page. The goal of the Federated Web Search (FedWeb) track is to evaluate approaches to federated search at very large scale in a realistic setting, using the search results of existing web search engines. This year the track focuses on resource selection (selecting the search engines that should be queried) and results merging (combining the results into a single ranked list). You may submit up to 3 runs for each task; all runs will be judged. Upon submission, you will be asked to indicate for each run whether you used result snippets and/or pages, and whether any external data was used.

2013 Track results

The overview paper can be downloaded here.

The following groups presented their work at the 2013 workshop.
Additional publications can be found at the TREC website.

2013 Track guidelines

Important dates

  • May 31, 2013: sample data and queries released (available now)
  • August 11, 2013: resource selection runs due; query results released
  • September 15, 2013: results merging runs due
  • September 30, 2013: relevance assessments and individual results released
  • November 19-22, 2013: TREC 2013

Task 1: Resource selection

For a given query, the system should return a ranking such that the most appropriate search engines are ranked highest. Systems cannot base their resource selection on the actual results for the given query. 

Submission format:

    7001 Q0 FW13-e200 1 29.34 example13a
    7001 Q0 FW13-e001 2 21.67 example13a
    7001 Q0 FW13-e007 3 19.97 example13a
    7001 Q0 FW13-e041 4 19.21 example13a

etc. where:
  • the first column is the topic number;
  • the second column is currently unused and should always be "Q0";
  • the third column is the official engine identifier of the selected search engine;
  • the fourth column is the rank at which the search engine is retrieved;
  • the fifth column shows the score (integer or floating point) that generated the ranking. Scores must be in descending (non-increasing) order: the evaluation program ranks engines by these scores, not by your ranks, so if you want the precise ranking you submit to be evaluated, the scores must reflect that ranking;
  • the sixth column is called the "run tag" and should be a unique identifier for your group AND for the method used. That is, each run should have a different tag that identifies the group and the method that produced the run. Please change the tag from year to year, since often we compare across years (for graphs and such) and having the same name show up for both years is confusing. Also run tags must contain 12 or fewer letters and numbers, with no punctuation, to facilitate labeling graphs with the tags.
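As a sketch, the six-column run format above can be produced and sanity-checked with a few lines of Python. The helper name and the checks are our own, not official TREC tooling:

```python
# Hypothetical helper that emits one line of a resource selection run
# in the required six-column format, enforcing the run-tag constraint
# (at most 12 letters/digits, no punctuation).

def format_run_line(topic, engine_id, rank, score, run_tag):
    """Return one run-file line: topic Q0 engine rank score tag."""
    if not (run_tag.isalnum() and len(run_tag) <= 12):
        raise ValueError("run tag must be <= 12 letters/digits, no punctuation")
    return f"{topic} Q0 {engine_id} {rank} {score} {run_tag}"

lines = [
    format_run_line(7001, "FW13-e200", 1, 29.34, "example13a"),
    format_run_line(7001, "FW13-e001", 2, 21.67, "example13a"),
]
print("\n".join(lines))
```

Writing the file this way makes it easy to keep scores non-increasing and the tag within the 12-character limit before submission.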

The ranking of search engines will be evaluated by normalized discounted cumulative gain (nDCG; the variant introduced by Christopher Burges et al., "Learning to rank using gradient descent", ICML 2005). We provide a qrels file for the participants that assigns each search engine a relevance grade for a given query. The relevance of a search engine for a given query is determined by calculating the graded precision (see "Using graded relevance assessments in IR evaluation", J. Kekäläinen and K. Järvelin, JASIST 53(13), 2002) over its top 10 results. This takes the graded relevance levels of the documents in the top 10 into account, but not their ranking. The relevance levels are taken from the TREC Web track. The following weights are given to the relevance levels of documents:
  • Nav: 1
  • Key: 1
  • Hrel: 0.5
  • Rel: 0.25
  • Non relevant: 0
For example, a search engine that returns 1 Key and 2 Rel pages in its top 10 has a graded precision of (1 + 2 * 0.25)/10 = 0.15. To obtain integer relevance grades, all values are multiplied by 100 and rounded to the nearest integer, resulting in a relevance grade of 15 for this particular search engine and query.
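The grade computation above can be sketched in a few lines of Python. The weights come from the guidelines; the function name is our own, not official evaluation code:

```python
# Weights for TREC Web track relevance levels, as given in the guidelines.
WEIGHTS = {"Nav": 1.0, "Key": 1.0, "HRel": 0.5, "Rel": 0.25, "Non": 0.0}

def relevance_grade(top10_levels):
    """Graded precision over the top 10 results, scaled by 100 and rounded."""
    gp = sum(WEIGHTS[level] for level in top10_levels) / 10  # graded precision
    return round(gp * 100)                                   # relevance grade

# 1 Key and 2 Rel documents in the top 10 -> grade 15
levels = ["Key", "Rel", "Rel"] + ["Non"] * 7
print(relevance_grade(levels))  # 15
```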

A modified version of gdeval.pl (you can download it here) will be used to calculate nDCG (maximum gain value adjusted to 100).

Task 2: Results merging

For a given query, up to 10 results from each of the selected search engines are provided by TREC FedWeb. The goal of this task is to merge these results into a single ranked list.

Submission format:

    7001 Q0 FW13-e001-7001-01 1 12.34 example13m
    7001 Q0 FW13-e001-7001-02 2 11.67 example13m
    7001 Q0 FW13-e001-7001-03 3 10.97 example13m

The primary evaluation metric will be normalized discounted cumulative gain (nDCG; again, the variant introduced by Christopher Burges et al., "Learning to rank using gradient descent", ICML 2005). Duplicate documents (identified by URL and content) further down the list are considered non-relevant when calculating this measure on the merged results. Participants who do not want to focus on the resource selection task can use the results of a baseline system provided by the organizers.
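A minimal Python sketch of this evaluation idea, assuming gains are graded relevance values and duplicates are detected by URL (function names are illustrative; this is not the modified gdeval.pl used by the track):

```python
import math

def dcg(gains):
    """Burges-style DCG: gain 2^rel - 1, log2 position discount."""
    return sum((2**g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains, ideal_gains):
    """DCG of the submitted ranking normalized by the ideal DCG."""
    ideal = dcg(sorted(ideal_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0

def dedup_gains(results):
    """Zero out the gain of any repeated URL after its first occurrence,
    mirroring how duplicates in the merged list are treated."""
    seen, gains = set(), []
    for url, gain in results:
        gains.append(0 if url in seen else gain)
        seen.add(url)
    return gains
```

For instance, a merged list that repeats the same URL twice only receives credit for the first occurrence before nDCG is computed.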

FedWeb 13 collection

In contrast to typical TREC tasks, no document collection is provided. A list of 157 search engines is made available with sampled search results from each of these engines. These sampled search results can be used to build resource descriptions to carry out resource selection. Participants are free to use any additional resources, such as WordNet and Wikipedia. We ask participants not to sample the search engines themselves. Each search engine is assigned to one or more categories, such as news, travel, and video. The evaluation will use 50 to 100 topics (selected from around 200 queries initially released on May 31). Some of these will be targeted at specific search engine categories, while others will be more general. Two example queries, taken from the TREC 2010 Web track (with descriptions modified for the current task and additional 'narrative' fields, as they will be presented to the assessors), are listed below:

<topic id="1">
 <query>ct jobs</query>
 <description>Find job openings in Connecticut.</description>
 <narrative>You are trying to find a temporary job in Connecticut. Any job that does not require a domain specialist might be interesting. Therefore your search intent is best met by lists of job openings in the area, with an indication of the specific job requirements.</narrative>
</topic>

<topic id="2">
 <description>Find information about human memory.</description>
 <narrative>You are preparing a high school presentation about human memory. Specialized scientific literature is probably not very helpful, but any kind of overview information or material that serves to illustrate the basic concepts might be relevant.</narrative>
</topic>

For the initial topic release, only the id and query-field are given. 

Query-based sampling is done to allow participants to build resource descriptions. In total, 2000 queries are sent to each search engine, and for each query the top 10 results are retrieved (snippets and pages). First, 1000 queries obtained using the Zipf method (see Nguyen et al. 2012 below) are sent to each search engine; note that these queries are the same for every engine. Then, for each search engine, an additional 1000 queries are sent, generated by randomly selecting terms from the documents already sampled from that engine (Zipf samples and Random samples).
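The second sampling phase can be sketched as follows. This is a hypothetical illustration of random-term query generation, not the track's actual sampling code; the function name and tokenization are our own:

```python
import random

def random_term_queries(sampled_docs, n_queries, seed=0):
    """Draw single-term queries uniformly at random from the vocabulary
    of the documents already sampled from one search engine."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    vocab = sorted({term for doc in sampled_docs for term in doc.split()})
    return [rng.choice(vocab) for _ in range(n_queries)]

docs = ["federated search engines", "search results merging"]
print(random_term_queries(docs, 3))
```

Because the vocabulary is built from each engine's own samples, these 1000 follow-up queries differ per engine, unlike the shared Zipf queries of the first phase.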

Search Engines
The list of included search engines can be found here. In total there are 157 search engines, covering a broad range of categories, including news, books, academic, and travel. We have included one large general web search engine, which is based on a combination of existing web search engines.

Queries and relevance judgments
The queries (TREC topics file) and the relevance judgments (qrels) can be found on the TREC FedWeb 2013 data site.

Data Format
A sample of the dataset (with one search engine included) can be downloaded here. Note that to keep the size of this file small, we deliberately did not include the documents (the FW13-sample-docs/e014 directory is empty).

Obtaining the Collection
The FedWeb 13 collection can be obtained by signing the license agreement. The initial dataset only contains the samples, the search results of the topics will be made available later.
The total size of the snippet-based samples is 952 MB (216 MB compressed); the document-based samples are 188 GB (57 GB compressed).

Training collection FedWeb 2012

A training collection with search engines, topics, and relevance judgments is available for developing a system. Note that these topics were not specifically targeted at search engine categories. The dataset is described by Nguyen et al. (2012); see below.
More information on the dataset can be found here. The FedWeb 2013 dataset will have a format similar to the training collection.

About FedWeb

Track coordinators

Dolf Trieschnigg