A multimodal IR test collection with scientific (physics) documents, citations, topics and relevance assessments
iSearch is an information retrieval (IR) test collection built to facilitate the evaluation of integrated search, i.e. search across a range of different sources through one search box and one ranked result list. The test collection consists of approx. 18,000 monographic records, 160,000 papers and journal articles in PDF and 275,000 abstracts, with a varied set of metadata and vocabularies from the physics domain, together with 65 topics based on real work tasks and corresponding graded relevance assessments. The test collection may be used for system-oriented as well as user-oriented evaluation.
The data set was crawled from the arXiv and from the Danish National Library. The collection also contains approx. 3.5 million citations to complement the document data.
For further details please refer to the following publication:
Lykke, M., Larsen, B., Lund, H. & Ingwersen, P. (2010). Developing a test collection for the evaluation of integrated search. ECIR, p. 627-630. [PDF]
iSearch is available free of charge for academic research purposes. Send an email to isearch.dataset@gmail.com to learn more.
Please give recognition to the creators of iSearch in any publications using the test collection by citing the publication listed above.
Thanks to Philipp Schaer for setting up the iSearch GitHub page, and to Suzan Verberne for recovering the collection after two server crashes.
See also a list of papers that cite the collection according to Google Scholar.
archives: The original files from the distributor
iSearch-direct-citations.tgz: Direct citations within the dataset
iSearchIDs.tgz: Mapping of internal IDs and arXiv IDs / URLs
iSearch-references.tgz: Extracted reference information
iSearch-v1.0_documents-PF-pdf.tgz: PDF fulltext for a subset of the documents
iSearch-v1.0_documents.tgz: Document metadata
iSearch-v1.0_topics+assessments.tgz: Topics and relevance assessments
scripts: Sample scripts to work with the data
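The archives listed above are gzip-compressed tar files, so they can be inspected and unpacked with standard tools. The following is a minimal Python sketch, not one of the distributed sample scripts. The helper names `list_members` and `extract_archive` are hypothetical, and nothing is assumed about the internal layout of the archives; the sketch only lists and extracts whatever they contain.

```python
import tarfile
from pathlib import Path


def list_members(archive_path, limit=10):
    """Return up to `limit` member names from a gzip-compressed tar archive."""
    names = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def extract_archive(archive_path, target_dir):
    """Extract an archive into target_dir/<archive name without .tgz>.

    Returns the directory the files were extracted to.
    """
    archive_path = Path(archive_path)
    # Path.stem drops only the final suffix, so "iSearchIDs.tgz" -> "iSearchIDs".
    out_dir = Path(target_dir) / archive_path.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out_dir)
    return out_dir
```

For example, `list_members("iSearchIDs.tgz")` gives a quick look at the contents of the ID mapping archive before `extract_archive("iSearchIDs.tgz", "data")` unpacks it. The paths here assume the archive sits in the current working directory and that `data` is a suitable local target; adjust both to your own setup.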
Contact isearch.dataset@gmail.com to get more information about the collection and its availability.