Overview

This site hosts supplementary material for two publications on IR-based trace recovery:

Recovering from a Decade: A Systematic Mapping of Information Retrieval Approaches to Software Traceability

Markus Borg, Per Runeson, and Anders Ardö
Empirical Software Engineering, 2013, Springer Link, (draft version)

Abstract: Engineers in large-scale software development have to manage large amounts of information, spread across many artifacts. Several researchers have proposed expressing retrieval of trace links among artifacts, i.e. trace recovery, as an Information Retrieval (IR) problem. The objective of this study is to produce a map of work on IR-based trace recovery, with a particular focus on previous evaluations and strength of evidence. We conducted a systematic mapping of IR-based trace recovery. Of the 79 publications classified, a majority applied algebraic IR models. While a set of studies on students indicates that IR-based trace recovery tools support certain work tasks, most previous studies do not go beyond reporting precision and recall of candidate trace links from evaluations using datasets containing fewer than 500 artifacts. Our review identified a need for industrial case studies. Furthermore, we conclude that the overall quality of reporting should be improved regarding context and tool details, measures reported, and use of IR terminology. Finally, based on our empirical findings, we present suggestions on how to advance research on IR-based trace recovery.


IR in Software Traceability: From a Bird's Eye View


Markus Borg and Per Runeson
In Proceedings of the 7th International Symposium on Empirical Software Engineering and Measurement, Baltimore, Maryland, USA, pp. 243-247, October 2013, IEEE Xplore, (draft version)

Abstract: Several researchers have proposed creating after-the-fact structure among software artifacts using trace recovery based on Information Retrieval (IR) approaches. Due to significant variation points in previous studies, results from previous work are not easily aggregated. We provide an initial overview picture of the outcome of previous evaluations. Based on a systematic mapping study, we perform a synthesis of previous work. Our results show that there is no empirical evidence that any IR model outperforms another model consistently. We also display a strong dependency between the Precision and Recall (P-R) values and the input datasets. Finally, our mapping of P-R values on the possible output space highlights the difficulty of recovering accurate trace links using naïve cut-off strategies. Thus, our work presents empirical evidence that confirms several previous claims on IR-based trace recovery and stresses the need for empirical evaluations beyond the basic P-R "race".


This is the Overview section. The remaining sections of the site contain the following:
  • Summary of the primary studies: A summary of the results from the two secondary studies.
  • Primary publications, BibTeX: A BibTeX list of the primary publications, including all abstracts.
  • Extracted Data: A spreadsheet of information extracted from the 79 primary publications.

For details on the research methodology applied, e.g., search strategies and the validation of extracted data, see the journal publication. Do you think any information in the spreadsheet on Google Drive should be updated? Contact Markus Borg at the Software Engineering Research Group, Lund University, to get permission to edit the document directly.