Many software defect prediction approaches have been proposed and most are effective in within-project prediction settings. However, for new projects or projects with limited training data, it is desirable to learn a prediction model by using sufficient training data from existing source projects and then apply the model to some target projects (cross-project defect prediction). Unfortunately, the performance of cross-project defect prediction is generally poor, largely because of the feature distribution differences between the source and target projects.
In this paper, we apply a state-of-the-art transfer learning approach, TCA (Transfer Component Analysis), to make the feature distributions in source and target projects similar, and we propose a novel transfer defect learning approach, TCA+, which extends TCA. Our experimental results on eight open-source projects show that TCA+ significantly improves cross-project prediction performance.
- Jaechang Nam, Sinno Jialin Pan, and Sunghun Kim, "Transfer Defect Learning," in Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), San Francisco, May 18-26, 2013.
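To make "feature distribution differences" concrete, here is a minimal sketch that measures the gap between a source and a target project as the maximum mean discrepancy (MMD) under a linear kernel, which reduces to the squared distance between the per-feature means. TCA learns a projection that reduces an MMD-style distance between domains; this snippet only measures the gap and does not implement TCA or TCA+. The function name and the two-feature example rows are hypothetical.

```python
from statistics import fmean


def linear_mmd(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2.

    source, target: lists of rows; each row is a list of numeric feature
    values, and every row must have the same number of features.
    """
    n_features = len(source[0])
    if any(len(row) != n_features for row in source + target):
        raise ValueError("all rows must have the same number of features")
    gap = 0.0
    for j in range(n_features):
        mu_s = fmean(row[j] for row in source)  # mean of feature j in the source
        mu_t = fmean(row[j] for row in target)  # mean of feature j in the target
        gap += (mu_s - mu_t) ** 2
    return gap


source = [[10, 1], [20, 3]]    # hypothetical rows: (metric A, metric B)
target = [[100, 1], [140, 5]]
print(linear_mmd(source, target))  # 11026.0
```

Here the feature means are (15, 2) versus (120, 3), so the result is 105^2 + 1^2 = 11026: the scale difference in the first metric dominates, which is the kind of mismatch that normalization and TCA aim to reduce.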
Data Sets for Experiments
We provide the data sets (ARFF files for Weka) needed to reproduce our experimental results. All data sets are also available in a single zip file (datasets.zip).
To reproduce the experimental results with these data sets, follow the steps below:
- Subjects (right-click each link and save the file):
- Relink: original subject data sets
- AEEEM: original subject data sets
- Relink: data sets transformed by TCA with a specific normalization option (label: 1 = buggy, -1 = clean)
- AEEEM: data sets transformed by TCA with a specific normalization option (label: 1 = buggy, -1 = clean)
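Before training, it can help to check the class balance of a subject file. The following is a minimal sketch, not a full ARFF parser: it assumes dense `@data` rows, comma-separated values with no quoted commas, and the class as the last attribute. The function name is hypothetical; for anything beyond a quick check, Weka's own ARFF loader is the safer choice.

```python
from collections import Counter


def class_counts(arff_text):
    """Count class labels in a dense ARFF file whose last column is the class."""
    counts = Counter()
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if not in_data:
            if line.lower().startswith("@data"):
                in_data = True  # everything after @data is an instance row
            continue
        if line.startswith("{"):
            raise ValueError("sparse ARFF rows are not handled by this sketch")
        label = line.split(",")[-1].strip().strip("'\"")
        counts[label] += 1
    return counts
```

For the TCA-transformed files, `counts["1"] / sum(counts.values())` gives the fraction of buggy instances, since those files use 1 for buggy and -1 for clean.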
- Similarity Vectors
The values in the vectors mean the following:
3: MUCH MORE
1: SLIGHTLY MORE
-1: SLIGHTLY LESS
-3: MUCH LESS
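The codes listed above can be decoded mechanically. This sketch assumes a similarity vector is stored as one line of comma-separated integers; that file layout is an assumption, and what each vector position refers to is not described here. Any value other than the four listed codes maps to `None` rather than a guessed label.

```python
# Codes as listed above; other values are deliberately left unmapped.
SIMILARITY_CODES = {
    3: "MUCH MORE",
    1: "SLIGHTLY MORE",
    -1: "SLIGHTLY LESS",
    -3: "MUCH LESS",
}


def decode_similarity(line):
    """Map a comma-separated similarity vector to labels (None if unknown)."""
    return [SIMILARITY_CODES.get(int(tok)) for tok in line.split(",") if tok.strip()]
```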
- Download the Weka 3.6.1 archive and extract "weka.jar" from the zip file.
- Download the LIBLINEAR wrapper class for Weka.
- Experimental settings
- Within-project defect prediction
- Logistic Regression
- java -Xmx1024m -cp weka.jar:[path_to_liblinear] weka.classifiers.functions.LibLINEAR -S 0 -C 1 -B -1 -t [input_file_name].arff -x 2 -i -o -s 1
- Windows cmd users should use ';' instead of ':' as the class path separator.
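If you script many runs, building the argument list in code avoids the separator problem: Python's `os.pathsep` is ':' on Linux/macOS and ';' on Windows. This is a sketch under the assumption that you run the same LibLINEAR settings shown on this page; `liblinear_command` is a hypothetical helper, and `liblinear_jar` stands in for `[path_to_liblinear]`.

```python
import os


def liblinear_command(train_arff, liblinear_jar, test_arff=None, weka_jar="weka.jar"):
    """Build the Weka/LibLINEAR argument list used on this page.

    Without test_arff: within-project run (2-fold cross-validation, seed 1).
    With test_arff: cross-project run (train on the source, test on the target).
    """
    classpath = os.pathsep.join([weka_jar, liblinear_jar])  # ':' or ';' per platform
    cmd = ["java", "-Xmx1024m", "-cp", classpath,
           "weka.classifiers.functions.LibLINEAR",
           "-S", "0",   # solver 0: logistic regression, as in the heading above
           "-C", "1",   # cost parameter
           "-B", "-1",  # negative bias: no bias term (LIBLINEAR convention)
           "-t", train_arff]
    if test_arff is None:
        # 2 folds, per-class statistics, statistics only (no model), seed 1
        cmd += ["-x", "2", "-i", "-o", "-s", "1"]
    else:
        cmd += ["-T", test_arff, "-i"]
    return cmd
```

The list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True, check=True)`; because no shell is involved, the brackets and semicolons in paths need no quoting.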
- Cross-project defect prediction with/without transfer learning
* For other options, please refer to http://weka.wikispaces.com/Primer
- Run the following command with a specific machine learning algorithm and two of the ARFF files above (the source project for training, the target project for testing):
- java -Xmx1024m -cp weka.jar:[path_to_liblinear] weka.classifiers.functions.LibLINEAR -S 0 -C 1 -B -1 -t [input_source_subject_file_name].arff -T [input_target_subject_file_name].arff -i
- The command prints the full set of evaluation results that Weka provides. The value of interest is the F-measure of the "buggy" class (label 1 in the TCA-transformed files). Look for it in the per-class table under the "=== Stratified cross-validation ===" line for within-project runs, or under the "=== Error on test data ===" line for cross-project runs that use -T.
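Reading the value by eye gets tedious across many source/target pairs. The function below is a sketch that pulls one class's F-measure out of Weka's printed output: it finds a section header, then the first "Detailed Accuracy By Class" table after it, and locates the F-Measure column from that table's header row. It assumes the plain-text layout Weka prints with `-i`; please check it against your own output before relying on it. The function name is hypothetical.

```python
def class_f_measure(weka_output, class_label="buggy",
                    section="=== Stratified cross-validation ==="):
    """Return the F-Measure of class_label from the per-class table under section.

    Assumes one header row naming the columns, then one row per class whose
    last cell is the class label, followed by a "Weighted Avg." row.
    """
    lines = weka_output.splitlines()
    try:
        start = next(i for i, text in enumerate(lines) if text.strip() == section)
        table = next(i for i in range(start, len(lines))
                     if "Detailed Accuracy By Class" in lines[i])
    except StopIteration:
        raise ValueError(f"no per-class table found under {section!r}") from None

    column = None
    for text in lines[table + 1:]:
        cells = text.split()
        if column is None and "F-Measure" in text:
            # Join two-word column names so that each header cell is one token.
            header = text
            for name in ("TP Rate", "FP Rate", "ROC Area", "PRC Area"):
                header = header.replace(name, name.replace(" ", "_"))
            column = header.split().index("F-Measure")
        elif text.strip().startswith("===") or (cells and cells[0] == "Weighted"):
            break  # past the per-class rows
        elif column is not None and cells and cells[-1] == class_label:
            return float(cells[column])
    raise ValueError(f"no row for class {class_label!r} under {section!r}")
```

For a cross-project run on the TCA-transformed files, call it as `class_f_measure(output, "1", "=== Error on test data ===")`; anchoring on the section header keeps it from picking up the training-data table that Weka may print first.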
- Jaechang Nam (jcnam [at] cse dot ust dot hk)
- Sinno Jialin Pan (jspan [at] i2r dot a-star dot edu dot sg)
- Sunghun Kim (hunkim [at] cse dot ust dot hk)
If you have any comments/questions regarding the research work, please feel free to contact any of the project members.
Last updated on 06/04/2013 13:12:08 GMT.