Many software defect prediction approaches have been proposed and most are effective in within-project prediction settings. However, for new projects or projects with limited training data, it is desirable to learn a prediction model by using sufficient training data from existing source projects and then apply the model to some target projects (cross-project defect prediction). Unfortunately, the performance of cross-project defect prediction is generally poor, largely because of the feature distribution differences between the source and target projects. 
In this paper, we apply a state-of-the-art transfer learning approach, TCA, to make feature distributions in source and target projects similar, and we propose a novel transfer defect learning approach, TCA+, by extending TCA. Our experimental results for eight open-source projects show that TCA+ significantly improves cross-project prediction performance.


  • Jaechang Nam, Sinno Jialin Pan, and Sunghun Kim, Transfer Defect Learning, in Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), San Francisco, May 18-26, 2013. [PDF]

Data Sets for Experiments

We provide data sets (ARFF files for Weka) to reproduce our experimental results. You can also download all data sets in a single zip file (datasets.zip). To reproduce the experimental results with the data sets above, please follow the steps below:
  • Download the Weka 3.6.1 archive file (download) and extract "weka.jar" from the zip file
  • Download the LIBLINEAR wrapper class for Weka (here)
  • Different Experimental settings
    • Within-project defect prediction
      • Logistic Regression
        • java -Xmx1024m -cp weka.jar:[path_to_liblinear] weka.classifiers.functions.LibLINEAR -S 0 -C 1 -B -1 -t input_file_name.arff -x 2 -i -o -s 1
        • For Windows cmd users, please use ';' as the class path separator
    • Cross-project defect prediction with/without transfer learning
      • Run a command with a specific machine learning algorithm and two of the ARFF files above (i.e., the source for training and the target for testing):
        • java -Xmx1024m -cp weka.jar:[path_to_liblinear] weka.classifiers.functions.LibLINEAR -S 0 -C 1 -B -1 -t input_source_subject_file_name.arff -T input_target_subject_file_name.arff -i
      • For other options, please refer to http://weka.wikispaces.com/Primer
  • The output of the command contains a lot of information provided by Weka. The value we are interested in is the F-measure of the "buggy" class. Please see the F-measure value of the "buggy" class under the "=== Stratified cross-validation ===" line.
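
If you run many source/target pairs, the steps above can be scripted. The following is only a minimal Python sketch, not part of our tooling. The helper names (cross_project_command, f_measure_for_class), the parameter names, and the SAMPLE text are hypothetical. SAMPLE contains made-up numbers that merely imitate the layout of Weka's per-class table, and we assume the "F-Measure" column title and the "=== Error on training data ===" line appear as in that sample, so please check them against your own Weka output.

```python
import os

# Two-word column titles in Weka's per-class table; joined so that header
# tokens line up with the whitespace-separated values in each class row.
TWO_WORD_TITLES = ("TP Rate", "FP Rate", "ROC Area", "PRC Area")


def cross_project_command(source_arff, target_arff, liblinear_path,
                          weka_jar="weka.jar"):
    """Build the cross-project command as an argument list (hypothetical helper)."""
    # os.pathsep is ':' on Unix-like systems and ';' on Windows.
    classpath = os.pathsep.join([weka_jar, liblinear_path])
    return ["java", "-Xmx1024m", "-cp", classpath,
            "weka.classifiers.functions.LibLINEAR",
            "-S", "0", "-C", "1", "-B", "-1",
            "-t", source_arff, "-T", target_arff, "-i"]


def f_measure_for_class(weka_output, class_label="buggy",
                        section_marker="=== Stratified cross-validation ==="):
    """Return the F-measure of class_label from the first per-class table
    after section_marker, or None if the marker or the row is missing."""
    lines = weka_output.splitlines()
    start = next((i for i, line in enumerate(lines) if section_marker in line), None)
    if start is None:
        return None
    column = None
    for line in lines[start + 1:]:
        if column is None:
            # Look for the table header and remember where F-Measure sits.
            if "F-Measure" in line:
                header = line
                for title in TWO_WORD_TITLES:
                    header = header.replace(title, title.replace(" ", "_"))
                column = header.split().index("F-Measure")
            continue
        tokens = line.split()
        # A class row ends with the class label.
        if tokens and tokens[-1] == class_label:
            return float(tokens[column])
    return None


# Made-up text that only imitates the layout of Weka output (not real results).
SAMPLE = """
=== Error on training data ===

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.9       0.1        0.9       0.9      0.9        0.9      clean
                 0.9       0.1        0.9       0.9      0.9        0.9      buggy

=== Stratified cross-validation ===

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.9       0.5        0.8       0.9      0.847      0.7      clean
                 0.5       0.1        0.6       0.5      0.545      0.7      buggy
"""

print(f_measure_for_class(SAMPLE))  # prints 0.545
```

The argument list returned by cross_project_command can be passed to subprocess.run(..., capture_output=True, text=True), and the captured standard output can then be given to f_measure_for_class. For a cross-project run there is no cross-validation section, so a different section_marker is needed. We believe Weka labels that section "=== Error on test data ===", but please confirm this in your output.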

Project Members

  • Jaechang Nam (jcnam [at] cse dot ust dot hk)
  • Sinno Jialin Pan (jspan [at] i2r dot a-star dot edu dot sg)
  • Sunghun Kim (hunkim [at] cse dot ust dot hk)
If you have any comments/questions regarding the research work, please feel free to contact any of the project members.
Last updated on 06/04/2013 13:12:08GMT.