To evaluate differences in the correctness of the patches generated by the baseline (jGenProg) and DeepRepair, three judges independently evaluated the same random sample of 30 patches (15 from jGenProg and 15 from DeepRepair).
For each bug in the sample, the following data is available:
DeepRepair patch
    Patch log
    Modified files
Baseline patch
    Patch log
    Modified files
Human patch
    Patch log
    Modified files
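
The material listed above can be pictured as one record per sampled bug. The following is a minimal sketch, not the tooling used in the study: the class names, field names, the `draw_sample` helper, and the fixed seed are all hypothetical, and the text does not specify how the random sample was drawn beyond its size (15 patches per tool).

```python
import random
from dataclasses import dataclass, field


@dataclass
class PatchArtifact:
    """One patch together with its log and the files it modifies."""
    patch_log: str
    modified_files: list[str] = field(default_factory=list)


@dataclass
class BugRecord:
    """Material available to a judge for one sampled bug (hypothetical layout)."""
    bug_id: str
    deeprepair: PatchArtifact
    baseline: PatchArtifact  # patch produced by jGenProg
    human: PatchArtifact     # developer-written fix


def draw_sample(patch_ids_by_tool, per_tool=15, seed=0):
    """Draw `per_tool` patch ids uniformly at random from each tool's pool.

    Assumption: sampling is done separately per tool, without replacement.
    """
    rng = random.Random(seed)
    return {
        tool: rng.sample(sorted(ids), per_tool)
        for tool, ids in patch_ids_by_tool.items()
    }
```

Under these assumptions, calling `draw_sample` with one pool of patch ids per tool yields 15 ids for jGenProg and 15 for DeepRepair, for 30 patches in total; each judge would then receive the same set of `BugRecord` objects.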