Manual Assessment

To compare the correctness of the patches generated by the baseline (jGenProg) and by DeepRepair, three judges independently evaluated the same random sample of 30 patches (15 from jGenProg and 15 from DeepRepair).