We achieved our best results on the abstracted dataset. We wanted to report every instance in which the model generated a potential assert statement, regardless of whether it correctly predicted the assert written by the author of the original method. Importantly, we did not examine the instances in which the model failed to produce a perfect prediction. It is therefore possible that the model generated meaningful assert statements that did not exactly match the author's assert statement. An empirical study of these potential asserts is left for future work.
There are two types of results: abstracted and raw. The abstracted results include the maps needed to translate abstracted code back into real source code. The raw results contain the results obtained with the copy mechanism.
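
To illustrate how a map can translate abstracted code back into real source code, here is a minimal Python sketch. It rests on several assumptions that do not come from the results themselves: the placeholder names (`METHOD_0`, `IDENT_0`, and so on), the in-memory dictionary form of the map, the whitespace-separated token format, and the helper name `deabstract` are all hypothetical and chosen only for illustration. The actual map files may be laid out differently.

```python
# Hypothetical map from abstract placeholder tokens to original source tokens.
# The real on-disk format of the maps is not specified here; this dict is an assumption.
example_map = {
    "METHOD_0": "assertEquals",
    "IDENT_0": "expected",
    "IDENT_1": "result",
    "METHOD_1": "getValue",
}


def deabstract(abstract_code: str, token_map: dict) -> str:
    """Replace each placeholder token with its original token.

    Tokens are assumed to be separated by single spaces. Tokens that are
    not in the map (punctuation, keywords, unknown placeholders) are kept as-is.
    """
    tokens = abstract_code.split()
    return " ".join(token_map.get(tok, tok) for tok in tokens)


# A made-up abstracted assert statement, used only to exercise the helper.
abstract_assert = "org.junit.Assert . METHOD_0 ( IDENT_0 , IDENT_1 . METHOD_1 ( ) )"
print(deabstract(abstract_assert, example_map))
```

Leaving unmapped tokens unchanged is a deliberate choice in this sketch: it keeps punctuation and keywords intact and makes any placeholder that is missing from the map easy to spot in the output.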