Fig. 13. c6 is from the test set of JCSD (sample number 707). c7 is from the test set of PCSD (sample number 18275).
The two real-world examples c6 and c7 shown in Fig. 13 are also from the test sets of the JCSD and PCSD datasets. Compared with c4 and c5 in Fig. 12 of the paper, the code snippets of c6 and c7 are longer and more complex. For c6, we can again divide the reference summary into three parts: "apply ... to" (blue font), "graphic attributes" (red font), and "the symbol" (orange font). Comparing the generated summaries with the reference summary, we observe that:
1) the summary generated by SiT is completely wrong;
2) CodeBERT, UniXcoder, and ESALE correctly cover the first part ("applies" is semantically equivalent to "apply");
3) although CodeBERT and UniXcoder generate informative text (e.g., "symbolattributes", "graphic attributes", and "graphicattributes"), both reverse the order of the second and third parts;
4) ESALE covers part of the semantics of the second part ("the attributes") and the full semantics of the third part ("the symbol").
Although ESALE fails to generate the word "graphic", it predicts the second and third parts in the correct order, which CodeBERT and UniXcoder do not. We attribute this better performance to ESALE better capturing the alignment between code and summaries.
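The part-by-part comparison above can be mimicked with a small script. The following is a minimal sketch, not the analysis procedure used in the paper: the function names `part_positions` and `covers_in_order` are hypothetical, the two candidate strings are invented for illustration and are not model outputs, and matching is by exact substring, so it would not treat "applies" as covering "apply".

```python
def part_positions(summary, parts):
    """Return the character offset of each part in the summary (-1 if absent)."""
    text = summary.lower()
    return [text.find(part.lower()) for part in parts]


def covers_in_order(summary, parts):
    """True if every part occurs and the parts appear in the given order."""
    positions = part_positions(summary, parts)
    return all(p >= 0 for p in positions) and positions == sorted(positions)


# Second and third parts of the c6 reference summary, in reference order.
parts = ["graphic attributes", "the symbol"]

# Hypothetical candidate summaries, for illustration only (not model outputs).
in_order = "applies the graphic attributes to the symbol"
reversed_order = "applies the symbol to the graphic attributes"

print(covers_in_order(in_order, parts))        # True
print(covers_in_order(reversed_order, parts))  # False
```

A check of this kind only captures surface coverage and ordering; judging partial semantics (such as "the attributes" for "graphic attributes") still requires manual inspection, as done in the case study.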
Analogously, we can divide the reference summary of c7 into three parts: "this function" (blue font), "checks" (red font), and "if the url parameter is dynamic" (orange font). Comparing the generated summaries with the reference summary, we observe that:
1) only UniXcoder and ESALE cover the semantics of the first part, and the text generated by ESALE ("this function") is identical to the reference;
2) all four techniques cover the second part ("checks");
3) none of the four techniques adequately generates the third part (e.g., all of them fail to generate the keyword "url");
4) compared with the three baselines, the summary generated by ESALE is closer to the reference summary.
From these two cases, we conclude that summarizing long and complex code snippets remains challenging for both the baselines and ESALE, and that there is room for further improvement.
Fig. 14. c8 is from the test set of JCSD (sample number 3842). c9 is from the test set of PCSD (sample number 1631).
The two real-world examples c8 and c9 shown in Fig. 14 are also from the test sets of the JCSD and PCSD datasets. For c8, we can divide the reference summary into three parts: "filter" (blue font), "children" (red font), and "by name and class" (orange font). Comparing the generated summaries with the reference summary, we observe that:
1) the summary generated by SiT is completely wrong;
2) CodeBERT and ESALE+CodeBERT cover part of the semantics of the second and third parts;
3) UniXcoder and ESALE+UniXcoder correctly cover the first part;
4) only ESALE+UniXcoder covers the full semantics of the third part.
A closer comparison shows that ESALE inherits some deficiencies of the pre-trained models it builds on. For example, ESALE+CodeBERT and CodeBERT make the same mistakes: both generate the wrong first part ("get") and miss some semantics of the third part ("class"). Analogously, ESALE+UniXcoder and UniXcoder make the same mistake of generating the wrong second part ("siblings"). We observe the same phenomenon in Python. For example, Fig. 14(c) and (d) show that, for the Python code snippet c9, the summaries generated by ESALE+CodeBERT and CodeBERT miss the key information "a cluster admin", while those generated by UniXcoder and ESALE+UniXcoder miss the key information "user". Nevertheless, ESALE yields a notable improvement over CodeBERT and UniXcoder. For example, for c8, ESALE generates the key semantic information "and class", which UniXcoder does not. For both c8 and c9, the summaries generated by ESALE are textually closer to the reference summaries than those generated by CodeBERT and UniXcoder.
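The notion of one summary being "textually closer" to the reference can be made concrete with a token-overlap score. The sketch below computes a unigram F1 between a candidate and a reference string. It is an illustrative stand-in and not the evaluation metric used in the paper; the function name `unigram_f1` is hypothetical, and the candidate fragment "by name" is invented to represent an output that misses "and class". Only the reference fragment "by name and class" comes from the c8 example.

```python
from collections import Counter


def unigram_f1(candidate, reference):
    """Token-level F1 between two whitespace-tokenized, lowercased strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Third part of the c8 reference summary.
reference_part = "by name and class"

# Hypothetical candidate fragments, for illustration only.
print(round(unigram_f1("by name", reference_part), 2))            # 0.67
print(round(unigram_f1("by name and class", reference_part), 2))  # 1.0
```

A unigram score ignores word order, so it would not penalize the reversed parts seen for c6; order-sensitive comparisons need n-gram or position-based checks.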