Fig. The framework of our intention-based code refinement.
Based on the intentions conveyed by different kinds of code reviews, we design corresponding methods to extract the intention and generate the code revision. As illustrated in the figure above, we employ three agents to distinguish among Explicit Code Suggestions, Reversion Suggestions, and General Suggestions.
Initially, in Agent 1, we use a rule-based method to identify the most explicit category: Explicit Code Suggestions, which express a direct and explicit intention. This category covers cases where the review comment contains a suggested code snippet in the format ```suggestion <code>```. These suggestions may be accompanied by explanatory remarks and other exchanges between the reviewer and the developer. We use regular expressions to determine whether a case contains suggestion code. If it does, it is categorized as an Explicit Code Suggestion; otherwise, it proceeds to the next classification stage.
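Agent 1's rule-based check can be sketched as a single regular expression over the review comment. This is a minimal illustration: the ```suggestion fence format comes from the text above, while the function name and return convention are our own assumptions.

```python
import re

# The suggestion fence is built from backtick characters so the
# pattern is easy to read; re.DOTALL lets the code span multiple lines.
FENCE = "`" * 3
SUGGESTION_PATTERN = re.compile(FENCE + r"suggestion\r?\n(.*?)" + FENCE, re.DOTALL)

def extract_suggestion_code(review_comment: str):
    """Return the suggested code if the comment contains a
    ```suggestion ...``` block, otherwise None (illustrative helper)."""
    match = SUGGESTION_PATTERN.search(review_comment)
    return match.group(1) if match else None

comment = "Simpler this way.\n" + FENCE + "suggestion\nreturn x + 1\n" + FENCE
print(extract_suggestion_code(comment))
```

A comment that matches is routed to the Explicit Code Suggestions branch; anything else falls through to Agent 2.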
For cases that are not classified as Explicit Code Suggestions, we employ Agent 2 to further distinguish between Reversion Suggestions and General Suggestions. This differentiation is based on observations of the LastCodeDiffHunk, which reflects the modifications made from the Pre-Modification Code to the Post-Modification Code (which also serves as the Original Code). We examine whether the last modification involved an addition, deletion, or revision.
Depending on the type, we have designed specific prompts to assess whether the ReviewComment implies or explicitly states the need for a reversion of the code changes.
For addition, our prompt asks which of the following intentions the reviewer's comment corresponds to:
1. Expressing an opinion. This code is not needed.
2. Expressing an opinion. Don't do something.
3. Expressing an opinion. It was a mistake to do this.
4. Expressing an opinion. Adding this code is incorrect.
5. Giving a suggestion. Move this code to another place.
6. Giving a suggestion. You need to do something.
7. Giving a suggestion. Suggest a modification and provide the suggested code.
Except for the last two options, all other responses are categorized as Reversion Suggestions.
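The multiple-choice prompt for the addition case can be assembled mechanically. The option wording below follows the list above; the surrounding wrapper text, letter labeling, and answer-parsing helper are illustrative assumptions rather than the paper's exact prompt.

```python
# Option texts for the "addition" case, taken from the list above.
ADDITION_OPTIONS = [
    "Expressing an opinion. This code is not needed.",
    "Expressing an opinion. Don't do something.",
    "Expressing an opinion. It was a mistake to do this.",
    "Expressing an opinion. Adding this code is incorrect.",
    "Giving a suggestion. Move this code to another place.",
    "Giving a suggestion. You need to do something.",
    "Giving a suggestion. Suggest a modification and provide the suggested code.",
]
# All options except the last two map to Reversion Suggestions.
REVERSION_CHOICES = {"A", "B", "C", "D", "E"}

def build_addition_prompt(review_comment: str, diff_hunk: str) -> str:
    """Assemble an illustrative multiple-choice prompt for Agent 2."""
    options = "\n".join(
        f"{chr(ord('A') + i)}. {text}" for i, text in enumerate(ADDITION_OPTIONS)
    )
    return (
        "The last change added the following code:\n"
        f"{diff_hunk}\n"
        f"Reviewer comment: {review_comment}\n"
        "Which intention does the comment correspond to? "
        "Answer with a single letter.\n"
        f"{options}"
    )

def is_reversion(answer: str) -> bool:
    """Map the LLM's letter answer back to the binary decision."""
    return answer.strip()[:1].upper() in REVERSION_CHOICES
```

The deletion and modification cases differ only in the option texts and in which options are excluded from the Reversion Suggestions mapping.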
For deletion, our prompt asks which of the following intentions the reviewer's comment corresponds to:
1. Expressing an opinion. You shouldn't delete this code.
2. Expressing an opinion. You still need this code.
3. Expressing an opinion. Change the code back.
4. Raising a question. Why delete this code?
5. Raising a question. Why did you do this?
6. Giving a suggestion. You should add another piece of code.
Except for the last option, all other responses are categorized as Reversion Suggestions.
For modification, our prompt asks which of the following intentions the reviewer's comment corresponds to:
1. Expressing an opinion. The last modification was unnecessary.
2. Expressing an opinion. The last modification will cause problems.
3. Expressing an opinion. This looks like a code-tool modification, not a deliberate change by the developer.
4. Expressing an opinion. Too many modifications make the code difficult to read and cause merge conflicts.
5. Raising a question. Ask the reason for the last modification.
6. Giving a suggestion. Do not make this modification; revert to the previous version.
7. Raising a question. Would it be better to modify the code according to the comment's suggestion?
8. Giving a suggestion. This code should be modified according to the suggested code.
9. Expressing an opinion. There is an issue unrelated to the previous code changes.
10. Giving a suggestion. Nitpicking suggestions or minor problems.
Except for the last four options, all other responses are categorized as Reversion Suggestions.
The rationale behind this prompt design is twofold: firstly, to inform the LLM of the potential expressions and tones a reviewer might use to suggest reverting a change, thereby enhancing the model's domain knowledge; and secondly, to increase the accuracy of the LLM's task understanding through multiple-choice classification.
Code reviews that match neither Explicit Code Suggestions nor Reversion Suggestions are categorized as General Suggestions. We employ Agent 3 to further classify them into six subcategories: four types of single-line modifications and two types of multi-line modifications.
As shown in the figure above, the templates used by Agent 3 consist of two components: a System prompt and a User prompt. The System prompt instructs the LLM to interpret the code review according to one of the six predefined Intention templates. The models used in our experiments support System prompts, a feature commonly available in modern LLMs.
Notably, each intention category template includes placeholders that the LLM must fill based on the specific case details in the user prompt.
This design clarifies the reviewer's intent and facilitates the subsequent generation of revised code.
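The placeholder mechanism can be pictured as a format string the LLM completes from the case details. The template wording and field names below are hypothetical illustrations of a single-line-modification intention, not verbatim templates from our framework.

```python
# A hypothetical single-line-modification intention template; the
# placeholders {line_number}, {original_line}, and {suggested_line}
# would be filled by the LLM from the user prompt's case details.
TEMPLATE = (
    "Replace the code on line {line_number} "
    "({original_line}) with: {suggested_line}"
)

filled = TEMPLATE.format(
    line_number=42,
    original_line="if x == None:",
    suggested_line="if x is None:",
)
print(filled)
```

Once filled, the template states the reviewer's intent unambiguously, which is what the downstream revision generator consumes.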
As illustrated in the figure above, the Intention-Guided Revision Generation process consists of two primary steps: generation and post-processing. Each of the three intention categories (Explicit Code Suggestions, Reversion Suggestions, and General Suggestions) follows its own generation and post-processing procedure. Post-processing adjusts and repairs the generated code in cases where the LLM over-modifies the code or fails to maintain consistency.
For the Explicit Code Suggestions and General Suggestions categories, the generation phase is similar: both use LLMs as the foundational method for code generation. Our framework does not restrict the prompting strategy; various strategies could be incorporated. We have implemented three prompt strategies commonly used in code tasks: Simple Prompts, RAG (Retrieval-Augmented Generation) Prompts, and Self-generated Prompts. Below, we describe each of these prompt design strategies.
Simple Prompt: This strategy involves describing the task scenario and introducing each field's information, instructing the model to make modifications as required.
RAG Prompt: This approach enhances few-shot prompting by selecting relevant examples from a retrieval database. The database consists of key-value pairs, where the retrieval key is a combination of the intention and the comment, and the value includes OriginalCode, Intention, ReviewLine, and RevisedCode. For a new case, the comment and its associated intention are used to retrieve relevant examples, which are then appended to the front of a simple prompt, creating a tailored few-shot prompt.
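The retrieval step can be sketched with the standard library alone. The key structure (intention combined with comment) and the stored fields mirror the description above; the string-similarity ranking via `difflib.SequenceMatcher` stands in for whatever retriever is actually used, and the sample database entries are invented toy data.

```python
from difflib import SequenceMatcher

# Toy retrieval database: key = intention + comment, value = stored case.
database = [
    {"key": "General Suggestions | rename this variable for clarity",
     "OriginalCode": "tmp = load()", "Intention": "Rename tmp to config.",
     "ReviewLine": "tmp = load()", "RevisedCode": "config = load()"},
    {"key": "Reversion Suggestions | please revert this change",
     "OriginalCode": "y = 2", "Intention": "Revert to the previous version.",
     "ReviewLine": "y = 2", "RevisedCode": "y = 1"},
]

def retrieve_examples(intention: str, comment: str, k: int = 2):
    """Rank stored cases by similarity of their key to the new query."""
    query = f"{intention} | {comment}"
    return sorted(
        database,
        key=lambda e: SequenceMatcher(None, query, e["key"]).ratio(),
        reverse=True,
    )[:k]

def build_rag_prompt(simple_prompt: str, intention: str, comment: str) -> str:
    """Prepend retrieved examples to a simple prompt (few-shot style)."""
    shots = [
        f"OriginalCode:\n{ex['OriginalCode']}\n"
        f"Intention: {ex['Intention']}\n"
        f"ReviewLine: {ex['ReviewLine']}\n"
        f"RevisedCode:\n{ex['RevisedCode']}\n"
        for ex in retrieve_examples(intention, comment)
    ]
    return "\n".join(shots) + "\n" + simple_prompt
```

In practice an embedding-based retriever could replace `SequenceMatcher` without changing the surrounding logic.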
Self-Generated Prompt: This approach allows the model to generate code refinement examples on its own. Each example includes the OriginalCode, Intention, ReviewLine, and RevisedCode. The model then uses these self-generated examples as inspiration to address the original problem.
Next, we describe the generation and post-processing for each Intention category in detail:
Explicit Code Suggestions: We input the OriginalCode, SuggestionCode, and ReviewLine into the model. SuggestionCode refers to the code suggested within the ReviewComment. Based on these inputs, we design specialized post-processing criteria tailored to the unique characteristics of this type of code refinement:
1. Inclusion of Suggestion Code: The suggestion code must appear in the revised code.
2. Invariant Code Context: The original code’s preceding and succeeding segments should remain unchanged, with the suggestion code either inserted in the middle or replacing a middle segment.
Using these characteristics, we first locate the suggestion portion within the revised code. If there is a complete match of the suggestion code in the revised code, the location is successfully identified. If not, the suggestion code has not been fully replicated; by determining the maximum matching probability for each line, we locate the corresponding suggestion-code section and copy in the complete suggestion code. Additionally, by applying the rule that each line of the revised code should originate from either the original code or the suggestion code, we trim redundant sections and fill in missing parts of the revised code.
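The two-step repair (exact match, then best-window splice) can be sketched as follows. This is a simplification of the maximum-matching rule described above: it scores candidate windows with `difflib.SequenceMatcher` and replaces the best one wholesale with the complete suggestion code; the function name and line-list representation are our own.

```python
from difflib import SequenceMatcher

def repair_revision(revised: list[str], suggestion: list[str]) -> list[str]:
    """Ensure the full suggestion code appears in the revised code
    (illustrative sketch of the post-processing for Explicit Code
    Suggestions)."""
    n, m = len(revised), len(suggestion)
    # Step 1: if the suggestion already appears verbatim, nothing to fix.
    for i in range(max(n - m + 1, 0)):
        if revised[i:i + m] == suggestion:
            return revised
    # Step 2: find the window of the revised code most similar to the
    # suggestion and replace it with the complete suggestion block.
    def score(i: int) -> float:
        window = "\n".join(revised[i:i + m])
        return SequenceMatcher(None, window, "\n".join(suggestion)).ratio()
    candidates = range(n - m + 1) if n >= m else [0]
    best = max(candidates, key=score)
    return revised[:best] + suggestion + revised[best + m:]
```

The real post-processing additionally checks that every surviving line originates from either the original code or the suggestion code; that trimming step is omitted here for brevity.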
Reversion Suggestions: For Reversion Suggestions, we employ a rule-based approach to generate the revision, eliminating the need for any post-processing steps. Specifically, this involves reverting the changes that transformed the Pre-Modification Code into the Post-Modification Code (which also serves as the Original Code), resulting in the revised code. We begin by aligning the LastCodeDiffHunk with the OriginalCode, and then revert the previous code changes to the OriginalCode: we delete previously added lines and restore previously deleted lines.
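The reversion rule can be sketched as a single pass over the diff hunk. We assume a unified-diff-style hunk whose lines are prefixed with `+` (added), `-` (deleted), or a space (context), and we simplify alignment to a direct scan from the start of the original code; both are assumptions, not the paper's exact implementation.

```python
def revert_hunk(original: list[str], hunk: list[str]) -> list[str]:
    """Undo the changes recorded in a diff hunk against the
    post-modification code (illustrative sketch)."""
    reverted, i = [], 0
    for line in hunk:
        tag, text = line[:1], line[1:]
        if tag == "+":
            i += 1                      # line was added: drop it now
        elif tag == "-":
            reverted.append(text)       # line was deleted: restore it
        else:
            reverted.append(original[i])  # context line: keep as-is
            i += 1
    return reverted + original[i:]      # keep everything after the hunk
```

Because the output is constructed deterministically from the hunk, no LLM call and no repair step are needed on this branch.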
General Suggestions: For General Suggestions, we first employ LLMs to generate the code based on the OriginalCode, Intention, and ReviewLine. We then design two specific rules for post-processing the refined code:
1. Comment Consistency: If the Intention suggests new code without comments, the modified code should also lack comments.
2. Line Consistency: For single-line modification, other lines should remain unchanged.
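The two rules above can be sketched as small deterministic filters. The comment detection (here limited to `#` and `//` line comments) and the single-line check are simplifications; real comment syntax depends on the target language, and the function names are our own.

```python
def strip_comments(lines: list[str]) -> list[str]:
    """Rule 1 (Comment Consistency): drop comment-only lines when the
    Intention did not ask for comments (illustrative sketch)."""
    return [l for l in lines if not l.lstrip().startswith(("#", "//"))]

def enforce_line_consistency(original: list[str], revised: list[str],
                             review_line: int) -> list[str]:
    """Rule 2 (Line Consistency): for a single-line modification, keep
    every line except the reviewed one identical to the original."""
    if len(revised) != len(original):
        return revised  # fall back: not a one-to-one line modification
    return [revised[i] if i == review_line else original[i]
            for i in range(len(original))]
```

Both rules rewrite the LLM output in place, so over-modifications outside the reviewed region are reverted without another generation pass.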
In summary, we design a tailored code revision method for each Intention category by selecting specific input content and devising targeted code repair strategies.