Code refinement aims to enhance existing code by addressing issues, refactoring, and optimizing to improve quality and meet specific requirements. As software projects grow in scale and complexity, traditional iterative interactions between reviewers and developers become cumbersome. Despite recent efforts to expedite this process using various deep learning techniques, performance remains limited because these techniques cannot accurately understand reviewers' intent. This paper proposes an intention-based code refinement technique that transforms the conventional code refinement process from comment-to-code into intention-to-code. The process is decomposed into two phases: Intention Extraction and Intention-Guided Code Modification Generation. Intention Extraction categorizes comments using predefined templates, while Intention-Guided Code Modification Generation employs large language models (LLMs) to generate revised code based on the extracted intentions. Three categories with eight subcategories are designed for comment transformation, followed by a hybrid approach that combines rule-based and LLM-based classifiers for accurate classification. Extensive experiments with five LLMs (GPT4o, GPT3.5, DeepSeekV2, DeepSeek7B, CodeQwen7B) under different prompting settings demonstrate that our approach achieves 79% accuracy in intention extraction and up to 66% in code refinement generation. Our results underscore the potential of this approach in enhancing data quality and improving code refinement processes.
Fig. The framework of our intention-based code refinement.
Based on the different methods developers use to modify code, we categorize Intentions into three major classes and eight subcategories. The three major classes are Explicit Code Suggestions, Reversion Suggestions, and General Suggestions.
Explicit Code Suggestions: Comments in this category contain the exact code that needs to be applied. Developers only need to identify the appropriate location and insert the suggested code accordingly. For this type of comment, the intention is clear and direct, as the target changes are already explicitly stated within the comment.
Reversion Suggestions: These imply that the reviewer explicitly or implicitly indicates that the previous modification was unsuitable and suggests reverting to the version before the last modification. For this type of comment, the intention is to restore the code to its previous state.
General Suggestions: Comments that do not fall into the two specific categories above are classified as general suggestions, which lack explicit intentions. For these comments, we characterize their intention based on two general aspects: the type of change (i.e., insertion, modification, and deletion) and the corresponding scope of the change (i.e., single-line or multi-line). Since insertion is a specific form of modification, we group both under the broader category of changes, resulting in four general intention categories: single-line change, single-line deletion, multi-line change, and multi-line deletion. Additionally, we observed that word-level changes are particularly common in single-line edits. Therefore, we split each of the two single-line categories into a word-level and a code-level variant, which keeps the templates specific enough to guide the modification while remaining easy to predict. Together with the two multi-line categories, this yields six types of general intentions:
Single-line change: Change word (...) to (...)
Single-line change: Delete word (...)
Single-line change: Change the code to <code>
Single-line change: Delete code <code>
Multi-line change: Delete code lines <code>
Multi-line change: Change the code lines <code 1> to <code 2>
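The eight intention categories above can be represented as fill-in templates, with a rule-based first pass that hands unresolved comments to an LLM classifier. The following is a minimal sketch under stated assumptions, not the implementation used in this paper: the names `Intention`, `render`, `classify`, and `llm_fallback` are hypothetical, the template wording for explicit suggestions and reversions is ours (the text only gives templates for the six general intentions), and the two rules (a GitHub-style suggestion block and a small revert keyword list) are illustrative stand-ins for the actual rules.

```python
import re
from enum import Enum
from typing import Callable


class Intention(Enum):
    # The first two templates are assumed wording; the other six follow the text.
    EXPLICIT_CODE = "Apply the suggested code: {code}"
    REVERSION = "Revert the code to the version before the last modification"
    CHANGE_WORD = "Single-line change: Change word ({old}) to ({new})"
    DELETE_WORD = "Single-line change: Delete word ({old})"
    CHANGE_LINE = "Single-line change: Change the code to {code}"
    DELETE_LINE = "Single-line change: Delete code {code}"
    DELETE_LINES = "Multi-line change: Delete code lines {code}"
    CHANGE_LINES = "Multi-line change: Change the code lines {old} to {new}"

    def render(self, **slots: str) -> str:
        """Fill the template slots to obtain the intention text for the LLM."""
        return self.value.format(**slots)


# Assumed rule 1: the comment embeds a suggestion block (three backticks + "suggestion").
SUGGESTION_BLOCK = re.compile(r"`{3}suggestion\s*\n(.*?)`{3}", re.DOTALL)
# Assumed rule 2: the comment asks to undo the previous modification.
REVERT_HINT = re.compile(r"\b(revert|undo|roll\s*back)\b", re.IGNORECASE)


def classify(comment: str, llm_fallback: Callable[[str], Intention]) -> Intention:
    """Hybrid classification: cheap rules first, LLM classifier for the rest."""
    if SUGGESTION_BLOCK.search(comment):
        return Intention.EXPLICIT_CODE
    if REVERT_HINT.search(comment):
        return Intention.REVERSION
    # General suggestions have no explicit marker, so defer to the LLM.
    return llm_fallback(comment)
```

In this sketch, `llm_fallback` is any callable that maps a comment to one of the six general intentions; the rendered template text, rather than the raw comment, would then be placed in the code generation prompt.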
Note that our classification differs from those in previous works due to a difference in focus. Previous studies primarily aim to perform post-analysis, understanding the concrete actions from both refined code and comments, such as renaming variables or fixing specific bugs. In contrast, our work focuses on pre-analysis to identify the potential intention from review comments alone, which is more difficult. While it is possible to refine these classifications and intentions further, doing so would significantly increase the complexity of predicting such intentions from review comments. An incorrect prediction of intention could misguide the refinement process, leading to unintended changes or deviations from the desired outcomes. Therefore, this paper focuses on intentions that are either easy to extract (i.e., explicit changes and reversions) or that follow high-level patterns (i.e., the six general intention patterns).
The dataset, prompts, code, and experimental results used in this paper are all included in the compressed package.