After experimenting with prompt engineering for all three research questions, we found that iterative trial and error was the most consistently effective approach. We began with basic prompts that provided a solid foundation for achieving our goals. We submitted the initial prompt to ChatGPT along with one of the code segments and studied the response to identify its strengths and weaknesses. We then considered what we could add to or remove from the prompt to obtain a clearer and more useful answer, and created a revised version of the prompt based on our findings from the previous response. We submitted each new prompt to ChatGPT with approximately five different code segments to also assess the consistency of the answers it generated. After analyzing every prompt we had considered, we came together as a group to discuss which one generated the most consistent and accurate answers for the goals of our research.
With no explanation refactor the Java code to improve its quality and [quality attribute]: [include original code here]
For this research question, our objective was to use ChatGPT to refactor Java code based on a given prompt. To ensure clarity and simplicity, we collectively agreed to begin each prompt with the phrase “With no explanation,” indicating that we wanted ChatGPT’s output to consist solely of the refactored code. Additionally, we explicitly state that the provided code is written in the Java programming language to prevent ChatGPT from misidentifying the language, which could lead to incorrect syntax modifications during refactoring. To clearly mark the start of the code segment, we found it beneficial to end the prompt with a colon. With these considerations in mind, we formulated four prompt variations, evaluated their performance on numerous Java files, and ultimately settled on the prompt above. The figure below shows an example of using the prompt.
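To make the intended transformation concrete, the sketch below shows the kind of before-and-after pair this prompt elicits. The code and method names are our own illustration, with readability as the quality attribute; they are not an excerpt from the study’s dataset.

```java
import java.util.Arrays;
import java.util.List;

public class RefactorExample {
    // Original segment, as it would be pasted after the colon in the prompt
    static int sumOfEvens(List<Integer> numbers) {
        int total = 0;
        for (int i = 0; i < numbers.size(); i++) {
            if (numbers.get(i) % 2 == 0) {
                total = total + numbers.get(i);
            }
        }
        return total;
    }

    // A refactoring ChatGPT might plausibly return: a stream pipeline
    // replaces the index bookkeeping while computing the same result
    static int sumOfEvensRefactored(List<Integer> numbers) {
        return numbers.stream()
                      .filter(n -> n % 2 == 0)
                      .mapToInt(Integer::intValue)
                      .sum();
    }

    public static void main(String[] args) {
        List<Integer> sample = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(sumOfEvens(sample));            // 12
        System.out.println(sumOfEvensRefactored(sample));  // 12
    }
}
```

Because both versions compute the same result, a response in this form would count as a successful refactoring for the prompt.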
Message 1: Remember this Java code segment: [include original code here]
Message 2: Does this Java code preserve the behavior of the above code segment? Explain in detail why or why not: [include refactored code here]
For this research question, our primary goal was to evaluate whether the refactored code generated by ChatGPT retained the original behavior of the Java code. To do this, we used ChatGPT itself as the judge, asking it to compare the original code segment with its refactored counterpart and determine whether the behavior was preserved.
During the experimentation process, we encountered a challenge when attempting to include both the original Java code and the refactored code in a single prompt: the combined input often exceeded ChatGPT's input length limit. To address this issue, we devised a two-step approach. First, we instructed ChatGPT to remember the original code segment by providing it as a prompt; ChatGPT would acknowledge that it had stored the code and repeat it back to us. Then, in a separate message, we sent the prompt we had formulated along with the refactored code.
To avoid bias, we did not label the code segments as original and refactored, which eliminated any implicit relationship between them; instead, we presented the two code segments independently. We decided on the prompt above, which is split into two messages. The figure below shows an example of using the prompt.
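As an illustration of the subtlety this two-message prompt is meant to surface, consider the hypothetical pair of segments below; the first would be pasted into Message 1 and the second into Message 2. The example and its method names are our own, not taken from the study: the second method looks like a routine cleanup of the first, but it does not preserve behavior for negative inputs.

```java
public class BehaviorCheckExample {
    // Segment provided in Message 1. In Java, the remainder of a negative
    // number is negative, so this returns false for negative odd inputs.
    static boolean isOdd(int n) {
        return n % 2 == 1;
    }

    // Segment provided in Message 2: a common "cleanup" that is NOT
    // equivalent, since it returns true for negative odd inputs as well.
    static boolean isOddRefactored(int n) {
        return n % 2 != 0;
    }

    public static void main(String[] args) {
        // The two versions agree on positive inputs...
        System.out.println(isOdd(3) == isOddRefactored(3));   // true
        // ...but diverge on negative odd inputs
        System.out.println(isOdd(-3) == isOddRefactored(-3)); // false
    }
}
```

A useful answer to Message 2 would flag exactly this kind of divergence rather than judging equivalence from surface similarity.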
Message 1: Remember this Java code segment: [include original code here]
Message 2: Provide concise commit messages that describe the overall goal, refactoring changes, and impact on quality of the following code: [include refactored code here]
In this research question, our main objective was to explore whether ChatGPT could effectively provide documentation for the refactored code segment, describing its intent, instructions, and impact. We defined "intent" as the motivation and goal behind the code changes, “instructions” as the specific refactoring modifications, and “impact” as the effect on code quality.
Similarly to RQ2, we decided to use the strategy of breaking the interaction into two messages. We first asked ChatGPT to remember the original code segment in one message, and in a subsequent message, we presented our prompt along with the refactored code. This strategy helped us overcome the character limit issue and allowed ChatGPT to process the information more effectively.
After trying multiple prompts, we found the best results when asking ChatGPT to provide “concise commit messages,” which programmers commonly use to explain changes made to a project. We also replaced “intent” with “goal” in the prompt, as ChatGPT seemed to interpret it more accurately, and adding “overall” prompted ChatGPT to focus on the code segment as a whole and its broader context. For these reasons, we chose the prompt above to move forward with. The figure below shows an example of using the prompt.