This spreadsheet lists the 40 code segments along with the quality attributes that ChatGPT identified when describing their impact. Scrolling to the right, a table shows the total number of times each quality attribute is mentioned across the 40 files, and next to that table, as well as on the last sheet, is a pie chart of those quality-attribute counts. There are 8 sheets, one for each of the 8 quality attributes we are studying, as well as 2 additional sheets showing the results as pie charts and bar charts.
For the majority of the 320 refactored code segments, ChatGPT was able to describe the intent (goals/motivations), the instructions (refactoring changes), and the impact on code quality. Below, each component is broken down and analyzed:
This pie chart depicts the answer to the question “Did ChatGPT identify accurate goals/motivations?” Out of 320 refactored code segments, ChatGPT was able to provide accurate goals for 316 files (98.8%) and was unable to do so for 4 files (1.2%).
Of the 320 files for which ChatGPT was asked to provide a comment accurately describing the goals, only 4 did not receive one. For 3 of these 4 files, ChatGPT provided a comment that described the overall purpose of the function rather than the goal/motivation of the refactoring; for the remaining file, it did not provide a goal comment at all. All 4 files had accurate comments for the other two components, and all 4 had been refactored to improve cohesion.
This pie chart depicts the answer to the question “Did ChatGPT identify accurate refactoring changes/instructions?” Out of 320 files, ChatGPT provided accurate refactoring changes for 315 files (98.4%) and did not do so for 5 files (1.6%).
ChatGPT was unable to provide accurate refactoring changes/instructions for 5 of the 320 files. Of these 5 files, only 1 was also missing an accurate comment describing the impact; the other 4 were missing only the refactoring-changes comment. Among the files missing accurate instruction comments, 1 was given a comment containing refactoring changes that ChatGPT fabricated, and the other 4 were not given instruction comments at all. In terms of the quality attributes the files were refactored for, 1 file was refactored to improve complexity, 1 for cohesion, and 3 for reusability.
This pie chart depicts the answer to the question “Did ChatGPT provide accurate impacts?” For 319 of the 320 files (99.7%), ChatGPT successfully provided accurate impacts; it did not provide an impact for 1 file (0.3%).
There was only 1 file for which ChatGPT did not provide a comment accurately describing the impact of the refactoring changes; for this file, ChatGPT did not mention the impact at all. This file had been refactored to improve cohesion, making cohesion the quality attribute with the most missing comments at 5 files, followed by reusability at 3 files and complexity at 1 file, for a total of 9 files missing at least one of the requested comments.
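To make the arithmetic behind that total explicit, below is a minimal Python sketch (with hypothetical file identifiers; the actual file names are in the spreadsheet) showing how the 4 missing goal comments, 5 missing instruction comments, and 1 missing impact comment reduce to 9 unique files once the one overlapping file is counted only once.

    from collections import Counter

    # Hypothetical identifiers standing in for the actual file names in the spreadsheet.
    missing_goals = {"cohesion_1", "cohesion_2", "cohesion_3", "cohesion_4"}
    missing_instructions = {"complexity_1", "cohesion_5", "reusability_1", "reusability_2", "reusability_3"}
    missing_impact = {"cohesion_5"}  # this file is also missing its instruction comment

    files_missing_any = missing_goals | missing_instructions | missing_impact
    print(len(files_missing_any))  # 9 unique files, even though 10 comments are missing in total

    # Group the affected files by the quality attribute they were refactored for.
    by_attribute = Counter(name.rsplit("_", 1)[0] for name in files_missing_any)
    print(by_attribute)  # Counter({'cohesion': 5, 'reusability': 3, 'complexity': 1})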
We asked ChatGPT to identify the impact on code quality between the original and refactored code segments by naming the quality attributes that were improved. The following pie charts display the number of times each quality attribute was mentioned across the 40 code segments refactored for each of the 8 quality attributes.
The tables below indicate the number of files in which each quality attribute was mentioned, out of the 40 files refactored for each of the 8 quality attributes.
*This is a list of all the quality attributes that ChatGPT mentioned at least once for impact across the 320 files.
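As a reference for how the per-attribute counts and percentages below can be derived, here is a minimal Python sketch (the actual tallies live in the spreadsheet; the sample data is purely illustrative), assuming the impact comments for one group of 40 files have already been reduced to lists of mentioned quality attributes:

    from collections import Counter

    # Illustrative sample only; in the study there is one entry per file, 40 per group.
    mentions_per_file = [
        ["readability", "maintainability"],
        ["readability", "understandability"],
        ["maintainability", "performance"],
    ]

    counts = Counter(attr for mentions in mentions_per_file for attr in mentions)
    total_mentions = sum(counts.values())

    for attribute, count in counts.most_common():
        share = 100 * count / total_mentions  # share of all mentions, as shown in the pie charts
        print(f"{attribute}: mentioned in {count} files ({share:.1f}%)")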
For the 40 files that ChatGPT refactored to improve quality and performance, as illustrated in the pie chart, the most prominently improved quality attribute that ChatGPT identified after refactoring was readability. Among the 40 files analyzed, ChatGPT identified readability improvements in 39 instances, accounting for 36.4% of all the quality attributes deemed enhanced after refactoring. The next most notable improvement was maintainability, detected in 37 files and constituting 34.6% of all improved quality attributes identified by ChatGPT. The remaining quality attributes, ranked in descending order, were understandability, observed in 18 files (16.8%); performance, identified in 10 files (9.3%); flexibility, present in 2 files (1.9%); and reusability, noted in 1 file (0.9%). Despite all 40 files being refactored to improve performance, ChatGPT recognized a performance impact in only 10 of them.
For the 40 files that ChatGPT refactored to improve quality and complexity, similar to the performance attribute, the two predominant quality attributes identified as improved for most files were readability and maintainability, as is evident in the figure. Readability was recognized in 39 files (38.6%) and maintainability in 36 files (35.6%). The remaining quality attributes, ranked from greatest to least, were performance with 10 files (9.9%), code consistency with 8 files (7.9%), flexibility with 3 files (3.0%), understandability and reusability each with 2 files (2.0% each), and complexity with 1 file (1.0%). However, even though all 40 files were refactored to improve complexity, ChatGPT determined that only 1 file demonstrated improved complexity after the refactoring process.
After analyzing the files that ChatGPT refactored to enhance quality and coupling, and similar to the preceding attributes, the figure shows that the two primary quality attributes ChatGPT identified as improved in most files were readability with 34 files (31.8%) and maintainability with 33 files (30.8%). Understandability ranked next with 21 files (19.6%), followed by reusability with 12 files (11.2%), flexibility with 4 files (3.7%), and performance with 3 files (2.8%). Unexpectedly, ChatGPT did not identify any improvements in coupling, despite the code segments being refactored with a focus on improving coupling.
Upon instructing ChatGPT to assess the impact on quality for the files refactored to improve quality and cohesion, the two most prevalent attributes evident in the pie chart, accounting for approximately 75% of the results, were readability with 38 files (41.3%) and maintainability with 32 files (34.8%). The remaining quality attributes, listed in descending order, were understandability with 11 files (12.0%), flexibility with 5 files (5.4%), performance with 4 files (4.3%), and reusability with 2 files (2.2%). Although the refactoring efforts of these files were aimed at cohesion improvement, similar to the results for coupling, ChatGPT did not identify any files that demonstrated enhanced cohesion after refactoring.
After asking ChatGPT to determine the improved quality attributes for the files refactored to enhance quality and readability, the most surprising finding was that maintainability, as shown in the pie chart, appeared in the highest number of files, with 39 instances accounting for 39.0% of all the quality attributes identified by ChatGPT for the 40 files. Following maintainability, the next most prominent quality attribute was readability, identified in 36 files (36.0%). Subsequently, understandability was recognized in 18 files (18.0%), while performance and reusability each appeared in 3 files (3.0% each), and flexibility in 1 file (1.0%). ChatGPT was able to determine that 36 of the 40 files refactored to improve readability had indeed achieved the desired improvement.
For the 40 files that ChatGPT refactored to improve quality and reusability, as illustrated in the figure, the two primary quality attributes ChatGPT determined were improved following refactoring were maintainability with 37 files (36.3%) and readability with 33 files (32.4%). The next most notable improvement was in understandability, detected in 19 files, constituting 18.6% of all improved quality attributes identified by ChatGPT. The remaining quality attributes, ranked from greatest to least, were flexibility with 7 files (6.9%), performance with 4 files (3.9%), and reusability with 2 files (2.0%). Despite refactoring all 40 files to improve reusability, ChatGPT recognized the impact of reusability in only 2 of the 40 files.
For the files ChatGPT refactored to improve quality and design size, the two dominant quality attributes identified as improved for most files were readability and maintainability, as demonstrated in the pie chart. Readability was recognized in 38 of the files (38.0%) and maintainability was identified in 33 files (33.0%). The remaining quality attributes, ranked in descending order, were understandability with 22 files (22.0%), flexibility with 3 files (3.0%), reusability with 2 files (2.0%), and performance with 2 files (2.0%). However, even though the 40 files were refactored to improve design size, ChatGPT determined that none of the files demonstrated improvements in design size after the refactoring process.
After instructing ChatGPT to assess the impact on quality for the files refactored to improve quality and understandability, the two most prevalent attributes, as shown in the figure, were readability with 39 files (40.2%) and maintainability with 30 files (30.9%). The remaining quality attributes, listed from greatest to least, were understandability with 19 files (19.6%), flexibility with 5 files (5.2%), performance with 2 files (2.1%), and reusability with 2 files (2.1%). Although the refactoring of the 40 files was aimed at improving understandability, ChatGPT identified 19 of the files as having demonstrated enhanced understandability after refactoring.