Learning Disparities:
In this study, the experimental group may face a steeper learning curve when using CoEdPilot, whereas the control group is already acquainted with CoPilot. As a result, observed differences between the groups could partly reflect learning effects rather than the actual performance of the extension.
Edit Task Design:
To help participants understand the code and the editing intent, we derived simplified versions of the code from actual commits and provided comprehensive instructions. This modification may depart from the authentic editing scenario for the control group, as they need to inspect the subsequent edit location manually.
Programming Language:
The decision to focus exclusively on Python for the editing tasks may introduce a threat to internal validity. Python has its own syntax, paradigms, and coding practices, and participants familiar with Python may perform differently from those accustomed to other languages. This language-specific task design could confound the interpretation of the plugin's effectiveness.
Limited Sample Size in the User Study:
Owing to limited time and resources, we recruited 18 participants for the user study. This relatively small sample may not provide sufficient statistical power to detect genuine differences in the effectiveness of the extension. Consequently, the generalizability and robustness of the study findings might be compromised.
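To make the statistical-power concern concrete, the following is a minimal sketch of an approximate power calculation for a two-group comparison. It is illustrative only and is not the analysis used in the study: the even split of 18 participants into two groups of 9, the standardized effect sizes (Cohen's conventional 0.2, 0.5, and 0.8), the significance level of 0.05, and the helper name `approx_power` are all assumptions, and the normal approximation used here tends to be slightly optimistic for small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation to the two-sample test statistic.
    effect_size is a standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected shift of the test statistic under the alternative hypothesis.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical setting: 18 participants split evenly into two groups.
    n_per_group = 9
    for d in (0.2, 0.5, 0.8):
        power = approx_power(d, n_per_group)
        print(f"effect size d={d}: approximate power = {power:.2f}")
```

Under these assumptions, the approximate power remains well below the conventional target of 0.8 even for a large effect, which illustrates why a sample of this size may fail to detect genuine differences.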