Since the emergence of OpenAI's ChatGPT in November 2022, LLMs have brought significant transformations across various fields. A primary area of LLM integration is in code generation and software development tasks. Consequently, it has become crucial for Software Engineering (SE) courses to provide students with opportunities to learn how to effectively leverage these models. While many prior studies have examined the impact of introducing LLMs into university-level SE and programming courses, most of these analyses were conducted over short periods - typically a single semester - and focused heavily on student surveys and interviews. As a result, such research tends to emphasize subjective factors, such as students' perceptions and experiences, leaving a gap in understanding the objective and practical impacts of LLMs through a direct analysis of student activity data.
At SeoulTech SELab, we have consistently conducted pedagogical analyses to improve the Software Engineering curriculum within the Department of Computer Science and Engineering. Following the release of ChatGPT, we introduced LLM usage in team projects starting in 2023 and have since updated course content to keep pace with the rapid evolution of LLM technologies. In this study, we analyzed the impact of LLMs on SE team project learning by comparing data from 2022 (prior to the widespread adoption of LLMs) with data from 2024 (where LLM chatbots like ChatGPT were encouraged) and 2025 (where LLM agents like GitHub Copilot were utilized) for team projects with the same theme.
The data utilized in this study were collected during team projects conducted by a total of 158 students across 13 to 18 teams in 2022, 2024, and 2025. The dataset includes team project evaluation materials, presentation slides, survey results, and user activity logs from VS Code.
For the team project evaluation, we calculated the Satisfied Requirements Ratio (SRR) - the proportion of requirements from the Requirements Specification satisfied by the software developed by each team - at both the mid-term and final evaluations for each year. The presentation materials consist of slides submitted by students during the 2024 LLM use-case presentations, which were reviewed and categorized to identify LLM application patterns. The survey results comprise responses from mid-term and final surveys conducted in 2025 regarding LLM usage. Finally, the VS Code activity logs were collected via a VS Code Extension and analyzed to capture student development activities in 2025.
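As a minimal sketch, the SRR reduces to a simple proportion; the requirement counts below are hypothetical, not actual evaluation data:

```python
# Illustrative sketch of the Satisfied Requirements Ratio (SRR).
# The counts are hypothetical, not actual team evaluation data.

def srr(satisfied: int, total: int) -> float:
    """Proportion of requirements in the Requirements Specification
    satisfied by a team's software at an evaluation point."""
    if total <= 0:
        raise ValueError("total requirements must be positive")
    return satisfied / total

# e.g., a hypothetical team satisfying 18 of 24 requirements
midterm_srr = srr(18, 24)  # 0.75
```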
Fig. 1. Q1: How frequently did you use Copilot?
Fig. 2. Primary Purposes of LLM Usage for SE Course Projects
Fig. 1 shows the responses from the mid-term and final surveys in 2025 regarding how frequently students used LLMs in their team projects. At the mid-term, 28.3% of students reported using LLMs "Always" and 39.1% "Frequently," meaning that about two-thirds (67.4%) of the students were already actively utilizing LLMs. By the final survey, this combined ratio rose to over 90%, demonstrating a significant increase in the frequency of LLM usage as the projects progressed.
Fig. 2 illustrates the primary purposes of LLM usage identified through student presentations in 2024 and surveys in 2025. The results show that LLMs are predominantly used for direct code-related tasks, such as Code Generation and Code Improvement (e.g., Refactoring and Debugging). A notable observation is that when ChatGPT was the primary tool in 2024, there was a higher utilization for "Learning" and "Other" purposes - including project management and documentation. In contrast, 2025 saw a relative increase in students using LLMs for "Code Explanation" to gain a deeper understanding of their source code.
Fig. 3. Q3: What impact do you think Copilot has on the efficiency of project progress?
Fig. 4. Students’ Perceptions on the Usefulness of LLMs for Different Purposes
Fig. 3 shows the responses to the survey question asking whether LLMs were useful. In both the mid-term and final surveys, over 90% of students responded that LLMs were "Very Useful" or "Useful," indicating that students generally have a very positive impression of LLMs.
Fig. 4 examines the perceived usefulness of LLMs for each specific purpose, including "Retry" cases where students had to ask follow-up questions because they did not receive a desired answer initially. While students found LLMs particularly useful for Code Generation, nearly 50% reported that they could not get the desired answer in one go and needed to ask again. This suggests that even though LLMs are useful tools, using them effectively or easily obtaining desired answers is not a trivial matter.
Conversely, for tasks involving modifying existing code, such as Code Improvement, the percentage of "Useful" responses remained above 50% but was lower than that for Code Generation. Notably, over 60% of students reported needing a "Retry" for these tasks, indicating that utilizing LLMs to modify code was more challenging than generating code from scratch.
Furthermore, consistent with previous survey results, Code Explanation was a widely used and highly valued application. In the mid-term survey, approximately 50% of students found LLMs useful for obtaining code explanations; however, as the code became increasingly complex, this figure dropped to 33.3% in the final survey.
Finally, regarding "Learning," the results show that GitHub Copilot - which is integrated into VS Code and typically presents a small UI window - was not particularly helpful for learning through Q&A. Considering that LLM chatbots were extensively used for learning purposes in 2024, these findings suggest a need to utilize the appropriate form of LLM based on the specific intended use.
Fig. 5. Number of User Edits, Copilot Edits, and Copilot Edit Ratio for Individual Students
Fig. 5 presents the results of analyzing student development activity logs collected from VS Code. By analyzing code change events within the VS Code editor, we distinguished between "User Edits," which are identified as manual entries by the students, and "Copilot Edits," which appear to be blocks of code automatically generated and inserted by Copilot.
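This distinction can be sketched as a simple heuristic over recorded change events; the event fields and the length threshold below are illustrative assumptions, not the extension's actual logic:

```python
# Hypothetical post-hoc classification of recorded VS Code text-change
# events into "User Edits" vs "Copilot Edits". The event shape and the
# 20-character threshold are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class ChangeEvent:
    inserted_text: str      # text inserted by this change event
    single_keystroke: bool  # True if produced by ordinary typing

def classify(event: ChangeEvent) -> str:
    # Heuristic: a Copilot completion arrives as one event inserting a
    # multi-line (or long) block; manual typing inserts a few characters.
    if not event.single_keystroke and (
        "\n" in event.inserted_text or len(event.inserted_text) > 20
    ):
        return "Copilot Edit"
    return "User Edit"

events = [
    ChangeEvent("p", True),
    ChangeEvent("def parse(line):\n    return line.split(',')", False),
]
labels = [classify(e) for e in events]  # ["User Edit", "Copilot Edit"]
```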
The results reveal individual variations, but in terms of frequency, "User Edits" account for a significantly higher proportion than "Copilot Edits." According to the line graph representing the ratio of Copilot Edits within the total number of edits, only three students exceeded a 50% ratio. This indicates that even though many students reported using Copilot "Always," the actual frequency of manual code writing and modification remains high.
Of course, since these values represent frequency, Copilot might still lead in terms of total volume if it generates large blocks of code at once. However, a higher number of Lines of Code (LOC) does not necessarily equate to a greater contribution to the actual development task. Therefore, the discrepancy in frequency shown here suggests that even if students feel they are using LLMs constantly, the actual weight of LLMs in the overall workflow may be lower than perceived.
Fig. 6. Different Edit Type Ratio of Committed Files
Fig. 6 illustrates the analysis of edited files committed by each team, combining activity logs with commit history. For each committed file, we categorized the editing history into three types: "Copilot Edit" (only Copilot-generated edits), "User Edit" (only manual edits by students), and "Both" (a combination of both). For each team, the left bar represents the period leading up to the mid-term evaluation (Phase 1), while the right bar shows the period from then until the final evaluation (Phase 2). Since committed files represent tasks deemed "complete" by the developers, this allows us to investigate how code editing was actually performed during the development process.
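Assuming per-file edit labels reconstructed from the activity logs and commit history, the three-way categorization can be sketched as follows (the file names and label lists are illustrative):

```python
# Sketch of categorizing each committed file by who edited it,
# given a list of edit labels per file. Data layout is illustrative.

def file_category(edit_labels: list[str]) -> str:
    kinds = set(edit_labels)
    if kinds == {"Copilot Edit"}:
        return "Copilot Edit"  # only Copilot-generated edits
    if kinds == {"User Edit"}:
        return "User Edit"     # only manual edits by students
    return "Both"              # a combination of both edit types

committed = {
    "parser.py": ["User Edit", "Copilot Edit", "User Edit"],
    "utils.py": ["User Edit"],
    "model.py": ["Copilot Edit"],
}
categories = {f: file_category(labels) for f, labels in committed.items()}
# {"parser.py": "Both", "utils.py": "User Edit", "model.py": "Copilot Edit"}
```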
The results vary between teams, but the proportion of "Both" is high in most cases. Combined with the "Copilot Edit" files, this means most committed files involved Copilot in some way. This is consistent with the survey responses in Fig. 1, where students reported using Copilot "Always" or "Frequently"; at the same time, the high "Both" ratio shows that human intervention remained substantial.
However, it is also worth noting that the proportion of files edited purely by humans, shown in blue, is non-negligible, whereas the green portion representing files edited solely by Copilot is relatively small. In other words, students still intervened heavily in writing and editing code throughout the team projects. That said, given that the green and blue bars often grow toward the final evaluation, students appear to gradually distinguish between tasks that can be delegated to Copilot and those better handled directly by humans.
Fig. 7. Changes of LLM Dependency between Phase 1 and Phase 2 for Students
Fig. 7 shows the change in the ratio of Copilot Edits within the total edit frequency for each student between Phase 1 (up to the mid-term evaluation) and Phase 2 (from then until the final evaluation). Blue indicates students whose Copilot Edit ratio increased, while red indicates students whose Copilot Edit ratio decreased.
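Assuming per-phase edit counts for each student, this comparison can be sketched as follows (the counts and field layout are hypothetical):

```python
# Sketch of the per-student dependency change between Phase 1 and
# Phase 2. Counts are hypothetical, not actual student data.

def copilot_ratio(copilot_edits: int, user_edits: int) -> float:
    total = copilot_edits + user_edits
    return copilot_edits / total if total else 0.0

def dependency_change(p1: tuple[int, int], p2: tuple[int, int]) -> str:
    """Each phase is (copilot_edits, user_edits)."""
    r1, r2 = copilot_ratio(*p1), copilot_ratio(*p2)
    if r2 > r1:
        return "increased"  # plotted blue in Fig. 7
    if r2 < r1:
        return "decreased"  # plotted red in Fig. 7
    return "unchanged"

# e.g., a hypothetical student with 30 Copilot / 70 user edits in
# Phase 1 and 20 Copilot / 80 user edits in Phase 2
change = dependency_change((30, 70), (20, 80))  # "decreased"
```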
The analysis shows that 25 students increased their Copilot Edit ratio, 2 showed no change, and 20 actually decreased it. According to the open-ended survey responses, students found that as the code grew more complex, Copilot more often failed to provide the desired results; some also deliberately reduced their usage out of concern that continued reliance on Copilot would hinder the development of their own programming skills. The activity logs corroborate this tendency. At the same time, more than half of the students gradually increased their Copilot usage, indicating a persistent trend toward stronger dependency.
Fig. 8. SRR Distributions of Teams in 2022, 2024, and 2025
Fig. 8 illustrates the distribution of the Satisfied Requirements Ratio (SRR), the performance metric for student team projects, in 2022, 2024, and 2025. SRR represents the proportion of requirements met by each team out of the total requirements provided. In 2022 and 2024, essentially identical requirements were presented, differing only in minor descriptive details. In 2025, the total number of requirements increased by approximately 28% due to the addition of requirements necessitating more complex implementations. By comparing 2022 (no LLM support), 2024 (using LLM chatbots like ChatGPT), and 2025 (using LLM agents like Copilot), we can analyze performance differences based on the mode of LLM utilization.
The boxplot distributions show a slight upward trend from left to right; however, statistical testing confirmed no statistically significant differences in teams' SRR across the three years. Notably, comparing 2022 (no LLM) and 2024 (LLM chatbot), which shared essentially identical requirements, revealed no distinct difference. This suggests that, at the level of SE course team projects, students can achieve sufficient performance even without the assistance of LLM chatbots.
However, considering that the SRR in 2025 did not show a clear decline despite a significant increase in requirements, it can be inferred that the use of LLM agents may have assisted in achieving a higher volume of requirements.
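As an illustration of such a between-year comparison, the sketch below runs a two-sided permutation test on hypothetical SRR values; the actual test used in the study and the real team data are not reproduced here:

```python
# Illustrative nonparametric comparison of team SRR distributions
# between two years via a permutation test on the difference in means.
# The SRR values are hypothetical, not the actual course data.
import random

def perm_test(a: list[float], b: list[float], n_iter: int = 5000,
              seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

# Hypothetical per-team SRR values for two years
srr_2022 = [0.70, 0.75, 0.80, 0.65, 0.72]
srr_2024 = [0.74, 0.78, 0.69, 0.81, 0.73]
p = perm_test(srr_2022, srr_2024)
# A p-value above 0.05 would match a "no significant difference" finding.
```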
This study analyzed data collected as part of an effort to improve the SE course, supplemented by additional surveys and activity logs gathered in 2025. As the research was not conducted in a strictly controlled environment, there are certain limitations in the interpretation of the results.
However, this work is significant in that it cross-analyzed subjective survey responses with objectively collected data, such as activity logs. This approach allowed us to clarify how survey responses should be interpreted and to reveal actual development activities that are not always apparent through subjective perceptions alone.