Large Language Models (LLMs) have recently been widely adopted for code-related tasks, and prior research has shown that LLM-based debugging techniques can significantly outperform traditional approaches. However, researchers have pointed out that such strong performance may be inflated by data leakage rather than reflect the models' true debugging capabilities, which makes accurate evaluation a critical challenge.
We study reliable evaluation methods for LLM-based debugging that mitigate data leakage, including approaches that transform existing code while preserving its original structure. We also explore broader strategies for assessing LLM debugging performance more accurately in realistic software development settings.
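To make the idea concrete, the sketch below shows one simple, structure-preserving transformation of the kind such evaluations can use: renaming identifiers with Python's ast module so that the surface form of a buggy snippet changes while its structure, behavior, and bug stay intact. This is only an illustrative assumption, not our actual benchmark pipeline.

```python
import ast

class IdentifierRenamer(ast.NodeTransformer):
    """Rename local variable names to neutral placeholders (v0, v1, ...)
    while keeping the program's structure and behavior intact."""

    def __init__(self):
        self.mapping = {}

    def _rename(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"v{len(self.mapping)}"
        return self.mapping[name]

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

buggy_code = """
def find_max(nums):
    best = nums[0]
    for n in nums:
        if n < best:  # bug: should be n > best
            best = n
    return best
"""

tree = ast.parse(buggy_code)
renamed = ast.unparse(IdentifierRenamer().visit(tree))
print(renamed)  # same structure and same bug, but different surface identifiers
```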
Using LLMs has become an essential skill for students who aspire to become software engineers, making the integration of LLMs into software engineering courses an inevitable trend. Understanding how LLMs affect students’ learning outcomes and skill development is therefore critical to educating better software engineers.
Since 2023, we have actively incorporated LLMs into our software engineering courses and analyzed the data collected from them. Through this work, we study students’ patterns of LLM usage and investigate how LLMs influence their performance and learning outcomes.
Mining Software Repositories (MSR) research systematically analyzes large-scale software repositories, such as version control systems, issue trackers, and code review data, to uncover patterns in defects, development practices, and software evolution. The goal is to improve software quality, productivity, and maintainability in real-world development settings.
We have conducted studies in this field to better understand how developers use issue tracking system features, and to develop techniques that facilitate the extraction and processing of repository data for software engineering research.
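As an illustration of the kind of repository-data extraction this work relies on, the sketch below pulls issues and their labels from GitHub's REST API. The repository name is a placeholder and the script is a minimal example rather than our actual mining tooling.

```python
import requests

def fetch_issue_labels(owner, repo, per_page=50):
    """Fetch recent issues from a GitHub repository and collect their labels.

    Unauthenticated requests are rate-limited; pass a token in the
    Authorization header for larger-scale mining.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    resp = requests.get(url, params={"state": "all", "per_page": per_page})
    resp.raise_for_status()

    issues = []
    for item in resp.json():
        if "pull_request" in item:  # the issues endpoint also returns pull requests
            continue
        issues.append({
            "number": item["number"],
            "title": item["title"],
            "labels": [label["name"] for label in item["labels"]],
        })
    return issues

# Example with a placeholder repository: count issues carrying multiple labels.
issues = fetch_issue_labels("octocat", "Hello-World")
multi = sum(1 for i in issues if len(i["labels"]) > 1)
print(f"{multi}/{len(issues)} issues have more than one label")
```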
Fault Localization (FL) aims to identify the locations in source code that are likely responsible for software failures. While recent studies show that LLMs can achieve strong performance in this task, their probabilistic nature often leads to unstable results. In addition, LLM-based approaches typically operate as black boxes and incur high computational costs, which limit their reliability and practical adoption.
Logic-based Fault Localization (LogicFL) addresses these limitations by using logic programming, specifically Prolog, to infer fault locations from logical facts extracted from code. By applying predefined logical rules, LogicFL provides explainable reasoning processes and produces consistent results at a significantly lower cost.
https://arxiv.org/abs/2412.01005
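LogicFL itself encodes code facts and rules in Prolog; purely as an illustration of the underlying idea, the Python sketch below applies one hand-written rule (with hypothetical fact names, not LogicFL's actual predicates) to infer a candidate fault location from facts about a possibly-null variable.

```python
# Toy analogue of rule-based fault localization over code facts.
facts = {
    "null_value": {"resp": 12},                  # 'resp' may be null from line 12 on
    "dereferenced": [("resp", 18), ("buf", 7)],  # (variable, line) dereference facts
}

def candidate_faults(facts):
    """Rule: dereferencing a possibly-null variable after the point where it
    may become null marks that dereference line as a candidate fault location."""
    return [
        (var, line)
        for var, line in facts["dereferenced"]
        if var in facts["null_value"] and line >= facts["null_value"][var]
    ]

for var, line in candidate_faults(facts):
    print(f"candidate fault: '{var}' may be null when dereferenced at line {line}")
    # -> candidate fault: 'resp' may be null when dereferenced at line 18
```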
Publications
Sujeong Jeong, Hoyeon Jeong, Hyogeun Park, and Jindae Kim, "A Comparative Analysis of Debugging Performance and Response Quality of Ko-LLM," KCSE 2026.
S. Kim, S. Jang, J. Kim, and J. Nam, "EnCus: Customizing Search Space for Automated Program Repair," ICST 2025, Napoli, Italy, 2025, pp. 618-622, doi: 10.1109/ICST62969.2025.10989047.
Hyogeun Park, Hoyeon Jeong, and Jindae Kim, "A Performance Evaluation of Large Language Models in Practical Software Debugging Scenarios," KCSE 2025.
Sechang Jang, Seongbin Kim, Junhyeok Choi, Jindae Kim, and Jaechang Nam, "SPI: Similar Patch Identifier for Automated Program Repair," Journal of KIISE (JOK), 52(2), pp. 152-160, 2025, doi: 10.5626/JOK.2025.52.2.152.
Boburmirzo Muhibullaev and Jindae Kim, "Accurate Information Type Classification for Software Issue Discussions with Random Oversampling," IEEE Access, vol. 12, pp. 65373-65385, 2024.
S. Jang, J. Choi, S. Kim, J. Kim, and J. Nam, "SPI: Similar Patch Identifier for Automated Program Repair," KCC 2024.
Gunwoo Lee, Jindae Kim, Myung-seok Choi, Rae-Young Jang, and Ryong Lee, "Review of Code Similarity and Plagiarism Detection Research Studies," Applied Sciences, vol. 13, no. 20, 11358, 2023, https://doi.org/10.3390/app132011358.
Moojun Kim, Beomchul Kim, and Jindae Kim, "Change Description Difference Analysis between Human and Code Differencing Techniques," Journal of KIISE, 50(2), pp. 150-161, 2023, doi: 10.5626/JOK.2023.50.2.150.
Moojun Kim, Beomchul Kim, and Jindae Kim, "A Study on Change Description Differences between Human and Differencing Techniques," KCSE 2022.
Jindae Kim and Seonah Lee, "An Empirical Study on Using Multi-Labels for Issues in GitHub," IEEE Access, vol. 9, 2021.
Hyungwon Lee and Jindae Kim, "A Study on Correctness of Source Code Differencing Techniques," Proceedings of Korea Software Congress, 2020.
Jindae Kim et al., "The Effectiveness of Context-Based Change Application on Automatic Program Repair," Empirical Software Engineering, 25(1), pp. 719-754, 2020.
Jindae Kim and Sunghun Kim, "Automatic Patch Generation with Context-Based Change Application," Empirical Software Engineering, 24(6), pp. 4071-4106, 2019.
Jeongho Kim, Jindae Kim, and Eunseok Lee, "VFL: Variable-Based Fault Localization," Information and Software Technology, vol. 107, pp. 179-191, 2019.
Yida Tao, Jindae Kim, Sunghun Kim, and Chang Xu, "Automatically Generated Patches as Debugging Aids: A Human Study," FSE 2014.