ProCURE: Addressing the Programming Concept Understanding Gap for Code Generation in LLMs via Concept-Aware Consistency Learning
Abstract
Although Large Language Models (LLMs) excel at code generation, recent research reveals that they exhibit an insufficient grasp of core programming concepts, such as data flow and control flow. This limitation undermines their robustness to variations in these concepts in practice, yet effective solutions that explicitly target this gap remain limited.
To address this challenge, we propose \textsc{ProCURE}, a concept-aware consistency learning framework designed to enhance LLMs’ understanding of programming concepts. Specifically, \textsc{ProCURE} first performs automated concept-oriented code augmentation to construct a concept-aligned dataset covering representative programming concepts. It then conducts concept-aware fine-tuning, encouraging the model to capture fine-grained concept variations and learn appropriate generation behaviors under such variations via a novel concept-sensitive consistency loss.
To quantify programming concept understanding, we introduce the Concept Consistency Score (CCScore), defined as the proportion of correct generations that remain correct under concept variations. A higher CCScore indicates a deeper understanding of programming concepts.
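Under the definition above, CCScore can be computed as a simple ratio over paired evaluation outcomes. The sketch below is illustrative only and not the authors' implementation; the function name and input format are assumptions.

```python
def cc_score(results):
    """Illustrative CCScore computation (hypothetical helper, not from the paper).

    results: list of (correct_original, correct_under_variation) boolean pairs,
    one per task. CCScore is the fraction of originally correct generations
    that remain correct after a concept variation is applied.
    """
    # Keep only tasks the model solved before the variation was applied.
    preserved = [still_correct for was_correct, still_correct in results if was_correct]
    if not preserved:
        return 0.0  # no originally correct generations to evaluate
    return sum(preserved) / len(preserved)
```

For example, a model that solves three tasks originally but only keeps two of them correct after variation would score 2/3.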
We evaluate \textsc{ProCURE} on four open-source LLMs across three widely used code generation benchmarks. Experimental results show that \textsc{ProCURE} improves CCScore by an average of 17.9 points, demonstrating its effectiveness in addressing the programming concept understanding gap.