In this work, we define the syntactic robustness of LLM-based code generation as a model’s ability to produce correct code when given syntactically varied but semantically equivalent prompts, particularly prompts involving mathematical constraints (e.g., rewriting x + y = b as x - b + y = 0). Our findings show that current LLMs often fail to maintain correctness under such variations, indicating a lack of syntactic robustness, and that this weakness can be exploited to induce incorrect or misleading code generation. To mitigate the issue, we propose a reduction-based pre-processing step that normalizes the constraint representation in a prompt before generation.
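To make the normalization step concrete, here is a minimal sketch, assuming sympy and reusing the x + y = b example from above: both syntactic variants of the constraint are reduced to one canonical equation before being placed in the prompt. The helper name and the canonical form are illustrative, not the exact reduction used in the paper.

```python
# Illustrative sketch of a reduction-style normalization pass: semantically
# equivalent constraint phrasings collapse to one canonical form before they
# are embedded in the code-generation prompt.
import sympy as sp

x, y, b = sp.symbols("x y b")

def canonicalize(constraint: sp.Eq) -> sp.Eq:
    """Move all terms to the left-hand side and simplify, so that
    syntactic variants of the same constraint become identical."""
    expr = sp.simplify(sp.expand(constraint.lhs - constraint.rhs))
    return sp.Eq(expr, 0)

# Two syntactic variants of the same constraint...
v1 = sp.Eq(x + y, b)          # "x + y = b"
v2 = sp.Eq(x - b + y, 0)      # "x - b + y = 0"

# ...reduce to the same canonical equation before prompting the LLM.
assert canonicalize(v1) == canonicalize(v2)
print(canonicalize(v1))       # Eq(-b + x + y, 0)
```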
In this project, I introduce partial equivalence analysis for a more refined assessment of non-equivalence between two programs. I propose a quantitative symbolic equivalence analysis technique that identifies the input conditions under which two non-equivalent programs agree or diverge, together with a quantitative measure of how much of the input space they agree on. To improve the scalability of existing model-counting-based techniques, I develop a range-search-based approach that explores intervals of the input space and identifies regions of equivalence and non-equivalence through SAT solver queries. Overall, the approach improves both precision and performance compared to state-of-the-art techniques.
GitHub Link
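As a concrete illustration of the range search, the sketch below (assuming the Z3 Python bindings, z3-solver) bisects a bounded input range and issues one solver query per interval: if no input in the interval makes the two programs disagree, the interval is an equivalence region; otherwise it is split until single diverging points remain. The two toy programs, the bounded range, and the bisection scheme are illustrative assumptions, not the project’s implementation.

```python
# Illustrative range-search sketch: label input intervals as "equivalent"
# (no disagreeing input exists) or "diverging" by querying a solver per interval.
from z3 import BitVec, Solver, If, And, unsat

x = BitVec("x", 32)
p1 = If(x < 10, x * x, 2 * x)    # toy program 1
p2 = If(x <= 10, x * x, 2 * x)   # toy program 2: disagrees only at x == 10

def search(lo, hi, regions):
    """Bisect [lo, hi] until each sub-interval is proven equivalent or
    narrowed down to a single diverging input."""
    s = Solver()
    s.add(And(x >= lo, x <= hi), p1 != p2)
    if s.check() == unsat:
        regions.append((lo, hi, "equivalent"))
    elif lo == hi:
        regions.append((lo, hi, "diverging"))
    else:
        mid = (lo + hi) // 2
        search(lo, mid, regions)
        search(mid + 1, hi, regions)

regions = []
search(-1000, 1000, regions)

# Quantitative measure: how much of the explored range is provably equivalent.
agree = sum(hi - lo + 1 for lo, hi, tag in regions if tag == "equivalent")
print([r for r in regions if r[2] == "diverging"])   # [(10, 10, 'diverging')]
print(agree / 2001)                                   # fraction of agreeing inputs
```

In this sketch, summing the widths of the equivalence intervals replaces a single expensive model-counting call on the full disagreement formula with many cheap per-interval satisfiability queries, which is the intuition behind the scalability gain described above.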
Figure: Best hybrid techniques (Pre-Post and Pre-Guided) vs. state-of-the-art symbolic approaches
In this work, I combined symbolic execution and fuzzing to check program equivalence and non-equivalence. Symbolic execution is precise and can formally prove equivalence, but it is expensive; fuzzing, on the other hand, is scalable but can only demonstrate non-equivalence. To get the best of both worlds, I proposed several hybrid heuristics that differ in where symbolic execution and fuzzing are positioned in the analysis pipeline. One of the key ideas is to use symbolic execution to guide fuzzing toward inputs that are more likely to expose non-equivalence between the two programs (see the sketch below). The evaluation shows that the hybrid approach outperforms state-of-the-art techniques.
GitHub Link
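A minimal sketch of the guidance idea, assuming z3-solver and two toy programs: the path conditions of one program are solved symbolically to obtain seed inputs that sit near branch boundaries, and the fuzzer then mutates around those seeds while comparing the two programs’ outputs, so testing effort is concentrated where divergence is most likely. The programs, seed selection, and mutation strategy are illustrative only, not the exact pipeline from the project.

```python
# Illustrative sketch of symbolic-execution-guided fuzzing: Z3 solves the path
# conditions of one program to seed the fuzzer near branch boundaries, and
# random mutation around those seeds searches for disagreeing inputs.
import random
from z3 import Int, Solver, sat

def prog_a(x):            # reference implementation
    return x * 2 if x > 100 else x + 1

def prog_b(x):            # variant that differs only at the branch boundary
    return x * 2 if x >= 100 else x + 1

# Symbolic phase: solve each path condition of prog_a for one concrete input;
# these inputs form the fuzzer's seed corpus.
xs = Int("x")
seeds = []
for path_cond in (xs > 100, xs <= 100):
    s = Solver()
    s.add(path_cond)
    if s.check() == sat:
        seeds.append(s.model()[xs].as_long())

# Fuzzing phase: mutate around the symbolic seeds, which concentrates testing
# near branch boundaries where divergence is most likely.
random.seed(0)
for seed in seeds:
    for _ in range(1000):
        candidate = seed + random.randint(-5, 5)
        if prog_a(candidate) != prog_b(candidate):
            print("non-equivalence witness:", candidate)
            break
```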