Summary
Our research explains why generated code, although executable, may not be reliable. Based on our analysis, we propose that a robust foundational code model requires a training strategy that teaches the model to understand code syntax/grammar, static code behavior, and dynamic code behavior. Without these three abilities, it remains risky to deploy GPT-4, GPT-3.5, or other similar models in software production: lacking code-analysis capacity, the models cannot assure the security of the code they generate.
In the "All Data" tab, we provide the code that calls the OpenAI API in this study. To check the answers from the LLMs, you should use the API.
We use the ChatGPT Playground with the temperature parameter set to 0.
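For reference, an API call with temperature 0 looks roughly like the sketch below. The model name and prompt are placeholders, not necessarily the exact ones used in this study; setting temperature to 0 makes decoding effectively greedy, which is what allows the stability comparisons described here.

```python
def build_request(prompt: str) -> dict:
    """Assemble chat-completion parameters; temperature=0 minimizes sampling randomness."""
    return {
        "model": "gpt-3.5-turbo",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }

def ask(prompt: str) -> str:
    # Requires the `openai` package and an OPENAI_API_KEY environment variable.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(**build_request(prompt))
    return resp.choices[0].message.content
```

With this helper, each prompt variant in the study can be replayed by calling `ask(prompt)` and comparing the returned answers.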
We simply insert one space before and one space after arr in enumerate(arr) (as indicated by the underline).
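This whitespace edit is semantics-preserving in Python: enumerate( arr ) and enumerate(arr) behave identically, so a stable model should give the same answer for both prompt variants. A minimal check (with a hypothetical sample list, not the study's actual input):

```python
def index_pairs(arr):
    # original spelling
    return [(i, v) for i, v in enumerate(arr)]

def index_pairs_spaced(arr):
    # perturbed spelling: one space before and after arr
    return [(i, v) for i, v in enumerate( arr )]

print(index_pairs([10, 20]) == index_pairs_spaced([10, 20]))  # True
```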
We observe that k is now incremented by 1. While this may not necessarily be a bug, it is unexpected for such a trivial, semantics-preserving change and indicates that the model's output is not stable.
Resources:
- Raw buggy code: code link
- Fixed code: offical_fixed