Large Language Models for Code Analysis: Do LLMs Really Do Their Job?
Online Appendix
Our dataset consists of:
Non-Obfuscated Code
C: Selected code samples from the POJ-104 dataset and classic C benchmarks (Linpack, etc.);
JavaScript: The Octane benchmark and several web applications;
Python: Selected code samples from the CodeSearchNet dataset.
Obfuscated Code
Obfuscated JavaScript code (obtained by applying different obfuscation techniques to the JavaScript portion of our Non-Obfuscated Code dataset);
Winning entries of the International Obfuscated C Code Contest (IOCCC).
(image source: https://www.bbc.com/news/technology-39901382)