Finding Critical Data in Programs Using Deep Learning

Highlights

Identify the Critical Data: Find and label the critical data at the source code level, both manually and with the assistance of a customized AFL fuzzing tool. The team developed the tool to highlight, for manual investigation, each conditional variable that significantly changes program execution (i.e., changes the number of basic blocks executed).
Slice the Program on Data Dependency: Developed an LLVM-based tool that slices the program on data dependency for source-code-level evaluation. The tool captures the target variable's DEF-USE dataflow.
Trace the Programs: Compile, test, and trace the program of interest using a tracer built on the Intel Pin tool.
Implement Baseline: Create and evaluate the baseline model, and report its results for comparison.
Extend to Source Code Level: Extend the project to the source code level so that the method is more general. The source code will be sliced based on the dataflow relationships between the target variables, and the resulting sliced code will be fed into a CodeBERT model.
Analysis Using PDG: Construct and analyze the PDG (Program Dependence Graph) based on LLVM to identify critical variables at the source code level.
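
To make the data-dependency slicing step above concrete, here is a minimal sketch of a backward slice over DEF-USE information. It is a toy model only: it assumes straight-line code with no branches, loops, or pointers, and it is not the team's LLVM-based tool. The names Stmt, backward_slice, and PROGRAM, as well as the example variables, are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Optional


class Stmt(NamedTuple):
    """One statement of a toy straight-line program (hypothetical model)."""
    line: int
    defines: Optional[str]   # variable written by the statement, if any
    uses: FrozenSet[str]     # variables read by the statement


def backward_slice(stmts: List[Stmt], target: str) -> List[int]:
    """Return the lines that the final value of `target` depends on.

    Assumes straight-line code, so the most recent earlier definition
    of a variable is the only one that can reach a later use.
    """
    needed = {target}        # variables whose definitions we still look for
    kept = []
    for stmt in reversed(stmts):
        if stmt.defines in needed:
            kept.append(stmt.line)
            needed.discard(stmt.defines)   # this definition satisfies the use
            needed.update(stmt.uses)       # now follow its own inputs
    return sorted(kept)


# Toy program (assumed for illustration):
#   1: n = read_input()
#   2: buf_len = 64
#   3: is_admin = 0
#   4: idx = n * 2
#   5: log = buf_len + 1
#   6: is_admin = idx > buf_len
PROGRAM = [
    Stmt(1, "n", frozenset()),
    Stmt(2, "buf_len", frozenset()),
    Stmt(3, "is_admin", frozenset()),
    Stmt(4, "idx", frozenset({"n"})),
    Stmt(5, "log", frozenset({"buf_len"})),
    Stmt(6, "is_admin", frozenset({"idx", "buf_len"})),
]

if __name__ == "__main__":
    print(backward_slice(PROGRAM, "is_admin"))
```

In this toy program, slicing on is_admin keeps lines 1, 2, 4, and 6. Line 3 is dropped because line 6 overwrites that definition, and line 5 is dropped because log never flows into is_admin. A slicer working on LLVM IR would additionally need to handle control flow, memory, and calls, which this sketch leaves out.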

Submitting/Under Review
