To support more vulnerability types, we provide a basic tutorial that shows how to design a slicing algorithm for a given vulnerability type. Following this tutorial, you can extend SnapVuln to other vulnerability types by embedding newly designed slicing algorithms into it.
The purpose of a slicing algorithm is to capture precise vulnerability semantics (i.e., vulnerability patterns). To do so, we locate the range of the vulnerability in the whole interprocedural graph and slice it out to represent the vulnerability semantics. Before slicing, we need to understand the following four factors:
Starting point: The program point where the vulnerability starts, also called the source.
End point: The program point where the vulnerability ends, also called the sink.
Graph: Source code can be converted into various kinds of graphs, each capturing different program semantics. For example, the Control Flow Graph (CFG) represents the program execution order, and the Program Dependence Graph (PDG) represents the data and control dependencies of the program. These graphs are the objects on which we perform slicing.
Direction: When slicing, we choose a forward, backward, or bidirectional direction.
A vulnerability consists of the program code between the source and the sink. To slice out the vulnerability semantics, we can either slice forward from the source to the sink or slice backward from the sink to the source, depending on which better captures the vulnerability semantics. We also need to determine which graph to slice on, based on the vulnerability type.
Below we show, step by step, how to construct a slicing algorithm by determining these four factors.
We need to determine the program points at which the vulnerability type generally starts. If you want the slicing algorithm to be general, choose a more general starting point; otherwise, choose a stricter one. This choice should be tuned according to how well the resulting slices work in practice.
Take Null Pointer Dereference as an example. Since the vulnerability occurs when the program accesses a pointer that is expected to be valid, its starting point might be a statement that assigns an address to a pointer by calling a function. Therefore, we retrieve all statements in the program that assign to a pointer and invoke a function, and use them as starting points.
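The retrieval of starting points for Null Pointer Dereference can be sketched in Python as follows. This is only a minimal illustration, not SnapVuln's actual implementation: the `Stmt` class, its fields `assigned_pointer` and `callees`, and the function `find_npd_sources` are hypothetical names, and we assume an earlier analysis step has already filled in these fields for each statement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stmt:
    """Hypothetical statement node; the field names are illustrative assumptions."""
    node_id: int
    code: str
    assigned_pointer: Optional[str] = None  # pointer variable written here, if any
    callees: List[str] = field(default_factory=list)  # functions called here


def find_npd_sources(stmts: List[Stmt]) -> List[Stmt]:
    """Keep statements that assign to a pointer and invoke a function."""
    return [s for s in stmts if s.assigned_pointer is not None and s.callees]


stmts = [
    Stmt(1, "char *p = get_buffer(n);", assigned_pointer="p", callees=["get_buffer"]),
    Stmt(2, "char *q = p;", assigned_pointer="q"),
    Stmt(3, "log_size(n);", callees=["log_size"]),
]
source_ids = [s.node_id for s in find_npd_sources(stmts)]
```

Only statement 1 both assigns to a pointer and calls a function, so it is the only starting point here. Dropping the `callees` condition would make the starting point more general, since statement 2 would then qualify as well.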
We need to determine the program points at which the vulnerability type generally ends. As with the starting point, if you want the slicing algorithm to be general, choose a more general end point; otherwise, choose a stricter one. This choice should also be tuned according to how well the resulting slices work in practice.
Take Null Pointer Dereference as an example. Since the vulnerability occurs when the program accesses a pointer that is expected to be valid, we can infer that the vulnerability is triggered when the null pointer is used for the first time. Therefore, we locate the first statement that uses the pointer and take it as the end point.
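Locating the first use of a pointer can be sketched in Python as follows. The sketch makes two simplifying assumptions that are ours, not SnapVuln's: statements are given as a flat list in program order, and the set of variables each statement reads is already known. The tuple layout and the function `first_use` are hypothetical; a real implementation would follow graph edges rather than list order.

```python
from typing import List, Optional, Set, Tuple

# (node_id, code, variables read by the statement), listed in program order.
Stmt = Tuple[int, str, Set[str]]


def first_use(stmts: List[Stmt], source_id: int, pointer: str) -> Optional[int]:
    """Return the id of the first statement after the source that reads `pointer`."""
    seen_source = False
    for node_id, _code, reads in stmts:
        if node_id == source_id:
            seen_source = True
            continue
        if seen_source and pointer in reads:
            return node_id
    return None  # the pointer is never used after the source


stmts = [
    (1, "int *p = find(key);", {"key"}),
    (2, "int n = 4;", set()),
    (3, "*p = n;", {"p", "n"}),
    (4, "free(p);", {"p"}),
]
sink_id = first_use(stmts, source_id=1, pointer="p")
```

Here the end point is statement 3, the first statement after the source that reads `p`. Statement 4 also uses `p`, but it is ignored because only the first use is taken as the end point.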
The vulnerability semantics of different vulnerability types are revealed by different graphs. For example, Use After Free should be sliced on the Control Flow Graph (CFG), which reveals the execution order, since Use After Free depends on the execution order of operations (i.e., "use" and "free"). For most other vulnerability types, the Program Dependence Graph (PDG), which represents data and control dependencies, can be used to reveal the vulnerability semantics.
Take Null Pointer Dereference as an example. Since the sink is data- or control-dependent on the source, we select the PDG and perform slicing on it.
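One way to picture the graph choice is as a filter over the edges of a combined graph. The Python sketch below assumes every edge carries a label saying whether it is a control-flow, data-dependence, or control-dependence edge. The labels `"flow"`, `"data"`, and `"control"`, the two lookup tables, and the function `graph_view` are hypothetical and chosen only for illustration; the default of PDG for every type other than Use After Free simply follows the rule of thumb stated above.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int, str]  # (from_node, to_node, edge_kind)

# Hypothetical edge labels; a real graph representation may name them differently.
EDGE_KINDS: Dict[str, Set[str]] = {"CFG": {"flow"}, "PDG": {"data", "control"}}

# Assumed mapping: CFG for Use After Free, PDG for everything else.
GRAPH_FOR_TYPE: Dict[str, str] = {"use_after_free": "CFG"}


def graph_view(edges: List[Edge], vuln_type: str) -> Dict[int, Set[int]]:
    """Keep only the edges of the graph chosen for `vuln_type`."""
    kinds = EDGE_KINDS[GRAPH_FOR_TYPE.get(vuln_type, "PDG")]
    adjacency: Dict[int, Set[int]] = {}
    for src, dst, kind in edges:
        if kind in kinds:
            adjacency.setdefault(src, set()).add(dst)
    return adjacency


# 1: p = get();   2: if (flag)   3: use(*p);
edges = [
    (1, 2, "flow"), (2, 3, "flow"),  # execution order
    (1, 3, "data"),                  # statement 3 reads p defined at 1
    (2, 3, "control"),               # statement 3 runs only if 2 holds
]
pdg_view = graph_view(edges, "null_pointer_dereference")
cfg_view = graph_view(edges, "use_after_free")
```

In the PDG view, statement 3 is directly connected to the source at statement 1 through the data dependence, whereas the CFG view only records that 1 precedes 2 and 2 precedes 3.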
We need to select a suitable slicing direction for the vulnerability type. A vulnerability type may have many starting points and end points. We determine the direction based on two principles:
Slices should be more general.
There should be fewer slices.
If there are fewer starting points than end points, we select the forward direction; otherwise, we select the backward direction. In some cases, the starting points or end points are not easy to locate. We then slice from the points that are easy to locate.
Take Memory Leak as an example. Since it is triggered when memory has not been released by the end of the program, we cannot determine where the vulnerability ends. Therefore, we slice from the starting point and select forward slicing.
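The direction rule can be written down as a small Python function. This is a sketch under one assumption of ours: points that cannot be located are represented by an empty list. The function name `choose_direction` and the node ids are hypothetical.

```python
from typing import Sequence


def choose_direction(starts: Sequence[int], ends: Sequence[int]) -> str:
    """Pick a slicing direction from the located starting and end points."""
    if not starts and not ends:
        raise ValueError("no starting or end points were located")
    if not ends:    # end points cannot be located: slice from the sources
        return "forward"
    if not starts:  # starting points cannot be located: slice from the sinks
        return "backward"
    # Fewer starting points than end points means fewer forward slices.
    return "forward" if len(starts) < len(ends) else "backward"


# Memory Leak: an allocation site is known, but no end point can be located.
memory_leak_direction = choose_direction(starts=[3], ends=[])
```

Because each slice begins at one point, slicing from the smaller set of points keeps the number of slices low, which is the second principle above.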
After determining the four factors above, we can perform slicing and construct the subgraph obtained from it.
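The final step can be sketched in Python as a reachability search followed by extracting the induced subgraph. This is a minimal sketch, not SnapVuln's actual code: the graph is assumed to be a plain adjacency dictionary over node ids, the names `reachable`, `reverse`, and `slice_subgraph` are hypothetical, and "bidirectional" is assumed here to mean the union of the forward and backward slices.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]  # node id -> successor node ids


def reverse(graph: Graph) -> Graph:
    """Flip every edge so that a backward slice becomes a forward search."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev


def reachable(graph: Graph, seeds: Iterable[int]) -> Set[int]:
    """Breadth-first search: all nodes reachable from the seeds, seeds included."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def slice_subgraph(graph: Graph, seeds: List[int],
                   direction: str) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Return the nodes of the slice and the edges between them."""
    if direction == "forward":
        nodes = reachable(graph, seeds)
    elif direction == "backward":
        nodes = reachable(reverse(graph), seeds)
    elif direction == "bidirectional":  # assumed: union of both directions
        nodes = reachable(graph, seeds) | reachable(reverse(graph), seeds)
    else:
        raise ValueError(f"unknown direction: {direction}")
    edges = sorted((s, d) for s, dsts in graph.items() for d in dsts
                   if s in nodes and d in nodes)
    return nodes, edges


pdg: Graph = {1: {2, 3}, 2: {4}, 3: {4}, 5: {4}}
forward_nodes, forward_edges = slice_subgraph(pdg, [1], "forward")
backward_nodes, _ = slice_subgraph(pdg, [4], "backward")
```

In this toy graph, the forward slice from node 1 leaves out node 5, which does not depend on node 1, while the backward slice from node 4 includes it because node 4 depends on it.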
We will implement slicing algorithms for more vulnerability types and publish them on the website in the future.