Background

Here we elaborate some background concepts of fuzzing / greybox fuzzing / directed fuzzing

Basic Block A basic block is a contiguous block of instructions which has only one entry at begin and only one exit at end.

Branch The transition between two basic blocks is called branch.

Control-flow graph (CFG) CFG represents the basic blocks transition relationship of a program in which each node reprensents a basic block of the program and each edge represents the branch between two basic blocks.

Call graph (CG) CG represents the function call relationship of a program in which each node represents a function and each edge represents one function calls another function.

Fuzzing Fuzzing is a widely-deployed software testing techinique which repeatedly runs the program under test (PUT) with provided inputs. The provided inputs are either generated (produced from templates like grammar) or mutational (produced from changing other inputs). Normally, fuzzing can be categorized into three groups: black-, grey- and white-box fuzzing. During fuzzing process. black-box fuzzing does not get any information about the internals of the PUT and just blindly tests programs with generated test cases efficiently; white-box fuzzing generates test cases by analyzing the internal information gathered when executing the PUT. Although white-box fuzzing is able to systematically explore the state space of the PUT, it causes much performation overhead at run time. The representative white-box fuzzing technique is dynamic symbolic execution (DSE); grey-box fuzzing adopts a middle approach: it uses light-weight instrumentation to get the dynamic internal information when executing the PUT and introduces negligible performance overhead.

Greybox fuzzing (GF) GF is an efficient software testing technique for vulnerability detection which obtains runtime information of the PUT without introducing much performance overhead. Recent years GF becomes a hot topic in fuzzing area.,it instruments the PUT with lightweight instrumentations, when executing the PUT with an input, the instrumentations can determine whether this execution triggers PUT 's new dynamic state ( e.g. new basic blocks or branches). If triggering new state, the corresponding inputs will be saved as seeds, new inputs are generated by mutating the seed inputs. Generated new inputs are further used to testing the PUT.

Directed fuzzing General fuzzing technique focuses on testing the whole program and is undirected. Unlike it, directed fuzzing concentratedly testing small parts of PUT without wasting resources on unrealted parts, which is especially useful when the size of PUT is huge. Directed fuzzing effectively focuses on reaching the given traget locations (tragets) and can be applied in special testing or security scenarios according to the type of tragets : 1) Patch testing. When some components of programs changed, we would like to test and check whether changed components introduce vulnerabilities. In this scenario, tragets are set to the changed parts of program (also called patch). 2) Crash reproduction. When in-field crashes are reported, only stack-trace and some environmental parameters are sent to the in-house development team. To preserve the user’s privacy, the specifc crashing input is often not available. To help developers debuging the vulnrabilities, directed fuzzing is used to produce the inputs which can trigger the crashes. In this scenario, targets are set to the crash points in the stack-traces. 3) Static analysis report verification. Most of the vulnerability issues reported by static analysis tools are false positive due to the inaccuracy of static analysis，causing much manual efforts to recheck. Directed fuzzing can help to automatically test and verifiy the reports produced by static analysis tools. In this scenario, targets are set to the statements that static analysis tools report as potentially dangerous.

Directed greybox fuzzing (DGF) DGF achieves the goal of directed fuzzing based on the technical framework of greybox fuzzing. AFLGo is the earliest DGF proposed in 2017, before which the directed fuzzing is based on sybolic execution. DGF is more efficient and available than the sybolic-execution-based directed fuzzing techniques as showed in the paper of AFLGo. In a high level, DGF leverages the call graph and control-flow graphs of program to calculate the distance between each basic block and the traget sites, then DGF instrument the calculated distances into the corresponding basic blocks at static analysis stage; During fuzzing (dynamic) stage, DGF computes the distance of a seed according to the basic blocks in the seed's execution trace; DGF further uses the seeds' distance to decide how to schedule these seeds ( which seeds are first selected to be mutated or how many mutation times spends on the selected seeds ) 。