Combination with Dictionary-based Mutation

The figure above shows an overview of the workflow of greybox fuzzing.

We can see that Cerebro is orthogonal to techniques that modify the seed mutator, executor, or feedback collector.

One of the most popular topics in greybox fuzzing nowadays is constraint penetration.

Several techniques have been proposed to tackle this problem (such as Angora, Steelix, and T-Fuzz).

However, the easiest way to help the fuzzer penetrate constraints is to use a list of user-provided tokens (a dictionary) for seed mutation.

Compared with the approaches mentioned above, dictionary-based mutation requires a priori knowledge of the input format, but it is easy to implement and still very powerful.
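As an illustration of why dictionary-based mutation is easy to implement, the following is a minimal sketch and not the mutator used in the experiments. The function name `dictionary_mutate`, the choice between inserting and overwriting, and the PNG chunk names used as tokens are all assumptions made for this example.

```python
import random
from typing import Sequence


def dictionary_mutate(seed: bytes, tokens: Sequence[bytes],
                      rng: random.Random) -> bytes:
    """Return a copy of `seed` with one dictionary token spliced in.

    Hypothetical helper: with equal probability the token is either
    inserted at a random offset or written over existing bytes.
    """
    if not tokens:
        raise ValueError("the dictionary must contain at least one token")
    token = rng.choice(tokens)

    # Insert when chosen, or when the token cannot fit inside the seed.
    if rng.random() < 0.5 or len(token) > len(seed):
        pos = rng.randrange(len(seed) + 1)
        return seed[:pos] + token + seed[pos:]

    # Overwrite: pick an offset at which the whole token still fits.
    pos = rng.randrange(len(seed) - len(token) + 1)
    return seed[:pos] + token + seed[pos + len(token):]


if __name__ == "__main__":
    # Illustrative dictionary: a few PNG chunk type names.
    png_tokens = [b"IHDR", b"IDAT", b"PLTE", b"IEND"]
    seed = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
    print(dictionary_mutate(seed, png_tokens, random.Random(1)))
```

Because a token such as `IHDR` is placed in the input as a whole, a multi-byte comparison in the target can be satisfied in a single mutation instead of being guessed byte by byte.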

We conducted extra experiments on pngfix and sqlite to see how well Cerebro can complement orthogonal techniques.

These experiments also ran for 24 hours each and were repeated 10 times to mitigate the effects of randomness.
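The text does not state how the repeated runs are combined into a single curve. One common option, sketched here purely as an assumption, is to report the minimum, median, and maximum edge coverage at each sampling point across the repetitions. The function name `summarize_runs` and the sample data are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def summarize_runs(
    runs: Sequence[Sequence[float]],
) -> List[Tuple[float, float, float]]:
    """Combine repeated coverage series into (min, median, max) per time point.

    Each element of `runs` is one repetition, sampled at the same time points.
    """
    if not runs:
        raise ValueError("at least one run is required")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must have the same number of samples")
    # zip(*runs) groups the samples taken at the same time point.
    return [(min(col), median(col), max(col)) for col in zip(*runs)]


if __name__ == "__main__":
    # Made-up numbers for three short repetitions, for illustration only.
    runs = [[100, 220, 310], [120, 200, 330], [90, 240, 320]]
    for low, mid, high in summarize_runs(runs):
        print(low, mid, high)
```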

On the left-hand side are the results without dictionary-based mutation.

On the right-hand side are the results with dictionary-based mutation.

Edge coverage on pngfix without dictionary

Edge coverage on pngfix with dictionary

Edge coverage on sqlite without dictionary

Edge coverage on sqlite with dictionary

From the results, we can see that dictionary-based mutation increases the coverage of all three fuzzers.

In particular, AFLFast seems to benefit less from the dictionary than the other two.

For example, AFLFast achieves better coverage on pngfix than AFL when no dictionary is used, but when supplied with dictionary-based mutation, AFLFast covers less code than AFL.

Another example is sqlite, where AFLFast converges clearly faster than the other two fuzzers in the first few hours when no dictionary is used.

However, this advantage becomes much less significant when a dictionary is used.

This is because AFLFast's seed prioritization and power scheduling algorithms favor seeds that cover rare (less frequently executed) edges; with a dictionary, the rarity of edges is disturbed, since the execution frequency of certain edges is boosted by passing those constraints.
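The effect described above can be made concrete with a toy model. The sketch below is not AFLFast's actual power schedule: it assumes, for illustration only, that a seed's energy is inversely proportional to the hit count of the rarest edge it covers. The function `rarity_energy`, the edge names, and all hit counts are hypothetical.

```python
from typing import Dict, Iterable


def rarity_energy(seed_edges: Iterable[str], edge_hits: Dict[str, int],
                  base: float = 1024.0) -> float:
    """Toy rarity-based energy: higher when the seed covers a rarely hit edge."""
    rarest = min(edge_hits[edge] for edge in seed_edges)
    return base / max(rarest, 1)


# Two seeds: only seed_b gets past a constraint (for example, a magic-bytes check).
seed_a = {"entry", "parse_header"}
seed_b = {"entry", "parse_header", "past_check"}

# Without a dictionary the guarded edge is rarely executed.
hits_without_dict = {"entry": 10000, "parse_header": 8000, "past_check": 4}
# With a dictionary many inputs pass the check, so the edge is no longer rare.
hits_with_dict = {"entry": 10000, "parse_header": 8000, "past_check": 4000}


def advantage(edge_hits: Dict[str, int]) -> float:
    """How many times more energy seed_b receives than seed_a."""
    return rarity_energy(seed_b, edge_hits) / rarity_energy(seed_a, edge_hits)


if __name__ == "__main__":
    print("without dictionary:", advantage(hits_without_dict))
    print("with dictionary:   ", advantage(hits_with_dict))
```

In this toy setting, the seed that passes the check is strongly preferred without a dictionary, while with a dictionary its advantage shrinks to a small factor, so a rarity signal of this kind distinguishes seeds far less sharply.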

In contrast, the seed prioritization and power scheduling algorithms in Cerebro do not suffer from this issue, and Cerebro still takes the lead in coverage when a dictionary is used.

This shows that Cerebro can complement constraint penetration techniques well.