Extended response

  • How are predecessor and successor relationships among functions computed in the case of recursion/cycles in the call graph? And what about functions occurring multiple times?
```python
def find_preds(self, rank, nodes_met):
    found_preds = []
    for pred in self.preds:
        if pred[0].node_name not in nodes_met:
            found_preds.append((pred[0], rank))
            nodes_met.append(pred[0].node_name)
            # this is the recursive call of this function
            found_preds.extend(pred[0].find_preds(rank + 1, nodes_met))
    return found_preds
```

Above is the source code we use to find all the predecessors of a specific function in the call graph.

"Rank" here means how far away is the predecessor from the current node. A rank equals to 1 means the predecessor is the direct caller of the current function.

Because we traverse in a DFS manner, if there exist multiple paths from a predecessor to the target function, the predecessor is assigned the smallest rank. The nodes_met list also guarantees that each function is visited at most once, so recursion and cycles in the call graph cannot cause infinite traversal, and a function occurring on multiple paths is counted only once.
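To illustrate, here is a minimal, self-contained sketch (with a hypothetical Node class and a hypothetical three-function call graph containing a cycle) showing how nodes_met breaks the cycle and how ranks are assigned:

```python
class Node:
    def __init__(self, name):
        self.node_name = name
        self.preds = []  # list of (caller_node, edge_info) tuples

    def find_preds(self, rank, nodes_met):
        found_preds = []
        for pred in self.preds:
            if pred[0].node_name not in nodes_met:
                found_preds.append((pred[0], rank))
                nodes_met.append(pred[0].node_name)
                # recursive call; nodes_met prevents revisiting nodes
                found_preds.extend(pred[0].find_preds(rank + 1, nodes_met))
        return found_preds

# Hypothetical call graph: a calls b, b calls c, and c calls a (a cycle).
a, b, c = Node("a"), Node("b"), Node("c")
c.preds.append((b, None))  # b is a direct caller of c
b.preds.append((a, None))  # a is a direct caller of b
a.preds.append((c, None))  # c calls a, closing the cycle

# Find the predecessors of c, seeding nodes_met with c itself
# (an assumption about how the initial call is made).
preds = c.find_preds(1, [c.node_name])
print([(n.node_name, r) for n, r in preds])  # [('b', 1), ('a', 2)]
```

The cycle edge from c back to a is skipped because c is already in nodes_met, so the traversal terminates with b at rank 1 and a at rank 2.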

  • How is the dynamic score updated? Does Cerebro consider a method covered by a seed if it has been called once when using that seed?

Cerebro maintains a hashmap mapping functions to their potential scores. Recall that the potential score of a function is the sum of all the bonuses brought by its successors. The Rust code snippet below shows how these bonuses are removed when a successor function is covered. The logic is very similar to how AFL updates its bitmap in the function has_new_bits.

```rust
pub fn find_func_status(&mut self, fshm: &SharedMem, func_info: &mut FuncInfo) -> FuncStatus {
    let mut current = fshm.bytes() as *const u8;
    let mut virgin = self.inner.as_mut_ptr() as *mut u8;
    let mut total_score = 0 as ScoreTy;
    // we won't update the total_bonus here, it's too costly,
    // we calculate it every time before fuzzing the input
    let total_bonus = 0 as ScoreTy;
    let mut total_covered = 0usize;
    let mut covered_funcs: Vec<FuncIdTy> = Vec::new();
    let mut new_func = false;
    unsafe {
        for i in 0..func_info.size() {
            if *current != 0 {
                let score = func_info.func_map_mut().get(&i).map(|fm| fm.score).unwrap_or(0);
                total_score += score;
                total_covered += 1;
                covered_funcs.push(i.clone() as FuncIdTy);
                // if a new function is covered
                if (*current & *virgin) != 0 {
                    new_func = true;
                    *virgin &= !*current;
                    // discard the bonus this function brought to its predecessors
                    let bonus_map = func_info
                        .func_map_mut()
                        .get(&i)
                        .map(|fm| &fm.bonus_map)
                        .unwrap_or_else(|| panic!("error getting the bonus map!"))
                        .clone();
                    if !bonus_map.is_empty() {
                        for (func_id, bonus_score) in bonus_map.iter() {
                            let func_item = func_info
                                .func_map_mut()
                                .get_mut(func_id)
                                .unwrap_or_else(|| panic!("error getting func item!"));
                            // this is where the potential score is updated
                            func_item.dyn_score -= bonus_score;
                        }
                    }
                    func_info
                        .func_map_mut()
                        .get_mut(&i)
                        .unwrap_or_else(|| panic!("error getting func item!"))
                        .bonus_map
                        .clear();
                }
            }
            current = current.offset(1);
            virgin = virgin.offset(1);
        }
    }
    return FuncStatus::new(total_score, total_covered, total_bonus, covered_funcs, new_func);
}
```
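The AFL-style "virgin map" mechanism this mirrors can be sketched in a few lines of Python (a simplified model operating on a small list of bytes rather than AFL's real 64 KB shared-memory bitmap):

```python
# Simplified sketch of AFL's has_new_bits logic: `current` is the
# coverage map from the run just executed, `virgin` has a bit set for
# every coverage bit never seen before across all runs.
def has_new_bits(current, virgin):
    new_bits = False
    for i in range(len(current)):
        if current[i] and (current[i] & virgin[i]):
            new_bits = True
            # clear the newly-seen bits so they don't count as new again
            virgin[i] &= ~current[i] & 0xFF
    return new_bits

virgin = [0xFF, 0xFF]
print(has_new_bits([0x01, 0x00], virgin))  # True: bit 0 seen for the first time
print(has_new_bits([0x01, 0x00], virgin))  # False: nothing new on a repeat run
```

The Rust code above does the same virgin-map update (`*virgin &= !*current;`), but additionally removes the covered function's bonuses from its predecessors' dynamic scores at the moment a new function is first seen.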

This find_func_status function is called every time Cerebro picks a seed to fuzz, so the potential scores are updated before the seed's dynamic score is recomputed. A seed's dynamic score is the sum of the dynamic scores of all the functions the seed covers. The code snippet below shows the function that updates a seed's dynamic score.

```rust
/// update the dynamic score (calculated based on the call graph),
/// the "self" here is a queue entry (seed)
pub fn update_dynamic_score(&mut self, func_info: &FuncInfo) {
    let previous_dyn_score = self.exec.func_stats.total_bonus;
    let mut total_dyn_score = 0 as ScoreTy;
    let covered_funcs = &self.exec.func_stats.covered_funcs;
    for func_id in covered_funcs {
        let dyn_score = func_info
            .func_map()
            .get(func_id)
            .map(|fm| fm.dyn_score)
            .unwrap_or(0 as ScoreTy);
        total_dyn_score += dyn_score;
    }
    self.exec.func_stats.total_bonus = total_dyn_score;
}
```

In short, Cerebro maintains a hashmap of functions and their potential scores, removing the bonus brought by a function's successor once that successor is covered. This hashmap is then used to update the dynamic scores of the seeds.

The purpose of this dynamic score is to make Cerebro aware of code that is not yet covered but may be reached by fuzzing the existing seeds.
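The end-to-end bookkeeping can be sketched as follows (a minimal Python model with hypothetical function names and scores; the real implementation uses the Rust structures shown above):

```python
# Each function has a dyn_score and a bonus_map: the bonuses it
# contributes to its predecessors while it is still uncovered.
func_map = {
    "parse":  {"dyn_score": 30, "bonus_map": {}},
    # "decode" is an uncovered successor of "parse" worth a bonus of 20
    "decode": {"dyn_score": 0,  "bonus_map": {"parse": 20}},
}

def on_function_covered(func_id):
    """When func_id is covered for the first time, its predecessors lose
    the bonus it contributed: it is no longer 'potential' coverage."""
    for pred_id, bonus in func_map[func_id]["bonus_map"].items():
        func_map[pred_id]["dyn_score"] -= bonus
    func_map[func_id]["bonus_map"].clear()

def seed_dynamic_score(covered_funcs):
    """A seed's dynamic score is the sum of the dynamic scores of the
    functions it covers."""
    return sum(func_map[f]["dyn_score"] for f in covered_funcs)

print(seed_dynamic_score(["parse"]))  # 30: decode is still uncovered
on_function_covered("decode")
print(seed_dynamic_score(["parse"]))  # 10: decode's bonus was removed
```

Note that a function is treated as covered the first time it is hit by any seed; once its bonuses are cleared, repeated calls to it do not change the predecessors' scores again.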

  • Why are fuzzershell and pngfix listed as benchmarks, but no results from them are presented?

We didn't manage to find bugs in these two projects with any fuzzer within 24 hours. However, we did use them to evaluate program coverage and to evaluate effectiveness when combined with dictionary-based mutations (both projects benefit from a token dictionary during fuzzing, and AFL provides dictionaries for SQL and PNG files).

Due to the page limit, we didn't include these results in the paper. However, interested readers can visit the links above for more details.

  • Why are there discrepancies between the results presented in Table 2 and Figure 6 for some of the benchmarks considered?

The data for "cxxfilt (old)" and "nm (old)" in Table 2 are wrong. The correct data should be:

The raw data is available here; it contains the figures and statistical evaluations, together with all the detailed data.

  • In Section 5.3, Table 3, why are only four benchmarks considered?

In the paper we only analyzed the bugs newly found by us and omitted the crashes found in the old versions of cxxfilt and nm (used in the AFLFast paper). Here we provide a detailed bug-level analysis of the old-version targets.

We classified the crashes found in the older versions of cxxfilt and nm by their stack traces. However, we didn't manage to match them with existing CVEs, because the previously assigned CVEs are somewhat disordered. For example, CVE-2016-4492 and CVE-2016-4493 actually point to the same Bugzilla issue.

The detailed traces (last 3 functions on the stack):

  • cxxfilt-invalid-read-1: work_stuff_copy_to_from <- iterate_demangle_function <- demangle_prefix
  • cxxfilt-invalid-read-2: demangle_template <- demangle_signature <- internal_cplus_demangle
  • nm-heap-bufferoverflow: d_append_buffer <- d_print_comp_inner <- d_print_comp
  • nm-invalid-read-1: do_type <- do_arg <- demangle_args
  • nm-invalid-read-2: d_unqualified_name <- d_expression_1 <- d_expression_1
  • nm-invalid-read-3: demangle_template <- demangle_signature <- iterate_demangle_function
  • nm-invalid-write-1: register_Btype <- demangle_qualified <- do_type
  • Also in Table 3, how were the A_12 values calculated based on TTEs?

The A_12 values are calculated according to the definition in [32].

Below is the code snippet we used for calculating this value with 2 lists of values:

```python
# calculate the chance of element(f1s) <= element(f2s)
# if we randomly pick the elements
def calculate_a12(max_pop, f1s, f2s):
    numerator = 0
    denominator = float(max_pop * max_pop)
    for first_val in f1s:
        for second_val in f2s:
            if first_val < second_val:
                numerator += 1
            elif first_val == second_val:
                numerator += 0.5
    a12 = numerator / denominator
    return a12
```

With two lists of TTEs (times to exposure), we can use this function to calculate the A_12 value.

Note that this requires both lists to have the same size, max_pop.

So when a fuzzer fails to find the bug within the time limit, we fill the corresponding slot of its list with the time limit itself.

If every element in f1s is smaller than every element in f2s, the A_12 value is 1. This means if we randomly pick an element A from f1s and an element B from f2s, the chance of A < B is 100%.

If f1s and f2s are identical, then the A_12 value is 0.5.
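A runnable usage sketch (repeating the function above for self-containment, with hypothetical TTE values and a 24-hour time limit):

```python
# Repeating calculate_a12 here so the example is self-contained.
def calculate_a12(max_pop, f1s, f2s):
    numerator = 0
    denominator = float(max_pop * max_pop)
    for first_val in f1s:
        for second_val in f2s:
            if first_val < second_val:
                numerator += 1
            elif first_val == second_val:
                numerator += 0.5
    return numerator / denominator

# Hypothetical TTEs (seconds) from 3 runs each; each fuzzer had one
# failed run, so that slot is padded with the 24-hour time limit.
limit = 24 * 3600  # 86400 seconds
f1s = [120, 300, limit]   # fuzzer A
f2s = [600, 900, limit]   # fuzzer B
print(calculate_a12(3, f1s, f2s))  # 6.5 / 9, about 0.722
```

Here 6 of the 9 pairs have the fuzzer-A time strictly smaller, and the pair of padded limits ties (contributing 0.5), giving A_12 = 6.5/9.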