Case Study 1 -- MJS

MJS is an embedded JavaScript engine for C/C++ that consists of a single C source file. We use this project to show how to use Hawkeye to generate distances for projects that have no build scripts for generating the target programs (PUTs).

  • Get MJS source code
mkdir info1
export SRC_DIR=$PWD/mjs
export INFO_DIR=$PWD/info1
git clone git@github.com:cesanta/mjs.git
cd $SRC_DIR && git checkout d6c06a61743d6748ac167adb75da3d81d5d62070
  • Prepare the configuration file "llvm.toml" and the target site file "tgt_lines.in" in $INFO_DIR

# llvm.toml

[tgt]
infile = "tgt_lines.in"
out_dir = "."
[ins]
proj_name = "mjs1"
dist_file = "bb.dist"

# tgt_lines.in

/home/hawkeye/test-ccs/mjs/mjs1/mjs.c:8413
/home/hawkeye/test-ccs/mjs/mjs1/mjs.c:9369
/home/hawkeye/test-ccs/mjs/mjs1/mjs.c:11843
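
Each target site in "tgt_lines.in" is an absolute source path followed by a colon and a line number, so a stale path or an out-of-range line is easy to introduce when the source tree lives somewhere else. The following is a minimal sketch of a sanity check, not part of Hawkeye: the function name check_tgt_lines is hypothetical, and the "path:line" format is assumed from the example above.

```shell
# check_tgt_lines FILE
# Hypothetical helper (not shipped with Hawkeye). Verifies that every
# non-empty line of FILE looks like "/abs/path:LINE", that the path exists,
# and that LINE is between 1 and the number of newline-terminated lines
# in that source file. Returns non-zero if any entry fails.
check_tgt_lines() {
    rc=0
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -z "$entry" ] && continue
        src=${entry%:*}       # everything before the last colon
        lineno=${entry##*:}   # everything after the last colon
        case $src in
            /*) ;;
            *) echo "not an absolute path: $entry" >&2; rc=1; continue ;;
        esac
        case $lineno in
            ''|*[!0-9]*) echo "bad line number: $entry" >&2; rc=1; continue ;;
        esac
        if [ ! -f "$src" ]; then
            echo "no such file: $src" >&2; rc=1; continue
        fi
        total=$(($(wc -l < "$src")))
        if [ "$lineno" -lt 1 ] || [ "$lineno" -gt "$total" ]; then
            echo "line out of range: $entry" >&2; rc=1
        fi
    done < "$1"
    return $rc
}
```

It could be run as "check_tgt_lines tgt_lines.in" before the analysis step; any problems are reported on stderr.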
  • Preprocessing with he-pp
cd $INFO_DIR
he-pp -DMJS_MAIN -fsanitize=address $SRC_DIR/mjs.c -ldl -O0 -o mjs.out
# the normal compile command is:
# clang -DMJS_MAIN -fsanitize=address $SRC_DIR/mjs.c -ldl -O0 -o mjs.out

This produces a file named "mjs.out.0.0.preopt.bc", which is an LLVM bitcode file.

  • Analyze the bitcode with "libhe-tgt.so"
opt-6.0 -mem2reg -load /usr/local/lib/hawkeye/libhe-tgt.so -he-conf ./llvm.toml -he-analyze ./mjs.out.0.0.preopt.bc -o /dev/null

The newly generated files include:

bb_calls.txt
callgraph.yaml
cfg/ # directory containing the CFG files
funcs.txt
tgt_bbs.txt
tgt_funcs.txt
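
Since the later steps read these files, it can be worth confirming that the analysis pass actually wrote all of them before continuing. The sketch below only tests for existence; the function name check_analysis_outputs is hypothetical and not part of Hawkeye.

```shell
# check_analysis_outputs [DIR]
# Hypothetical helper: reports which of the files listed above are missing
# from DIR (default: current directory). Returns non-zero if any is missing.
check_analysis_outputs() {
    dir=${1:-.}
    missing=0
    for name in bb_calls.txt callgraph.yaml funcs.txt tgt_bbs.txt tgt_funcs.txt; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing file: $dir/$name" >&2
            missing=1
        fi
    done
    if [ ! -d "$dir/cfg" ]; then
        echo "missing directory: $dir/cfg" >&2
        missing=1
    fi
    return $missing
}
```

For example, "check_analysis_outputs $INFO_DIR" prints nothing and returns 0 when everything is in place.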
  • Generate distances:
he-dists -b ./mjs.out.0.0.preopt.bc -i $PWD

Two files, "funcs.dist" and "bbs.dist", will be generated.

  • Compile and Instrument:
he-clang -he-conf ./llvm.toml -DMJS_MAIN -fsanitize=address $SRC_DIR/mjs.c -ldl -O0 -g -o mjs.out
  • Map function traces:
he-funcs extract -p mjs1
# "mjs1" is the value of "ins.proj_name" in "llvm.toml"
# this step generates "proj_trace_funcs.json"
he-funcs score -d funcs.dist -m funcs.txt -p proj_trace_funcs.json -o trace_funcs.json
  • Specify configuration for fuzzing:

# Config.toml

[io]
in_folder = "in"
out_folder = "out"

[exec]
use_forkserver = true
mem_limit = 200
timeout = 50
qemu_mode = false

[exec.sa]
trace_func_file = "trace_funcs.json"
callgraph_file = "callgraph.yaml"
tgt_func_file = "tgt_funcs.txt"

[record]
proj_name = "mjs1"
interval = 2000
url = "redis://127.0.0.1/"
log_entry_info = false

[calibration]
# for simple regular case calibration
normal_cycles = 7
# for variable behaviors calibration
var_behavior_cycles = 37

[minimize]
ck_redundant_file = false

[mutation]
# ops = ["det", "dict", "havoc", "splice"]
# ops = ["havoc", "splice", "sem"]
max_file_length = 65536
#dict_folder = "dicts_test"
dict_level = 0
# max_token_length = 64
# min_token_length = 2
# max_dict_size = 256
# in minutes
havoc_adjust_duration = 12

[fz]
workers = 1
bind_cpu = false
# "normal"/"crash"
keep_mode = "normal"
# "simple"/...
scorer = "simple"
exit_nonzero_as_crash = false
ignored_signals = []

[fz.conductor]
# in minutes
report_duration = 3

[fz.sync]
duration = 200
execs = 5
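
The [exec.sa] section points at files produced in the earlier steps ("trace_funcs.json", "callgraph.yaml", "tgt_funcs.txt"). A quick way to catch a misspelled or missing file is to extract every quoted "*_file" value from the configuration and test that it exists. This is a rough sketch with hypothetical function names; it uses a simple sed pattern rather than a real TOML parser, and it assumes the paths are resolved relative to the directory the fuzzer is started from.

```shell
# list_quoted_file_values CONFIG
# Prints the value of every uncommented `<name>_file = "<value>"` line.
list_quoted_file_values() {
    sed -n 's/^[[:space:]]*[A-Za-z_]*_file[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$1"
}

# check_config_files CONFIG
# Hypothetical helper: returns non-zero if any file named by a quoted
# "*_file" entry in CONFIG does not exist relative to the current directory.
check_config_files() {
    list_quoted_file_values "$1" | (
        rc=0
        while IFS= read -r f; do
            if [ ! -f "$f" ]; then
                echo "referenced file not found: $f" >&2
                rc=1
            fi
        done
        exit "$rc"
    )
}
```

Run from the directory that holds the analysis outputs, "check_config_files ./Config.toml" lists any referenced file that is absent. Boolean entries such as "ck_redundant_file = false" are skipped because their values are not quoted.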
  • Run the fuzzer:
mkdir in
cp $SRC_DIR/mjs/tests/*.js in/
he-fuzz -c ./Config.toml -- ./mjs.out @@
# a regular (non-fuzzing) run of mjs.out looks like:
./mjs.out in/err1.js
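
If the glob in the cp command above matches nothing, the "in" folder stays empty. Assuming that he-fuzz, like most coverage-guided fuzzers, needs at least one seed input (this is an assumption, not something stated here), a small check before starting the fuzzer can save a failed run. The function name check_seeds is hypothetical.

```shell
# check_seeds DIR
# Hypothetical helper: succeeds only if DIR contains at least one
# regular file; otherwise prints a message and returns 1.
check_seeds() {
    for f in "$1"/*; do
        [ -f "$f" ] && return 0
    done
    echo "no seed files in $1" >&2
    return 1
}
```

A possible use is "check_seeds in && he-fuzz -c ./Config.toml -- ./mjs.out @@", which starts the fuzzer only when seeds are present.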