Update: This plugin has been merged and is now a part of PANDA.
It's very common to log program execution for coverage or full tracing using a Pin tracer [1,2,3], QEMU in user mode [1], or drcov [2,3,4]. However, these tools are fragile with respect to obfuscated or self-modifying code. In those cases, it is desirable to extract a program execution trace directly from a whole-system emulation of a computer, such as PANDA.
There is at least one execution tracer included in PANDA as a plugin; it produces CSV files that can then be imported into IDA Pro using a Python script. This is often sufficient. However, Binary Ninja has some advanced capabilities for dealing with overlapping instructions, so it would be nice to have some means to import the trace information there. Lighthouse also has very powerful functionality for analyzing coverage data, and since it runs on both IDA Pro and Binary Ninja, it's a natural target for PANDA coverage output.
PANDA documents the structure of plugins here [5].
A bare-bones, do-nothing PANDA plugin needs the following:
- name_of_plugin
  - name_of_plugin.c or name_of_plugin.cpp
    - init_plugin() function
    - uninit_plugin() function
  - Makefile with the one-line content $(PLUGIN_TARGET_DIR)/panda_$(PLUGIN_NAME).so: $(PLUGIN_OBJ_DIR)/$(PLUGIN_NAME).o
- config.panda in the plugins directory (just the name of the plugin)

Thus, I created a subdirectory, lighthouse_coverage, in the plugins directory, added the one-line Makefile and the source file lighthouse_coverage.c:
    #include "panda/plugin.h"

    bool init_plugin(void *self) {
        return true;
    }

    void uninit_plugin(void *self) {
    }

To be sure that it works, I added a printf() statement:
    #include "panda/plugin.h"
    #include <stdio.h>

    bool init_plugin(void *self) {
        printf("loaded lighthouse plugin\n");
        return true;
    }

    void uninit_plugin(void *self) {
    }

And we have a working plugin, which can be invoked in the same way as any other PANDA plugin:
    ./panda-system-x86_64 -m 4096 -replay theRecording -panda lighthouse_coverage
    PANDA[core]:initializing lighthouse_coverage
    loaded lighthouse plugin
    loading snapshot
    [ ... ]

At this point the plugin does no useful work. PANDA plugins work by hooking callback functions into events as they occur during execution or playback. Let's hook a callback function into an event when our plugin loads.
    #include "panda/plugin.h"

    // this function gets called right before every basic block is executed
    void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock) {
        // print out the program counter of the basic block
        printf("%#018" PRIx64 "\n", (uint64_t)translationBlock->pc);
    }

    bool init_plugin(void *self) {
        panda_cb pcb = { .before_block_exec = before_block_exec };
        // register the callback function above
        panda_register_callback(self, PANDA_CB_BEFORE_BLOCK_EXEC, pcb);
        return true;
    }

    void uninit_plugin(void *self) {
    }

TranslationBlock has a member, pc, that is the program counter of the basic block about to be executed. This yields a list of all the basic block addresses executed:
    loading snapshot... done.
    opening nondet log for read : theRecording-rr-nondet.log
    0xffffffff81c01c40
    0xffffffff81c00920
    0xffffffff81c0096f
    0xffffffff81c009c2
    0xffffffff81c009d1
    0xffffffff81c01c4a
    [ ... ]

This is, of course, not sufficient, because all processes are intermingled here. To the rescue comes Operating System Introspection (OSI). OSI adds the capability to obtain the process names and thread IDs for each basic block (and much more). So, let's add process names to the block addresses and print everything to an output file.
    // this function gets called right before every basic block is executed
    void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock) {
        if (panda_in_kernel(first_cpu) == 0) { // I'm not interested in kernel modules
            // get a reference to the process this TranslationBlock belongs to
            OsiProc *process = get_current_process(cpuState);
            if (process) {
                fprintf(outputFile, "\n%s@%#018" PRIx64, process->name, (uint64_t)translationBlock->pc);
                free_osiproc(process); // always free unused resources
            }
        }
    }

And that is basically it. We just add some function prototypes and such to reduce compiler warnings and we have a finished plugin:
    #include "panda/plugin.h"

    // OSI
    #include "osi/osi_types.h"
    #include "osi/osi_ext.h"

    // function prototypes
    void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock);
    void uninit_plugin(void *self);
    bool init_plugin(void *self);

    FILE *outputFile = NULL; // pointer to output file...

    // this function gets called right before every basic block is executed
    void before_block_exec(CPUState *cpuState, TranslationBlock *translationBlock) {
        if (panda_in_kernel(first_cpu) == 0) { // I'm not interested in kernel modules
            // get a reference to the process this TranslationBlock belongs to
            OsiProc *process = get_current_process(cpuState);
            if (process) {
                fprintf(outputFile, "\n%s@%#018" PRIx64, process->name, (uint64_t)translationBlock->pc);
                free_osiproc(process); // always free unused resources
            }
        }
    }

    bool init_plugin(void *self) {
        panda_require("osi");      // ensure that OSI is loaded
        assert(init_osi_api());    // initialize the OSI API
        outputFile = fopen("lighthouse.out", "w"); // open output file
        if (!outputFile) {
            return false;          // could not open the output file
        }
        panda_cb pcb = { .before_block_exec = before_block_exec };
        // register the callback function above
        panda_register_callback(self, PANDA_CB_BEFORE_BLOCK_EXEC, pcb);
        return true;
    }

    void uninit_plugin(void *self) {
        if (outputFile) {
            fclose(outputFile);    // close output file
        }
    }

And we can then call the plugin like any other:
    ./panda-system-x86_64 -m 4096 -replay '/media/jan/80669BBB669BB080/ch34_1char' -os linux-64-ubuntu -panda osi -panda osi_linux:kconf_group=ubuntu:5.3.0-28-generic:64 -panda lighthouse_coverage

The plugin can be used the same way for a Windows guest. We get the following type of output:
    $ more lighthouse.out

    gmain@0x00007f299b429bf9
    gmain@0x00007f299b429c01
    gmain@0x00007f299b445740
    gmain@0x00007f299b445748
    gmain@0x00007f299b445764
    gmain@0x00007f299b44576f
    gmain@0x00007f299b429c0d
    gmain@0x00007f299cd195c9
    [ ... ]

Next, we need to change the lighthouse parser for the 'mod+off' format so that it can take our new mod@address format (the relevant changes are the image base lookup and the parsing of the module@address pair). I call this modat.py:
    import os
    import collections

    from ..coverage_file import CoverageFile
    from lighthouse.util.disassembler import disassembler

    class ModAtData(CoverageFile):
        """
        A module@address log parser.
        """

        def __init__(self, filepath):
            super(ModAtData, self).__init__(filepath)

        #--------------------------------------------------------------------------
        # Public
        #--------------------------------------------------------------------------

        def get_offsets(self, module_name):
            return self.modules.get(module_name, {}).keys()

        #--------------------------------------------------------------------------
        # Parsing Routines - Top Level
        #--------------------------------------------------------------------------

        def _parse(self):
            """
            Parse modat coverage from the given log file.
            """
            imagebase = disassembler._bv.start
            modules = collections.defaultdict(lambda: collections.defaultdict(int))
            with open(self.filepath) as f:
                for line in f:
                    trimmed = line.strip()

                    # skip empty lines
                    if not len(trimmed): continue

                    # comments can start with ';' or '#'
                    if trimmed[0] in [';', '#']: continue

                    module_name, bb_offset = line.rsplit("@", 1)
                    modules[module_name][int(bb_offset, 16)-imagebase] += 1
            self.modules = modules

PANDA:
Binary Ninja: modat.py needs to be placed into the lighthouse/reader/parsers directory. The Binary Ninja plugin directory should contain a file called lighthouse_plugin.py and a folder called lighthouse; place modat.py under that folder, at the relative path lighthouse/reader/parsers.
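Before loading a trace into the disassembler, it can help to sanity-check the file on its own. The following is a minimal standalone sketch of the same name@address parsing, without any lighthouse or Binary Ninja dependency. The function name parse_modat, the second process name "demo", and the image base 0x400000 are made up for illustration; in the real parser the image base comes from the disassembler.

```python
import collections
import io


def parse_modat(lines, image_base=0):
    """Group 'name@0xaddress' lines into {name: {offset: hit_count}}."""
    modules = collections.defaultdict(lambda: collections.defaultdict(int))
    for line in lines:
        trimmed = line.strip()
        # skip empty lines and comments starting with ';' or '#'
        if not trimmed or trimmed[0] in (";", "#"):
            continue
        # split on the last '@' so a name containing '@' stays intact
        name, sep, address = trimmed.rpartition("@")
        if not sep:
            continue  # not a name@address line
        try:
            offset = int(address, 16) - image_base
        except ValueError:
            continue  # address part is not a hex number
        modules[name][offset] += 1
    return modules


# Sample data in the format of lighthouse.out. The "demo" lines and the
# image base are hypothetical values chosen for this illustration.
sample = io.StringIO(
    "\n"
    "# a comment line\n"
    "gmain@0x00007f299b429bf9\n"
    "gmain@0x00007f299b429c01\n"
    "gmain@0x00007f299b429bf9\n"
    "demo@0x0000000000401000\n"
)
coverage = parse_modat(sample, image_base=0x400000)

# Print how many distinct basic blocks were seen per process name.
for name, hits in sorted(coverage.items()):
    print(name, len(hits), "unique blocks")
```

A per-process summary like this makes it easy to see which process names actually appear in the trace before picking the one that matches the binary opened in the disassembler.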
And now we get our payoff: Coverage data collected from the binary within a full system emulation:
References: