I came across an interesting cybersecurity challenge which I think is an excellent example of the power of symbolic analysis.
The challenge is a binary which looks for a base64 encoded file and then simply checks that the file is the correct one by traversing it. The file is not provided, so presumably one is supposed to recreate the file based on the check routine. The interesting part is that the traversal is not a loop, but hardcoded in a very long routine which breaks some traditional tools.
Running phase: Initialization ( 0.01s )Running phase: LLVM ( 0.02s )Running phase: Providers initialization ( 0.02s )Running phase: Input binary to LLVM IR decoding ( 0.28s )Running phase: LLVM ( 207465.34s )Running phase: x87 fpu register analysis ( 207467.14s )Running phase: Main function identification optimization ( 207467.63s )Running phase: Libgcc idioms optimization ( 207467.92s )Running phase: LLVM instruction optimization ( 207467.92s )Running phase: Conditional branch optimization ( 207468.81s )Running phase: Syscalls optimization ( 207477.50s )Running phase: Stack optimization ( 207477.50s )Running phase: Constants optimization ( 207495.33s )Running phase: Function parameters and returns optimization ( 207519.06s )Error: Decompilation to LLVM IR failed$ r2pm -i r2dec$ r2 a.out[0x000007c0]> aa[0x000007c0]> afl... snip ...0x000008f0    1 33           sym.badboy0x00000911    1 31           sym.goodboy0x00000930   10 405          main0x00000ac5  127 3047         sym.check... snip ...[0x000007c0]> s sym.check /* r2dec pseudo code output *//* ch30.bin @ 0xac5 */#include <stdint.h> int64_t check (int32_t arg1) {    int32_t var_18h;    int32_t var_8h;    int32_t var_4h;    var_18h = rdi;// this is pointer to base64 encoded string from file    var_4h = 0;// initialize counter    var_8h = 0xf7;// copy constant to local variable    eax = var_4h;// copy counter value to eax    rdx = (int64_t) eax; //sign extend eax to rdx    rax = var_18h; // point rax to base64 encoded string from file    rax += rdx;// increment pointer by counter    eax = *(rax);// copy first character from string into eax    eax = (int32_t) al;// just keep low byte of character    al ^= 0xa3; // xor character with 0xa3    if (eax != var_8h) { // check if (first char xor 0xa3) == 0xf7 -- 'T'         eax = var_4h;        edi = var_4h;        badboy ();    }... snip ...    var_4h++;    var_8h = 0x54;    eax = var_4h;    rdx = (int64_t) eax;    rax = var_18h;    rax += rdx;    eax = *(rax);    eax ^= 0x15;    eax = (int32_t) al;    /* Beware that this jump is a conditional jump.     * r2dec transformed it as a return, due being the     * last instruction. Please, check 'pdda' output     * for more hints. */    return void (*0x16b6)() ();The function goes character by character through the file, checks if it is the correct one and then moves on to the next. There are thousands of checks. So one approach is to take the pseudocode or disassembly listing and use that to create the proper string. I tried it and it works. But this is also a good case for using a symbolic analysis engine.
Angr is "angr is a python framework for analysing binaries. It combines both static and dynamic symbolic ("concolic") analysis, making it applicable to a variety of tasks." And the very interesting task we are going to use it for here is static symbolic analysis. In essence, angr can disassemble a binary and we analyse what series of actions will result in a desired outcome. The binary under analysis here has a pretty simple structure. After making some initial checks on the input file, such as length, it will pass the file contents to a check routine. The check routine then calls one of two functions, goodboy or badboy. So what we can do with angr is determine which file contents result in the check routine calling goodboy instead of badboy.
I find that a lot of the example content on how to use angr online does not function any more due to deprecation of API calls. Angr is being developed at a fast pace.....
Here are some resources on the details of symbolic execution:
So I think it's good to have another example out there.
Here I will use Ubuntu 18.10 in a clean virtual machine and set it up for angr with the following commands:
sudo apt updatesudo apt upgradesudo apt install python3-dev libffi-dev build-essential virtualenvwrappersource /usr/share/virtualenvwrapper/virtualenvwrapper.shmkvirtualenv --python=$(which python3) angr && pip install angrone can then run a script just by calling it like this:
python angr angrscriptThere is a useful cheat sheet on the angr github:
https://github.com/angr/angr-doc/blob/master/CHEATSHEET.mdI think it's worth going through a simple example from angr.io in order to have a good starting point for the thing we are going to do here:
consider the code given:
import osimport angrproject = angr.Project("defcamp_quals_2015_r100", auto_load_libs=False)path_group = project.factory.path_group()path_group.explore(find=lambda path: 'Nice!' in path.state.posix.dumps(1))print path_group.found[0].state.posix.dumps(0)The imports are pretty self-explanatory.
project = angr.Project("defcamp_quals_2015_r100", auto_load_libs=False)loads an executable into the current project. The executable is called defcamp_quals_2015_r100 and this executable takes a command line argument as a password. If the correct password is passed, then it prints a message: "Nice!"
auto_load_libs=False keeps angr from loading and analysing the libraries associated with the executable to save time. In this case all the logic we are interested in happens in the program, so the libraries need not be analysed.
After loading the executable, we have to decide on what analysis to perform, and in this case it's a path group. path_group = project.factory.path_group()
Then, we can look for the path group that causes the program to print "Nice!" on stdout, path_group.explore(find=lambda path: 'Nice!' in path.state.posix.dumps(1))
Finally, print the stdin of the path group that leads us there: print path_group.found[0].state.posix.dumps(0)
This is very powerful! The executable at hand uses a file input, rather than stdin, so we need an angr script that works on that.
import angrimport claripyproject = angr.Project('ch30.bin', auto_load_libs=False)        # load executable, don't analyse libaries  data = claripy.BVS('data', 0x100000*8) # declare how long the file can be; this is used by the concretize() function laterfilename = 'test.txt'                  # name for simulated file to be passed to the executablesimfile = angr.SimFile(filename, content=bytestring)            # declare contents for simulated file to be passed to the executablestate = project.factory.entry_state(args=['./executable', filename], fs={filename: simfile})  # initialize state machine to executable's entry pointsimulationManager = project.factory.simulation_manager(state)addr = project.loader.find_symbol('goodboy').rebased_addr # function we want to get toavoid_addr = project.loader.find_symbol('badboy').rebased_addr # functin we want to avoid - allows angr to cull execution paths.simulationManager.explore(find=[addr, addr2,addr3, addr4,addr5, addr6,addr7, addr8,addr9, addr10,addr11, addr12],avoid=avoid_addr, step_func=lambda lsimulationManager: lsimulationManager.drop(stash='avoid'))  # find a way through to the goodboy routinetest = simulationManager.one_found.fs.get(filename).concretize() # actually patch together the file input that results in the desired pathprint(repr(test))This works, but it very very slow and memory intensive. I found that it was much faster to just simulate some way into the check routine, copy the concretized file input up to that point and start from that point:
import angrimport claripyimport loggingimport timet0 = time.time()project = angr.Project('ch30.bin', auto_load_libs=False)        # load executable, don't analyse libaries  bytestring = None  data =    'TVqQAAMAAAAEAAAA' # this is the start of the file as we know it up to nowstringAsList = list(data);          # turn string into liststringAsList.append(claripy.BVS('data', 0x1000*8))    # declare how long the file can be; this is used by the concretize() function laterbytestring = claripy.Concat(*stringAsList)      # turn list into bytestringfilename = 'test.txt'           # name for simulated file to be passed to the executablesimfile = angr.SimFile(filename, content=bytestring)    # declare contents for simulated file to be passed to the executablestate = project.factory.entry_state(args=['./ch30.bin', filename], fs={filename: simfile}) # initialize state machine to executable's entry pointsimulationManager = project.factory.simulation_manager(state)simulationManager.activeprint("start exploration")logging.getLogger('angr.sim_manager').setLevel('INFO')addr = project.loader.find_symbol('check').rebased_addr # + 51 * len(data) # go a little further into the check routine than last timeaddr2 = addr + 1addr3 = addr + 2addr4 = addr + 3addr5 = addr + 4addr6 = addr + 5addr7 = addr + 6addr8 = addr + 7addr9 = addr + 8addr10 = addr + 9addr11 = addr + 10addr12 = addr + 11avoid_addr = project.loader.find_symbol('badboy').rebased_addrsimulationManager.explore(find=[addr, addr2,addr3, addr4,addr5, addr6,addr7, addr8,addr9, addr10,addr11, addr12],avoid=avoid_addr, step_func=lambda lsimulationManager: lsimulationManager.drop(stash='avoid'))print("start concretize")test = simulationManager.one_found.fs.get(filename).concretize()print(repr(test))t1 = time.time()print (t1-t0)This results in a base64 encoded executable:
start concretizeb'TVqQAAMAAAAEAAAA[.....]nAA==\n\x00\x00\x00[.....]\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'1452.8494970798492(angr) :~$The rest of the challenge was straight forward - the executable was a windows PE file with no surprises.