Posted on: Jul 28, 2011 4:35:08 PM
structs are cool. really cool.
to see what i mean, we have to take a look at how c++ classes are actually implemented.
machine code knows two places to store data: cpu registers (eax, ebx and friends) and RAM. the machine doesn't have any storage for "objects" (or better: for an object's member variables and functions).
so a two-staged trick is employed:
there is a fixed block in the .DATA segment for each class with virtual functions. it contains pointers to the function entry points (a.k.a. the vtable).
in RAM, memory is allocated for every instance of a class. this memory is laid out like a struct, with the first element of the struct being an offset (pointer) to the vtable.
a pointer to the RAM-stored data will be passed to every non-static function of a class so the method can use this variable as "this".
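to make the trick less abstract, here's a minimal sketch of it written out by hand. everything in it (Stream, StreamVTable, the two methods, the value 41) is made up for illustration and is not taken from es2.exe; a real compiler generates the equivalent for you.

```cpp
#include <cassert>
#include <cstdint>

struct Stream;  // forward declaration so the table can mention it

// one table per class, shared by all instances (hypothetical layout)
struct StreamVTable {
    int (*read)(Stream* self);   // every method receives "this" as first argument
    int (*close)(Stream* self);
};

// one block of memory per instance
struct Stream {
    const StreamVTable* vtable;  // first element: pointer to the vtable
    int status;                  // ordinary member variable
};

static int stream_read(Stream* self)  { return self->status + 1; }
static int stream_close(Stream* self) { self->status = 0; return 0; }

// the fixed block that would end up in the data segment
static const StreamVTable g_streamVTable = { stream_read, stream_close };

// what a constructor boils down to: store the table pointer, init the fields
static void stream_construct(Stream* self) {
    self->vtable = &g_streamVTable;
    self->status = 41;
}

// what a virtual call boils down to: load the table, call through a slot
static int call_read(Stream* s) { return s->vtable->read(s); }
```

calling call_read(&s) on a constructed Stream goes through the table instead of naming stream_read directly, which is the same indirection you see in the disassembly as a call through a register.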
now, to the code:
this is the vtable of the class "FileRStream" (screenshot taken from the es2.exe disassembly). as you can see, the entries are 4 bytes (1 dword) apart.
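the 4-byte spacing is handy when you meet a call through the table and want to know which entry it hits. a small sketch of the arithmetic, assuming a 32-bit target; the base address 0x00401000 used in the checks is an invented example value, not the real address in es2.exe.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit target: every vtable entry is one dword
constexpr std::uint32_t kSlotSize = 4;

// address of the n-th entry, given the address of the table
constexpr std::uint32_t slot_address(std::uint32_t vtableBase, std::uint32_t index) {
    return vtableBase + index * kSlotSize;
}

// which entry a byte offset into the table refers to
constexpr std::uint32_t slot_index(std::uint32_t byteOffset) {
    return byteOffset / kSlotSize;
}

// a call through [table + 0Ch] uses the fourth entry (index 3)
static_assert(slot_index(0x0C) == 3, "offset 0x0C is slot 3");
static_assert(slot_address(0x00401000u, 2) == 0x00401008u, "slot 2 is 8 bytes in");
```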
now, let's go to one of the constructors:
this is a "blank" constructor. the first call goes to the class "StreamIO", which is the parent class of "FileRStream" (how you see this will be explained in another post about RTTI).
then it gets interesting. the mov ebx, [ebp+vtablePtr] (the argument actually should be named "this", but I saw this too late) loads the value of vtablePtr (a stack variable!) into ebx, so ebx now holds the address of the object.
now, the four mov's are what matters: they initialize the member variables, which are written through ebx and sit directly after the vtable pointer in the object's memory.
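written back as source, the constructor does roughly the following. this is a sketch under assumptions: the type names StreamIOPart and FileRStreamObj, the split of fields between parent and child, the zero initial values and the two dummy tables are all invented here, since the real ones can't be read off this text.

```cpp
#include <cassert>
#include <cstdint>

// dummy objects whose addresses stand in for the two vtables
static const int g_streamIOTable = 0;
static const int g_fileRStreamTable = 0;

struct StreamIOPart {            // assumed parent part of the object
    const void* jumptablePtr;
    std::uint32_t status;
};

struct FileRStreamObj {
    StreamIOPart base;           // assumption: the parent's fields come first
    std::uint16_t fileHandle;
    std::uint8_t unknown;
};

static void StreamIO_construct(StreamIOPart* self) {
    self->jumptablePtr = &g_streamIOTable;
    self->status = 0;
}

static FileRStreamObj* FileRStream_construct(FileRStreamObj* self) {
    StreamIO_construct(&self->base);                // the first call: parent constructor
    self->base.jumptablePtr = &g_fileRStreamTable;  // then the four stores through "this"
    self->base.status = 0;
    self->fileHandle = 0;
    self->unknown = 0;
    return self;                                    // the constructor hands back "this"
}
```

note the order: the parent constructor installs its own table first, and the child overwrites the pointer afterwards.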
what we need to find out now is where (and how) the other variables are used. that's fairly easy: simply go through the other member functions and look for instructions that reference the this parameter.
here's a list for FileRStream (offsets are plain byte offsets, with no alignment applied! MMIX asm, for example, aligns data when you mix various types, x86 does not! beware!)
1 dword @ this+0: pointer to jumptable
1 dword @ this+4: status of the stream
1 word @ this+8: file handle (obtained with open() function)
1 byte @ this+10: unknown, unused so far
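the same list, written as a struct the compiler can check. a sketch only: jumptablePtr is the name used later in this post, the other field names are my own, and the pointer is stored as a plain dword because the target is 32-bit. the size of 11 bytes covers just the fields known so far; the real object may be bigger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)  // no padding, so the offsets match the list byte for byte
struct FileRStream {
    std::uint32_t jumptablePtr;  // this+0:  pointer to jumptable (a dword on 32-bit x86)
    std::uint32_t status;        // this+4:  status of the stream
    std::uint16_t fileHandle;    // this+8:  file handle
    std::uint8_t  unknown;       // this+10: unknown, unused so far
};
#pragma pack(pop)

// compile-time checks that the struct really matches the list
static_assert(offsetof(FileRStream, jumptablePtr) == 0,  "jumptablePtr at this+0");
static_assert(offsetof(FileRStream, status)       == 4,  "status at this+4");
static_assert(offsetof(FileRStream, fileHandle)   == 8,  "fileHandle at this+8");
static_assert(offsetof(FileRStream, unknown)      == 10, "unknown at this+10");
static_assert(sizeof(FileRStream) == 11, "known fields only");
```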
this is what the IDA decompiler makes out of the asm code now:
what a horrible, horrible mess. this looks ugly. so... we know there's a fixed data format for the this object. and how are such things best represented in C? with structs. which brings us back to the title.
let's make a struct by opening the struct subview (shift+f9 is the standard key combination). to create a new struct, simply press the insert key and name the new struct FileRStream:
now, navigate your cursor to the ends statement and press the "d" key.
then keep pressing "d" until the struct member has the size you want (a byte member needs no further presses). the codes are dd for something that was a dword (32 bit), dw for something that was a word (16 bit) and db for something that was a byte (8 bit).
when done, navigate to the ends statement and press "d" again (and repeat the last three steps until all member fields exist in the correct size):
now, rename the fields by clicking on each one and pressing "n":
the last thing won't be visible in the end, only when we later hit the decompiler: jumptablePtr is not an ordinary dword, it's an offset. so click on it and press ctrl+r. choose 32-bit full offset as type and click ok:
if IDA asks you something about resetting target stuff, answer yes. if you've done everything right, it should look like:
part 1 is done. now, go to the pseudocode-view of the constructor, right-click on the int* vtablePtr in the function parameter list and select "convert to struct *". select the newly-created struct from the menu and if everything went OK, it should look like this:
looks far more readable, huh?
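why the struct makes such a difference can be shown in a few lines. this sketch reads the file handle twice: once the "before" way, as a raw byte offset from an untyped pointer (the decompiler writes this roughly like *((_WORD *)a1 + 4)), and once the "after" way through a named field. the field names other than jumptablePtr and both helper functions are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct FileRStream {
    std::uint32_t jumptablePtr;  // this+0
    std::uint32_t status;        // this+4
    std::uint16_t fileHandle;    // this+8
    std::uint8_t  unknown;       // this+10
};
static_assert(offsetof(FileRStream, fileHandle) == 8, "fileHandle at this+8");

// "before": nothing but a pointer and a magic offset
static std::uint16_t handle_raw(const void* obj) {
    std::uint16_t h;
    std::memcpy(&h, static_cast<const unsigned char*>(obj) + 8, sizeof h);
    return h;
}

// "after": the same read, but the code says what it means
static std::uint16_t handle_typed(const FileRStream* obj) {
    return obj->fileHandle;
}
```

both functions touch the same two bytes; only the second one tells the reader which field that is.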
now just change the return type of the function by clicking on its name in the declaration and pressing "y". be sure to type FileRStream after the int first and only then delete the left-over int, because IDA's autocomplete will get in the way otherwise!
the last cast for StreamIO will never disappear, as the decompiler's C output has no way of expressing that "StreamIO" is a base type of "FileRStream". ah well, at least the code is readable now.