Q) Is it okay to set an arbitrary limit on the length of the strings?
A) No. You should be able to handle strings of any length, which means you cannot use a statically sized buffer to hold an entire string. You may be tempted to use malloc or realloc to dynamically allocate a buffer large enough to hold the entire file, but that is not a good solution: if the file is larger than the available memory, your program may run out of memory. A better solution uses a small fixed-size buffer and takes care of the boundary condition where a string spans two buffer reads.
Q) The output from running strings -a on my Mac is different from the output of mystrings. Am I doing something wrong?
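One way to handle the boundary condition is sketched below. It is only an illustration, not the required design: the function name print_strings, the chunk size, the minimum length of 4, and the definition of "printable" as ASCII 32 to 126 are all assumptions, so check them against the worksheet. The idea is that the only state that must survive from one fread call to the next is the length of the current run and its first few characters; once a run is known to be long enough, its characters are written out immediately, so memory use does not depend on the string or file length.

```c
#include <stdio.h>

#define MIN_LEN 4     /* assumed minimum run length worth reporting */
#define CHUNK   4096  /* fixed read size, independent of the file size */

/* Hypothetical helper: copies every run of at least MIN_LEN printable
 * bytes from `in` to `out`, one run per line. Returns the number of runs. */
static long print_strings(FILE *in, FILE *out)
{
    unsigned char buf[CHUNK];
    char pending[MIN_LEN]; /* start of a run not yet known to be long enough */
    size_t run = 0;        /* current run length, capped at MIN_LEN; it is
                              NOT reset between reads, so runs may span chunks */
    long found = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        for (size_t i = 0; i < n; i++) {
            unsigned char c = buf[i];
            if (c >= 32 && c <= 126) {          /* assumed "printable" range */
                if (run < MIN_LEN - 1) {
                    pending[run++] = (char)c;   /* too short so far: hold it */
                } else if (run == MIN_LEN - 1) {
                    fwrite(pending, 1, run, out); /* long enough: flush start */
                    fputc(c, out);
                    run++;
                    found++;
                } else {
                    fputc(c, out);              /* already printing this run */
                }
            } else {
                if (run >= MIN_LEN)
                    fputc('\n', out);           /* end of a reported run */
                run = 0;
            }
        }
    }
    if (run >= MIN_LEN)
        fputc('\n', out);                       /* run ended at end of file */
    return found;
}
```

A caller would open the file in binary mode, for example with fopen(path, "rb"), and pass stdout as the second argument.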
A) Mac systems have a slightly different version of the strings program. The explanations on the worksheet apply to the strings program installed on Linux systems (the thoth machine). Please use the thoth machine when you compare your output to strings.
Q) My program works fine with text files but gives no output with binary files.
A) You are probably using the C library character I/O functions such as fgetc, getc, or getchar and mishandling their return value. Read the manpages for these functions and understand exactly what they do. For example, fgetc returns the byte it read as an unsigned char converted to an int, or EOF if it has reached the end of the file. EOF is a negative int, often defined as -1 (usually using #define). The problem occurs when you store the return value in a char: a byte with the value 0xFF then becomes -1 on machines where char is signed, which compares equal to EOF, so your program believes it has reached the end of the file. Text files would not normally contain the byte 0xFF because it is not a printable ASCII character, but binary files can contain any bit sequence. The solution is either to keep the return value in an int, or to use a block I/O function such as fread, which reports how many bytes it read instead of returning an in-band end marker.
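The confusion between a data byte and EOF can be reproduced with a small sketch like the one below. The function names are made up for illustration. The second function deliberately narrows the result of fgetc to a signed char to show the bug; the exact behavior of that conversion is implementation-defined, but on typical Linux machines a 0xFF byte becomes -1 and the loop stops early.

```c
#include <stdio.h>

/* Correct idiom: keep the result of fgetc in an int, so that the byte
 * 0xFF (returned as 255) stays distinct from EOF (a negative value). */
static long count_bytes_int(FILE *f)
{
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        n++;
    return n;
}

/* Buggy variant: the result is narrowed to a signed char first, so a
 * 0xFF byte typically turns into -1 and is mistaken for EOF. */
static long count_bytes_char(FILE *f)
{
    long n = 0;
    signed char c;
    while ((c = (signed char)fgetc(f)) != EOF)
        n++;
    return n;
}
```

On a file containing the three bytes 'A', 0xFF, 'B', the first function counts all three, while the second is expected to stop at the 0xFF byte on such machines.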
Q) I accidentally found a password for a puzzle. Am I done?
A) No, you are not done. Some of the passwords are patterns of characters rather than a single word. You may have been lucky and found a string that happens to match the pattern. However, to get points, you have to identify exactly what that pattern is and explain how you discovered it.
Q) What tools are available for me?
A) Hex viewers like 'xxd', 'od', or 'hexdump' may give you some marginal help, but will not be very useful unless you can read machine binary off the screen in Matrix fashion. 'objdump' is a tool that interprets that binary for you and displays sections of it in human-readable form. The name is short for object file dump, which is exactly what it does. We have seen the usage of 'objdump' in the lectures. I suggest you try 'man objdump' to see its full range of capabilities. Some of the options that may help you:
objdump -t: Shows the symbol table (just like the nm utility)
objdump -T: Shows the dynamic symbol table used in the dynamic linking stage (just like nm -D)
objdump -r: Shows the relocation table
objdump -R: Shows the relocation table for the dynamic linking stage
objdump -D: Disassembles the entire binary and shows assembly code
And of course you have GDB to step through the code. You will have to look at the disassembly provided by the disas command in GDB or by objdump -D to understand it as you step through the code.
Q) I'm at a loss as to what to do.
A) Please go over the GDB puzzle lab carefully while referencing the stack / calling conventions slides. You should be able to understand what's going on by reading the assembly code. There are a few opcodes (e.g. cmp) that we did not discuss in class. Please refer to the Intel Instruction Set Manual posted on the website to look up opcodes and their behaviors. If you do have trouble understanding the lab, please ask the TA for guidance.
Q) Why do I get the following message on some puzzles: 'Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6_5.3.i686'?
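A) You can ignore that message. GDB is just telling you that it could not find separate debug information for glibc, the C library the puzzle is linked against. You do not need it to solve the puzzles. The puzzle binaries themselves have also been stripped of debug information. That was done intentionally to make the puzzles more interesting. :)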
Q) Some puzzles do not have the 'main' function. Where should I place my breakpoint?
A) We learned during the recitations that you can place breakpoints on instruction addresses (for example, 'break *ADDRESS') as well as on function name labels. The main function has not actually disappeared: the compiler still generated its code, but the 'main' label was removed from the final binary. When a binary is stripped, all information that is not strictly necessary for running the program, including such labels, is removed. The code for the main function is still somewhere in there. It is just not labeled as such.
Q) When I try to print a value of a register using x/s, sometimes I get something like the following:
(gdb) x/s $eax
0x8: <Address 0x8 out of bounds>
What would be the problem?
A) Please refer to the first GDB lab and the usage of 'print' and 'x'. To display the value inside a register or a variable, you use 'print'. To display the value stored at the location pointed to by a register, you use 'x'. Specifically, 'x/s $eax' says to print the value pointed to by the $eax register, interpreting it as a string. In the above example, $eax holds the value 0x8, which is not a pointer to a string (a series of characters terminated by a null character), and that is the reason for the error. Most likely, you need to use the 'print' command instead.
Q) Is there a way to step through instruction by instruction without using breakpoints?
A) Please review your first GDB lab. The 'ni' command does exactly that.
Q) On the third puzzle, when I place a breakpoint on an instruction and do 'disas', it says 'No function contains program counter for selected frame'. What did I do wrong?
A) By default the 'disas' command disassembles the code for the current function. The error is saying that there is no function associated with the program counter (since the main function has been stripped from the code). You can give two arguments to disas to specify the starting and ending address you want to disassemble. Try 'help disas'. Otherwise you can just view the output of objdump -D to track where you are as you step through the code using 'ni'.
Q) In puzzle 3, when I look at the assembly I see jumps or calls that look like the following: "ja 80484b5 <tolower@plt+0x12d>". What does this mean? Does it have anything to do with the tolower C library function?
A) No. That instruction jumps to address 80484b5, which happens to be 0x12d bytes past the tolower@plt label. The content inside the angle brackets is an attempt by the disassembler to express what the address means. It expresses the address as an offset from the closest preceding label, which is usually the enclosing function name and so helps the reader understand the code. In this case, I have intentionally stripped most labels from the code, including the enclosing function label. A few labels that have to do with dynamic linking still remain, and tolower just happened to be the closest label to the jump target. So, in short, you should pay attention to the actual jump target address 80484b5, which is somewhere in your code, and not to the interpretation of the target, which in this case is pretty meaningless. If the jump target were the label itself, as in <tolower@plt>, then it would be an actual call to the function tolower.
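The arithmetic behind the <label+offset> annotation can be written out as a tiny sketch. The helper name label_address is made up for illustration; it simply subtracts the offset from the jump target to recover the address the disassembler used as the label.

```c
/* Hypothetical helper: given a jump target and the offset shown in the
 * disassembler's <label+offset> annotation, recover the label's address. */
static unsigned long label_address(unsigned long target, unsigned long offset)
{
    return target - offset;
}
```

With the values from the question, label_address(0x80484b5, 0x12d) gives 0x8048388, which would be where the tolower@plt label sits. The jump target itself is 0x12d bytes further on, well past that short PLT entry.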