What gcc really does
 

In this document, I use the simple program below, which I call "myprogram.c". Some of the output from gcc and other programs is very long, so I have sometimes split lines with a backslash (\) followed by a newline.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>   /* declares exit() */

#define PI 3.14159265358979

int main() {
    printf("sin(pi) = %f\n", sin(PI));
    printf("sin(pi/2) = %f\n", sin(PI/2));
    exit(0);
}

To compile the program, I need to link it with the math library, libm.sl. This is accomplished with the -l flag: -lm tells the linker to look for a library named libm.

gcc -o myprogram myprogram.c -lm

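gcc itself doesn't do much of the work; rather, it calls other utilities. You can see this process by giving gcc the -v option. The -save-temps option used below additionally keeps the intermediate files that gcc would normally delete.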

gcc -save-temps -v -o myprogram myprogram.c -lm

The output looks like the following. My comments are interspersed.

Reading specs from /afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/specs
gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release)

Those two lines above are not terribly important right now.

The first command that gcc calls is cpp, the C pre-processor. It processes all of the #define, #ifdef, and #include lines and produces plain C text that the actual compiler can use.

/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/cpp \
-lang-c -v -undef -D__GNUC__=2 -D__GNUC_MINOR__=91 -Dhppa -Dhp9000s800 \
-D__hp9000s800 -Dhp9k8 -DPWB -Dhpux -Dunix -D__hppa__ -D__hp9000s800__ \
-D__hp9000s800 -D__hp9k8__ -D__PWB__ -D__hpux__ -D__unix__ -D__hppa \
-D__hp9000s800 -D__hp9k8 -D__PWB -D__hpux -D__unix -Asystem(unix) \
-Asystem(hpux) -Acpu(hppa) -Amachine(hppa) -D__hp9000s700 -D_PA_RISC1_1 \
-D_HPUX_SOURCE -D_HIUX_SOURCE myprogram.c /var/tmp/cc5zk54m.i

These lines are output from cpp, caused by the -v option to cpp, which is a result of the -v option being passed to gcc. The first #include line below is followed by no directories. This means that no extra directories are searched for #include "something.h": cpp looks in the directory of the file that contains the #include, and then falls back to the angle-bracket list. The second #include line says that cpp will try to find files to satisfy #include <something.h> in the three directories listed below it.

GNU CPP version egcs-2.91.66 19990314 (egcs-1.1.2 release) (hppa)
#include "..." search starts here:
#include <...> search starts here:
/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/hppa1.1-hp-hpux10.20/include
/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/include
/usr/include
End of search list.

At this point, all of the lines that started with #ifdef, #if, #include, #define, etc. are gone. The contents of all the #include files have been added to the output, and macros (#define) have all been expanded. Lines between #if (or #ifdef or #ifndef) and #endif (or #else) are deleted if the condition was false. To see the pre-processor output, look at myprogram.i, which is left behind as a result of the -save-temps option to gcc.

Next, gcc calls cc1, the program that takes the preprocessed code and turns it into assembly language. This is really the compiler (gcc is just a pretty front end).

/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/cc1 \
myprogram.i -quiet -dumpbase myprogram.c -version -o myprogram.s
GNU C version egcs-2.91.66 19990314 (egcs-1.1.2 release) (hppa1.1-hp-hpux10.20) compiled by GNU C version egcs-2.90.29 980515 (egcs-1.0.3 release).

This should have created myprogram.s. Take a look at it if you would like to see what HPPA assembly code looks like.

Now the assembler is called to transform the assembly code into machine code. The machine code is put into an object (.o) file.

/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/as -o myprogram.o myprogram.s

Finally, we are ready for the linking stage. This is what takes the various object files (.o), archive files (.a, also called static libraries), and shared libraries (.sl or .so, depending on the operating system) and makes them into an executable.

/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66/collect2 \
-L/lib/pa1.1 -L/usr/lib/pa1.1 -z -u main -o myprogram /lib/crt0.o \
-L/opt/fortran/lib -L/opt/CC/lib -L/opt/graphics/common/lib \
-L/opt/graphics/PEX5/lib -L/opt/graphics/phigs/lib \
-L/afs/engr.wisc.edu/src/gnu/std/lib -L/usr/lib/Motif1.2 \
-L/usr/lib/Motif1.1 \
-L/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib/gcc-lib/hppa1.1-hp-hpux10.20/egcs-2.91.66 \
-L/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/hppa1.1-hp-hpux10.20/lib \
-L/usr/ccs/bin -L/usr/ccs/lib \
-L/afs/.engr.wisc.edu/src/gnu/egcs-1.1.2/hp_ux102/lib \
myprogram.o -lm -lgcc -lc -lgcc

Each of the -L flags specifies a directory to search for the needed libraries. The needed libraries are specified by the -l options. Note that toward the end of this command the options are "myprogram.o -lm -lgcc -lc -lgcc". This step will succeed only if every undefined symbol in the .o files is found either in another .o file or in one of the libraries libm, libgcc, and libc.

To see what symbols are needed by myprogram.o, you can take a look at the output of nm:

% nm myprogram.o


Symbols from myprogram.o:

Name Value Scope Type Subspace

L$C0000 | 32|static|data |$LIT$
L$C0001 | 0|static|data |$LIT$
L$C0002 | 40|static|data |$LIT$
L$C0003 | 16|static|data |$LIT$
__main | |undef |code |
exit | |undef |code |
main | 0|extern|entry |$CODE$
printf | |undef |code |
sin | |undef |code |

The symbols that have a scope of "undef" need to be found elsewhere. If you are looking for a library that has a particular symbol, you can find it with nm as well.

% cd /usr/lib
% nm -g -r *.sl | grep ':printf .*\$CODE\$$'
libc.sl:printf | 1129232|extern|code |$CODE$
libc_r.sl:printf | 1129232|extern|code |$CODE$

This tells me that if I want to use printf, I must make sure that collect2 is given -lc (or -lc_r if the re-entrant version is needed) and -L/usr/lib/.