Compiling a Program

C++, Visual Basic, and other programming languages are English-like; a computer is an electronic machine that doesn't directly understand either English or English-like programming languages. In order for a computer to "execute" or "run" a program, the program must be translated into the binary machine instructions that the computer "understands." There are several ways of doing this, including assembling, interpreting, and compiling.

Assembly languages are formed by providing one English-like code for every instruction of the machine's native language, "machine code." Translation of a program written in an assembly language into machine code is called "assembling" the program. Because assembly language instructions map (via assembling) directly to machine instructions, the resulting machine code is likely to be extremely efficient -- it runs fast and conserves memory. These are the primary advantages of assembly language programming. Disadvantages: assembly language tends to be hard to work with, and, since every family of processors has its own assembly language, programs aren't "portable" -- they must be rewritten in order to run on other kinds of computers. For example, a program written in IBM PC assembly language should run on all IBM PC-compatible computers, but would not run on Macintoshes in the days when PCs and Macs were incompatible.

"High-level languages" such as C++ and Visual Basic tend to be easier for programmers to work with than assembly languages, since an instruction in a high-level language is typically the equivalent of several assembly or machine language instructions. High-level language programs are generally either interpreted or compiled into machine code.

Before 2000, BASIC (not Visual Basic, although Visual Basic has roots in BASIC) was a popular programming language, often learned by pre-college students. BASIC was usually interpreted. When an interpreted program runs and control within the program reaches a given instruction (such as a LET statement), the interpreter translates the instruction into machine code and executes the corresponding machine instructions, then goes on to the next instruction of the program WITHOUT REMEMBERING the machine code it has just finished executing. Thus, if an instruction is repeated, it must be re-translated to machine code before it can be re-executed.

By contrast, C++ is usually compiled. The compiler translates the program IN ITS ENTIRETY to machine code, and "remembers" all the machine code by saving it in an "executable" file, before the program is run. Thus, when the program is run, translation of source code instructions to machine code has been done in advance; repetition of a source code instruction does not require re-translation of that instruction to machine code; hence the program tends to run faster. On some computers where a language can be both compiled and interpreted, the compiled version will run enormously faster -- it's not unusual for the parts of the program not requiring input or output to take 1000 times as long in the interpreted version as in the compiled version.

OK, this clearly indicates that an important advantage of compiling as opposed to interpreting is speed of execution. What could a disadvantage of compiling be? Well, remember we said above that one high-level language instruction is typically equivalent to several machine code instructions. Suppose you are running a program in, say, BASIC, that has 500 BASIC instructions, the average of which is equivalent to 5 machine code instructions and the largest of which is equivalent to 10 machine code instructions. If compiled, this would mean you would need enough memory to hold 5*500 = 2500 machine code instructions in order to run the program; if interpreted, you need enough memory to hold 500 BASIC instructions plus at most 10 machine code instructions (for whatever BASIC instruction is currently executing). Thus, interpreting uses less memory. This was an extremely important factor for the early microcomputers -- machines with small memories -- and probably explains why they typically came equipped with interpreted BASIC.

A compiler typically scans your source code file for "syntax errors," i.e., errors in the use of the programming language's rules concerning formation of statements in the program. Frequently, one actual error will generate several error messages, as the first error may "confuse" the poor compiler. Hence, the first error message is the one to trust and fix first; the others may or may not point to real errors.

If the compiler does not find "serious" syntax errors in your program, it will also typically (the details differ from one compiler to another) produce a file that is either the "executable" version of your program or an almost-executable file (for example, an "object" file that must be "linked" to produce an executable file). The executable file is the machine code translation of your source code that the computer uses to execute your program when you give the command to do so. Students should note that the goal of the process is not merely to have your program run. It should run correctly and well. It may be necessary for you to observe your program's behavior, edit its source code, compile again, and run again, many times, before your program performs properly.

On the other hand, if the compiler finds errors in your program, you must edit these errors out of your program (in the editor) and then re-compile. The cycle of editing and compiling often must be repeated several times before your program is ready to run. Notice also that you're probably not "done" when your program runs the first time -- successful compiling does not guarantee either correct logic (necessary to get correct "answers") or nice input/output. It's the programmer's responsibility to check these matters.