
Translation of source code to machine code

Computers can only execute binary (machine language). Although humans can read binary, doing so is very inefficient, so humans program in higher-level languages that they can easily read and understand. Because computers cannot understand these languages directly, programs written in them must be translated into machine language before the computer can do anything useful with them.

When you are learning to program, an integrated development environment (IDE) usually translates your source code automatically, hiding the process of translating source code into machine code.

At a low level, processors perform basic arithmetic operations on numbers held in memory.

They can perform more useful tasks when they are programmed. But how exactly is a CPU programmed?

Differences between compilers and interpreters:

Compilers translate source code (high-level code) into machine code for the whole program at once, whereas interpreters translate it line by line
Compilers translate the source code before the program is executed, whereas interpreters translate and execute the source code at the same time
Compiled programs generally run faster than interpreted programs
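To make the difference concrete, the following sketch simulates both strategies with Python's built-in `compile()` and `exec()` functions. It is an illustration under simplifying assumptions: `compile()` produces Python bytecode rather than native machine code, and the program and helper names (`PROGRAM`, `compile_then_run`, `interpret_line_by_line`) are hypothetical.

```python
# Hypothetical three-line program; line 3 contains a syntax error.
PROGRAM = ["x = 2", "print(x * 10)", "y = = 3"]


def compile_then_run(lines):
    """Compiler-style: translate the whole program first, then execute it."""
    try:
        code = compile("\n".join(lines), "<program>", "exec")
    except SyntaxError as err:
        # Nothing has executed yet: translation failed up front.
        return f"translation failed before execution: line {err.lineno}"
    exec(code, {})
    return "ran whole program"


def interpret_line_by_line(lines):
    """Interpreter-style: translate and execute one line at a time."""
    env = {}  # variables persist between lines
    for number, line in enumerate(lines, start=1):
        try:
            code = compile(line, f"<line {number}>", "exec")
        except SyntaxError:
            # Earlier lines have already run by the time the error is found.
            return f"stopped at line {number} after running earlier lines"
        exec(code, env)
    return "ran whole program"


print(compile_then_run(PROGRAM))
print(interpret_line_by_line(PROGRAM))
```

When run, the compiler-style function reports the error on line 3 before any line has executed, while the interpreter-style function first prints 20 (line 2 has already run) and only then stops at line 3.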

Compilers

A compiler is a program that can ‘compile’ code; that is, it translates high-level source code written in languages such as C and C++ into machine code. Compilation is therefore the process of translating the source code we use to create programs into machine code, the binary instructions (0s and 1s) that the machine can execute. Some languages, such as C# and Java, are instead compiled into an intermediate bytecode that a virtual machine then executes.

Types of compilers:

Native compilers - compile source code into machine code for the same platform (processor and operating system) that the compiler itself runs on
Cross compilers - run on one platform but generate machine code for a different platform
Source-to-source compilers - take high-level source code as input and output high-level source code, often in a different language

In short, a compiler transforms source code into machine code that is ready to be executed by the CPU.

Languages that use compilers:

C
C++
C#
COBOL

Advantages of using a compiler:

Efficient and quick to execute
Increased performance
Compiled machine code is hard to convert back into source code

Disadvantages of using a compiler:

Compiled code is platform specific
Compilation takes longer as the amount of code grows
Running the program on another operating system requires recompiling it

Interpreters

An interpreter converts high-level source code into machine code, or into an intermediate language that is easy to execute. In contrast with a compiler, an interpreter translates the source code line by line and executes each line as it goes, which makes execution significantly slower. Because of this, interpreters are often used for debugging and teaching, where speed matters less. On the other hand, interpreters do not generate intermediate object code, so they use less memory in the process.

One main advantage of using an interpreter is platform independence: the program can be executed on any operating system without modification. For example, Java source code can be compiled into Java bytecode, and the same bytecode file can then be interpreted on machines with different architectures by the Java Virtual Machine (JVM).
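The idea can be sketched with a tiny stack-based virtual machine written in Python. The instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`, `STORE`) and the function `run_bytecode` are hypothetical and much simpler than real Java bytecode, but the principle is similar: the bytecode list stays fixed, and any machine that can run this interpreter can execute it without modification.

```python
def run_bytecode(program, variables=None):
    """Execute a list of (opcode, argument) pairs using an operand stack."""
    variables = dict(variables or {})
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":      # push a constant onto the stack
            stack.append(arg)
        elif opcode == "LOAD":    # push a variable's value onto the stack
            stack.append(variables[arg])
        elif opcode == "ADD":     # pop two values, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":     # pop two values, push their product
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "STORE":   # pop the top value into a variable
            variables[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return variables


# Hypothetical bytecode for: total = (number1 + number2) * 2
BYTECODE = [
    ("LOAD", "number1"),
    ("LOAD", "number2"),
    ("ADD", None),
    ("PUSH", 2),
    ("MUL", None),
    ("STORE", "total"),
]

print(run_bytecode(BYTECODE, {"number1": 3, "number2": 4}))
```

With `number1 = 3` and `number2 = 4`, the program stores 14 in `total`.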

Interpreters also make it easier to support language features such as:

Dynamic scoping
Dynamic typing

There are several types of interpreters that work in different ways:

Abstract syntax tree (AST) interpreters
Bytecode interpreters
Threaded code interpreters
Just-in-time (JIT) compilers
Self-interpreters

Fig. 1: Abstract syntax tree

For example, an abstract syntax tree interpreter transforms the code into an AST and then executes the program by following this tree structure.
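A minimal sketch of an AST interpreter, assuming Python's standard `ast` module does the parsing: `ast.parse` builds the tree, and the hypothetical `evaluate` function walks it recursively, evaluating each node's children before applying the node's own operator. Only constants, variable names and the four basic arithmetic operators are handled.

```python
import ast
import operator

# Map AST operator node types to the Python functions that perform them.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, variables):
    """Walk the tree recursively, evaluating children before their parent."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, variables)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp):
        left = evaluate(node.left, variables)
        right = evaluate(node.right, variables)
        return BINARY_OPS[type(node.op)](left, right)
    raise NotImplementedError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("number1 + number2 * 2", mode="eval")
print(ast.dump(tree.body))  # shows the tree's structure
print(evaluate(tree, {"number1": 3, "number2": 4}))  # evaluates 3 + (4 * 2) = 11
```

Because `*` binds tighter than `+`, the parser places `number2 * 2` lower in the tree, so the interpreter computes it before the addition.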

Interpreters are more suitable for smaller programs where speed does not matter, whereas larger, performance-critical programs usually use compilers. Both interpreters and compilers have their advantages and disadvantages, however, and many software environments employ both to fit their needs.

Languages that use interpreters:

Matrix Laboratory (MATLAB)
Perl
Python
R
Ruby

Links:

https://en.wikipedia.org/wiki/Abstract_syntax_tree (Fig.1)

https://www.tutorialspoint.com/compiler_design/compiler_design_phases_of_compiler.htm

https://www.microcontrollertips.com/compilers-translators-interpreters-assemblers-faq/

https://tomassetti.me/difference-between-compiler-interpreter/

https://www.techopedia.com/definition/7793/interpreter

https://en.wikipedia.org/wiki/Interpreter_(computing)#Bytecode_interpreters

https://www.programiz.com/article/difference-compiler-interpreter

Translation from Source Code to Machine Code

Stages of translation: preprocessing; compilation; machine code generation; linking

The preprocessing stage copies source code from one file and pastes it into another, for example wherever a C program uses an #include directive.
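This cut-and-paste behaviour can be sketched in Python as a function that replaces each `#include "file"` line with the contents of the named file. The in-memory `FILES` dictionary and the `preprocess` function are hypothetical; a real C preprocessor also reads files from disk, expands macros and handles conditional compilation.

```python
import re

# Hypothetical in-memory source files; a real preprocessor reads them from disk.
FILES = {
    "constants.h": "int rate = 10;",
    "main.c": '#include "constants.h"\nint main(void) { return rate; }',
}

INCLUDE_LINE = re.compile(r'#include "([^"]+)"')


def preprocess(name, files, including=()):
    """Paste the contents of each #include'd file in place of the directive."""
    if name in including:
        raise ValueError(f"circular #include of {name}")
    output = []
    for line in files[name].splitlines():
        match = INCLUDE_LINE.fullmatch(line.strip())
        if match:
            # Recursively expand the included file, which may include others.
            output.append(preprocess(match.group(1), files, including + (name,)))
        else:
            output.append(line)
    return "\n".join(output)


print(preprocess("main.c", FILES))
```

The output is `main.c` with its `#include` line replaced by the text of `constants.h`, ready for the compilation stage.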

The compilation stage generates assembly code; this stage itself consists of several steps.

Once the assembly code has been generated, the machine code generation stage begins, in which the assembly code is translated into binary that the CPU can understand and execute.

Finally, the linking stage takes all the separately translated parts of the program and links them together into one executable file.
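Linking can be illustrated with a toy linker in Python. The object-file format here (a dictionary of defined symbols plus a list of instructions, where a string argument names a symbol still to be resolved) and the `link` function are hypothetical simplifications. The linker places each object's code one after another, records where each symbol ends up, then replaces every symbol reference with its final address.

```python
# Hypothetical object files: each has code plus the symbols it defines.
# An instruction argument that is a string is a symbol still to be resolved.
MAIN_OBJ = {
    "defines": {"main": 0},
    "code": [("CALL", "square"), ("HALT", None)],
}
MATH_OBJ = {
    "defines": {"square": 0},
    "code": [("MUL", None), ("RET", None)],
}


def link(objects):
    """Combine object files into one program and resolve symbol addresses."""
    symbols, program = {}, []
    # Pass 1: place each object's code after the previous one, record addresses.
    for obj in objects:
        base = len(program)
        for name, offset in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset
        program.extend(obj["code"])
    # Pass 2: replace symbol names with their final addresses.
    linked = []
    for opcode, arg in program:
        if isinstance(arg, str):
            if arg not in symbols:
                raise ValueError(f"undefined symbol: {arg}")
            arg = symbols[arg]
        linked.append((opcode, arg))
    return linked


print(link([MAIN_OBJ, MATH_OBJ]))
```

Because `main`'s code occupies addresses 0 and 1, `square` is placed at address 2, so the `CALL` instruction is resolved to `("CALL", 2)`.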

The way in which compilers translate high-level languages into machine language can be broken down into the following steps, beginning with lexical analysis.

Lexical analysis - the lexer scans the text of the high-level source code, breaks it down into lexemes and maps each lexeme to a token such as a keyword, identifier, literal, operator, special symbol or constant. A token is a pair consisting of a token name and a token value. A lexeme is a sequence of characters from the source code that matches the pattern of a token and is identified by the lexical analyser as an instance of that token. The resulting stream of tokens is fed into the syntax analyser.
Parsing/Syntactical analysis - the parser converts the sequence of tokens into a parse tree, a data structure that organises the tokens into a hierarchy according to the grammar of the language (analogous to working out the grammatical structure of a sentence)
Contextualise - the compiler then records context about the program, such as variable and function names, typically in a symbol table
Optimisation - the compiler improves the program, for example by evaluating constant expressions, removing unused variables or unreachable code, and unrolling loops
Code generation - finally, the compiler converts all of these outputs into binary machine code that instructs the CPU how to carry out the program
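Lexical analysis, parsing, contextualising and one optimisation can be sketched for a tiny language of assignment statements. This is a minimal, hypothetical example in Python: the token names, the tuple-based tree and the functions `lex`, `parse`, `symbol_table` and `fold_constants` are illustrative assumptions rather than how any particular compiler is written, and constant folding stands in for optimisation in general. Code generation is not shown.

```python
import operator
import re

# Hypothetical token patterns for a tiny language of assignment statements
# such as "total = number1 + number2". Patterns are tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OPERATOR", r"[+\-*]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def lex(source):
    """Lexical analysis: turn source text into (token name, lexeme) pairs."""
    tokens, position = [], 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":  # whitespace separates lexemes but is not a token
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a parse tree for IDENTIFIER = expression."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(kind):
        nonlocal pos
        found_kind, lexeme = peek()
        if found_kind != kind:
            raise SyntaxError(f"expected {kind}, found {lexeme!r}")
        pos += 1
        return lexeme

    def operand():
        if peek()[0] == "NUMBER":
            return ("num", int(expect("NUMBER")))
        return ("var", expect("IDENTIFIER"))

    def binary(next_level, operators):
        # Left-associative chain of operators at one precedence level.
        node = next_level()
        while peek()[0] == "OPERATOR" and peek()[1] in operators:
            op = expect("OPERATOR")
            node = ("binop", op, node, next_level())
        return node

    def expression():
        # "*" binds tighter than "+" and "-", so it is parsed one level lower.
        return binary(lambda: binary(operand, "*"), "+-")

    target = expect("IDENTIFIER")
    expect("ASSIGN")
    tree = ("assign", target, expression())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree


def symbol_table(tree):
    """Contextualise: record each variable and whether it is assigned or only read."""
    table = {tree[1]: "assigned"}

    def visit(node):
        if node[0] == "binop":
            visit(node[2])
            visit(node[3])
        elif node[0] == "var":
            table.setdefault(node[1], "read")

    visit(tree[2])
    return table


def fold_constants(node):
    """Optimisation: replace operations on two constants with their result."""
    if node[0] == "assign":
        return ("assign", node[1], fold_constants(node[2]))
    if node[0] == "binop":
        left, right = fold_constants(node[2]), fold_constants(node[3])
        if left[0] == "num" and right[0] == "num":
            return ("num", FOLDABLE[node[1]](left[1], right[1]))
        return ("binop", node[1], left, right)
    return node


tree = parse(lex("total = number1 + 2 * 3"))
print(symbol_table(tree))
print(fold_constants(tree))
```

For `total = number1 + 2 * 3`, the symbol table records `total` as assigned and `number1` as read, and constant folding replaces `2 * 3` with `6`, leaving only `number1 + 6` to be computed at run time.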

Past HSC Questions

2018

2017

Answers:

2018 Q29:

Lexical analysis: The statement of the source code is scanned and the lexemes Total, =, number1, +, number2 are identified and each allocated a unique token.

Syntactical analysis: This tokenised stream is now checked against the relevant syntax rules. Because the statement begins with a variable followed by =, it must be an assignment statement, for which a valid rule is: <variable> = <variable> <operator> <variable>. The parser then matches each of the tokens to this structure, with Total, number1 and number2 being variables and + being an operator.

Code generation: The machine code statements are generated by converting the tokenised stream to equivalent machine code statements, making use of the available operations in the instruction set of the computer.
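The syntax rule and the code generation step from this answer can be sketched together in Python. The mnemonics `LOAD`, `ADD` and `STORE` belong to a hypothetical accumulator-based instruction set (real instruction sets differ), and the token names and `generate` function are illustrative assumptions.

```python
# Hypothetical token stream from lexical analysis of "Total = number1 + number2",
# as (token name, lexeme) pairs.
TOKENS = [
    ("IDENTIFIER", "Total"),
    ("ASSIGN", "="),
    ("IDENTIFIER", "number1"),
    ("OPERATOR", "+"),
    ("IDENTIFIER", "number2"),
]

# Hypothetical accumulator-machine mnemonic for each operator.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(tokens):
    """Check <variable> = <variable> <operator> <variable>, then emit instructions."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["IDENTIFIER", "ASSIGN", "IDENTIFIER", "OPERATOR", "IDENTIFIER"]:
        raise SyntaxError("statement does not match the assignment rule")
    target, _, left, op, right = (lexeme for _, lexeme in tokens)
    return [
        f"LOAD {left}",              # copy the first operand into the accumulator
        f"{OPCODES[op]} {right}",    # combine it with the second operand
        f"STORE {target}",           # write the result to the target variable
    ]


for instruction in generate(TOKENS):
    print(instruction)
```

For the HSC statement this prints `LOAD number1`, `ADD number2` and `STORE Total`: load the first operand, add the second, and store the result.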

2017 Q13: C
