In this part of the project, you will implement the phase of syntactic analysis for the C- language, including the construction of the Abstract Syntactic Tree (AST). The parser will read in a source file (in C-) and determines whether or not the program follows the language grammar specification, and in case of success it generates the AST as output.
Before you start, I recommend reviewing some background material from the following two chapters of the book "Introduction to Compilers and Language Design" by Douglas Thain.
If you're using Bison/Yacc, you should also refer to the following material:
There are several possible correct parse trees that can be generated for a program input. Thus, in our compiler project it is important to have a unique format for representing the code in the AST that contains a minimum number of nodes and is independent of any specific language implementation.
In our compiler project, the output of the parser will use a labelled bracketing notation as shown below. This notation is written in nested lists of prefix expressions and is equivalent to the representation by means of a tree structure. The prefix expressions correspond to the nodes in the AST.
[operator [operand1] ... [operandN]]
Recursively, each operand can be defined by another operator; for example,
[op1 [op2 [a] [b]] [c]]
where op1 has two operands: [op2 [a] [b]] and [c], and operator op2 has two other operands: [a] and [b].
As an example, the expression written 4 == (2 + 2) in the C- language, is represented as [== 4 [+ 2 2]] in the AST notation.
Below are the AST nodes and corresponding names that need to be produced by the parser:
[program ... ]
[var-declaration ... ]
[int] ---> only int is allowed, variables cannot has void type
[ID] ---> variable name
[\[\]] ---> (optional) symbol to describe a variable as array; IMPORTANT: the backslash \ symbol is used to not interpret [ or ] as bracket nodes, but to be seen as visible symbols in the AST.
[fun-declaration ... ]
[int] / [void] ---> either int or void type
[ID] ---> function name
[params ... ]
[param ... ] ---> (optional) parameter info
[int] / [void] ---> either int or void type
[ID] ---> variable name
[compound-stmt ... ] ---> (child options below)
[;] ---> either null statement
[selection-stmt ... ] ---> or IF statement
[EXPRESSION ---> recursive definition of any valid expression (binary expression, variable reference, function call, etc)
[compound-stmt ...] --> "then" (true) branch
. . .
[compound-stmt ... ] --> (optional) "else" (false) branch
. . .
[iteration-stmt ... ]
[EXPRESSION ---> recursive definition of any valid expression (binary expression, variable reference, function call, etc)
[compound-stmt ... ] --> loop block of statements
. . .
[return-stmt ... ]
[EXPRESSION ---> recursive definition of any valid expression (binary expression, variable reference, function call, etc)
[OP ... ] ---> operators of binary expressions OP = {+, -, *, /, <, <=, >, >=, ==, !=, =} (child options below)
[var ... ] ---> variable reference
[ID]
[NUM] ---> (optional) if array index
[NUM] ---> constant reference
[call ... ] ----> function call
[ID]
[args ... ] ----> function arguments
[var ... ] . . .
[NUM] .. .
[call ... ] . . .
[OP ... ] . . .
[OP ... ] ---> recursively another binary expression
. . .
How to run it (using two arguments: input and output)
The program must read input from a file (source) and write output to another file (target). In case of syntactic error, the contents of the output file must be empty (0 bytes).
$ ./parser main.c main.ast
Example of input file (main.c)
int g;
int foo(int x, int y, int z[])
{
z[0] = 0;
y = x * y + 2;
if(y == 0)
{
y = 1;
}
return y;
}
void main(void)
{
int a[10];
while(g < 10)
{
g = foo(g, 2, a);
;
}
}
Example of output file (main.ast)
Important: You do not have to worry about whitespaces in the output file, they will be ignored during the auto-grading process.
[program
[var-declaration [int] [g]]
[fun-declaration
[int]
[foo]
[params
[param [int] [x]]
[param [int] [y]]
[param [int] [z] [\[\]]]]
[compound-stmt
[= [var [z] [0]] [0]]
[= [var [y]]
[+
[* [var [x]] [var [y]]] [2]]]
[selection-stmt
[== [var [y]] [0]]
[compound-stmt
[= [var [y]] [1]]
]
]
[return-stmt [var [y]]]
]
]
[fun-declaration
[void]
[main]
[params]
[compound-stmt
[var-declaration [int] [a] [10]]
[iteration-stmt
[< [var [g]] [10]]
[compound-stmt
[= [var [g]]
[call
[foo]
[args [var [g]] [2] [var [a]]]
]]
[;]
]
]
]
]
]
Example of AST illustration (produced from https://yohasebe.com/rsyntaxtree/)
Reference binary (Linux ELF 64-bit LSB executable, x86-64) here
If you're using Windows, you can install and use the Linux Ubuntu terminal! See this tutorial
Test cases here
Make sure to include two bash files in your submission folder: compile.sh and run.sh to compile and run your code.
Example of compile.sh
gcc -o myparser <YOUR SOURCE CODE>
Example of run.sh (takes two arguments)
./myparser $1 $2