In this part of the project you will implement a scanner for the C- language. The scanner reads a source file and produces a listing of its tokens, one per line, each annotated with the token kind (identifier, integer number, etc.) and its location in the source. If invalid input is discovered, the scanner stops after producing an error token annotated with the lexeme that could not be recognized.
Before you start your scanner implementation, I recommend carefully reading Chapter 3 of the book "Introduction to Compilers and Language Design" by Douglas Thain. Although our language syntax differs slightly from the one used in the book, the code examples and material can be extremely helpful.
If you're using Flex, you can also refer to the following material:
In your lexer implementation, consider the following classes of tokens in the language:
ID Identifier
NUM Literal decimal (integer)
KEY Keyword
SYM Lexical Symbol
ERROR Lexeme of the first error found
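The token classes above could be represented in C roughly as follows. This is only a sketch: the names TokenKind, token_kind_name, KEYWORDS and classify_word are hypothetical, and the keyword table contains only the two keywords confirmed by the example in this handout ("int" and "void"); the remaining C- keywords must be added from the language specification.

```c
#include <stddef.h>
#include <string.h>

/* One enumerator per token class listed in the handout. */
typedef enum { TOK_ID, TOK_NUM, TOK_KEY, TOK_SYM, TOK_ERROR } TokenKind;

/* Returns the label printed in the listing for a token class. */
static const char *token_kind_name(TokenKind kind)
{
    switch (kind) {
    case TOK_ID:    return "ID";
    case TOK_NUM:   return "NUM";
    case TOK_KEY:   return "KEY";
    case TOK_SYM:   return "SYM";
    case TOK_ERROR: return "ERROR";
    }
    return "ERROR"; /* not reached for valid enumerators */
}

/* Assumption: incomplete table, extend it with the other C- keywords. */
static const char *const KEYWORDS[] = { "int", "void" };

/* Decides whether a word-shaped lexeme is a keyword or an identifier. */
static TokenKind classify_word(const char *lexeme)
{
    size_t count = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(lexeme, KEYWORDS[i]) == 0)
            return TOK_KEY;
    }
    return TOK_ID;
}
```

In a Flex-based solution the same effect is usually obtained by listing the keyword rules before the identifier rule, so a lookup table like this is optional.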
Comments and white space should be discarded during the lexical analysis phase of the compiler. For every token, including the ERROR token, print the line number where the token was found. The lexical analyzer must stop after finding the first lexical error. In the case of an unfinished comment error, the lexeme "/*" must be used for the ERROR token. The output format is:
(line_num,token_type,"lexeme")
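As an illustration of the output format and of the unfinished-comment rule, here is a minimal sketch for a hand-written scanner that works on an in-memory buffer. The function names format_token and skip_comment are hypothetical, and the sketch assumes comments are delimited by "/*" and "*/" and do not nest.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats one listing entry as (line_num,token_type,"lexeme") into buf. */
static int format_token(char *buf, size_t size, int line,
                        const char *type, const char *lexeme)
{
    return snprintf(buf, size, "(%d,%s,\"%s\")", line, type, lexeme);
}

/*
 * Skips the body of a comment. p must point just past the comment opener.
 * Newlines inside the comment are counted in *line.
 * Returns a pointer just past the comment closer, or NULL when the input
 * ends before the comment is closed.
 */
static const char *skip_comment(const char *p, int *line)
{
    for (; *p != '\0'; p++) {
        if (*p == '\n')
            (*line)++;
        else if (p[0] == '*' && p[1] == '/')
            return p + 2;
    }
    return NULL;
}
```

When skip_comment returns NULL, the caller would emit an ERROR token with the lexeme "/*" and stop. Since skip_comment advances the line counter, the caller should save the line on which the comment started before calling it if that is the line to be reported.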
Example of input file (main.c):
void main(void)
{
int a;
a = 4 + 5;
}
How to run it (two arguments: input and output)
The program should read the input from a file (source) and write the output to another file (target):
$ ./lexer main.c main.lex
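One possible way to handle the two command-line arguments is sketched below. The helper name open_files and the usage message are hypothetical; the sketch only assumes that the first argument is the source file and the second is the target file, as described above.

```c
#include <stdio.h>

/*
 * Opens the source file for reading and the target file for writing.
 * Returns 0 on success and a nonzero value on failure.
 */
static int open_files(int argc, char **argv, FILE **src, FILE **dst)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source> <target>\n",
                argc > 0 ? argv[0] : "lexer");
        return 1;
    }
    *src = fopen(argv[1], "r");
    if (*src == NULL) {
        perror(argv[1]);
        return 1;
    }
    *dst = fopen(argv[2], "w");
    if (*dst == NULL) {
        perror(argv[2]);
        fclose(*src);
        *src = NULL;
        return 1;
    }
    return 0;
}
```

In a Flex-based lexer, main would typically call this helper, assign the source stream to yyin before calling yylex(), and write each token to the target stream.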
Example of output file generated by the lexer (main.lex)
(1,KEY,"void")
(1,ID,"main")
(1,SYM,"(")
(1,KEY,"void")
(1,SYM,")")
(2,SYM,"{")
(3,KEY,"int")
(3,ID,"a")
(3,SYM,";")
(4,ID,"a")
(4,SYM,"=")
(4,NUM,"4")
(4,SYM,"+")
(4,NUM,"5")
(4,SYM,";")
(5,SYM,"}")
Make sure you also create two files in your submission folder: compile.sh and run.sh to compile and run your code, such as:
compile.sh (if you are using the Flex tool)
flex lexer.lex
gcc -o lexer lex.yy.c -ll
run.sh (takes two arguments as filenames)
./lexer "$1" "$2"
Compiler binary (Linux ELF 64-bit LSB executable, x86-64) for the lexical phase can be found here
If you're using Windows, you can install and use the Linux Ubuntu terminal! See this tutorial
Test cases here
IMPORTANT NOTE on Line Endings: Text files created on DOS/Windows machines have different line endings than files created on Unix/Linux. DOS uses a carriage return followed by a line feed ("\r\n") as a line ending, while Unix uses just a line feed ("\n"). Be careful when transferring files between Windows and Unix machines, and make sure the line endings are translated properly.
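If a file with DOS line endings does reach your program, one defensive option is to normalize the text before scanning it. The following is a sketch only; the function name normalize_newlines is hypothetical, and it assumes the whole input has been read into a NUL-terminated buffer.

```c
/*
 * Rewrites every "\r\n" pair in text as a single "\n", in place.
 * A '\r' that is not followed by '\n' is left unchanged.
 */
static void normalize_newlines(char *text)
{
    char *dst = text;
    for (const char *src = text; *src != '\0'; src++) {
        if (src[0] == '\r' && src[1] == '\n')
            continue; /* drop the '\r'; the '\n' is copied next */
        *dst++ = *src;
    }
    *dst = '\0';
}
```

Converting the files themselves before submission remains the safer approach, since the expected outputs are compared as text.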