ParserGen

A Parser Generator for the BlitzMax language

Download

!!! ParserGen has been deprecated in favour of Coocoo,
!!! though most of the information about grammar syntax is still valid for Coocoo.
!!! v1.1 and v1.2 might be broken for anything other than parsing Coocoo's grammar :(

If you have any improvements or bug fixes for this software, please share them with me by email.
Downloads:
    ParserGen v1.2: Binary & Source !!!
    ParserGen v1.1: Binary & Source !!!
    ParserGen v1.0: Binary | Source + bootstrap binary

Description

ParserGen takes a Coco/R-like grammar as input and generates the source for a parser in the BlitzMax language.

If you have never used a parser generator like Lex/Yacc or Coco/R, I suggest reading up on one, especially Coco/R, since ParserGen's grammar is very similar.

NOTE: This is not a full-blown generator. It does not generate a lexer
or have any funky optimizations.

So instead of defining your own character tokens, you have to use the predefined ones.

White space (SPACE, TAB, CR, LF) is also skipped automatically and cannot be matched.

The scanner/lexer template type lives in ParserTemplate.bmx, which also contains all the support functions. You can replace it for other purposes, or override its methods to add functionality.

Grammar syntax

// single line comment
/*
multi line comment
*/
SINGLE_COMMENT = single line comment string
MULTI_COMMENT = multi comment start string TO multi comment end string
STRING_QUOTE = string quote character [ second string quote character ]
CASE_SENSITIVE = TRUE or FALSE
GLOBAL
    // blitzmax code, inserted into global scope in the resulting file
END_GLOBAL
CODE
    // blitzmax code, inserted into type scope of the generated parser
END_CODE
PRODUCTIONS
    // grammar productions

These are optional:
    SINGLE_COMMENT
    MULTI_COMMENT
    STRING_QUOTE default="
    CASE_SENSITIVE default=TRUE
    GLOBAL / END_GLOBAL
    CODE / END_CODE

If they are not supplied, the predefined features that rely on them will not work, except for string (STRING_QUOTE has a default).
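As a rough illustration of how the comment settings interact with white-space skipping, here is a minimal Python sketch, assuming white space is always skipped and comments are skipped only when their markers were supplied. `skip_white`, `single` and `multi` are hypothetical names standing in for the template's SkipWhite(), CommentSingle and CommentMulti; this is not ParserGen's actual implementation.

```python
# Hypothetical sketch (not ParserGen's code): skip white space and, when
# their markers were supplied, comments. None means "not defined".
WHITE = " \t\r\n"  # SPACE, TAB, CR, LF


def skip_white(source, pos, single=None, multi=None):
    """Return the first position at or after `pos` that is neither white
    space nor inside a comment. `single` is e.g. "//" and `multi` is a
    (start, end) pair such as ("/*", "*/")."""
    while pos < len(source):
        if source[pos] in WHITE:
            pos += 1
        elif single and source.startswith(single, pos):
            # a single line comment runs to the end of the line
            end = source.find("\n", pos)
            pos = len(source) if end == -1 else end + 1
        elif multi and source.startswith(multi[0], pos):
            # a multi line comment runs to its end marker
            end = source.find(multi[1], pos + len(multi[0]))
            pos = len(source) if end == -1 else end + len(multi[1])
        else:
            break
    return pos


src = "  // note\n /* block */ 42"
print(src[skip_white(src, 0, "//", ("/*", "*/")):])  # 42
print(skip_white(src, 0))  # 2: without comment markers, "//" is not skipped
```

Without the markers, the sketch stops at the first "/", which is the "will not work" case described above.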

Production syntax

ProductionName [ <parameters> ] [ (. init code .) ] = { Production } .

everything between ( and ) is matched as a group
everything between [ and ] is optional
everything between { and } is repeated 0 or more times
everything between (. and .) is inserted before or after a production,
and can contain BlitzMax code
everything between < and > is passed as a parameter to a production,
and can contain BlitzMax code

There is also the | operator, which separates alternatives, like OR in most languages.

Examples:
( A B C ) matches A, then B, then C
( A | B | C ) matches A or B or C
[ A ] B matches A if present, then B
A { B } C matches A, then B zero or more times, then C
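These constructs map naturally onto the control flow of a recursive-descent parser with one character of look-ahead; the Example 2 parser output on this page follows the same pattern (If/ElseIf for |, While loops for { }). The Python sketch below assumes single-character terminals "a", "b" and "c" for A, B and C; the class, method names and error handling are hypothetical.

```python
# Hypothetical sketch: each grammar construct becomes control flow in a
# recursive-descent parser with one character of look-ahead.
# Terminals A, B and C are the single characters "a", "b" and "c".


class Parser:
    def __init__(self, source):
        self.source = source
        self.pos = 0

    def peek(self):
        # one character of look-ahead; "" at end of input
        return self.source[self.pos] if self.pos < len(self.source) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def sequence(self):  # ( A B C ): one expect after another
        self.expect("a")
        self.expect("b")
        self.expect("c")

    def alternative(self):  # ( A | B | C ): if / elif on the look-ahead
        c = self.peek()
        if c == "a":
            self.expect("a")
        elif c == "b":
            self.expect("b")
        elif c == "c":
            self.expect("c")
        else:
            raise SyntaxError("expected 'a' or 'b' or 'c'")

    def optional(self):  # [ A ] B: a single if
        if self.peek() == "a":
            self.expect("a")
        self.expect("b")

    def loop(self):  # A { B } C: a while on the look-ahead
        self.expect("a")
        while self.peek() == "b":
            self.expect("b")
        self.expect("c")


def accepts(rule, text):
    """True if `rule` matches the whole of `text`."""
    p = Parser(text)
    try:
        getattr(p, rule)()
    except SyntaxError:
        return False
    return p.pos == len(text)
```

For example, `accepts("loop", "abbbc")` and `accepts("optional", "b")` are both true, while `accepts("loop", "abx")` is false.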

There are also predefined productions:

ident = parses an identifier
number = parses an integer number
float = parses a floating point number
string = parses a quoted string
SKIP<END> = skips characters until END is found

The predefined productions leave their result in "LexString".
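A rough Python sketch of what the predefined productions do, assuming simple character classes: each one consumes input and leaves the matched text in `lex_string`, which stands in for LexString. The exact rules ParserGen uses are not given here, so the identifier rule, stripping the quotes from a string, and stopping just before END in SKIP<END> are all assumptions of this sketch, and the `Scanner` class is hypothetical.

```python
# Hypothetical sketch of the predefined productions (not ParserGen's code).
# Each method consumes input and stores the matched text in lex_string,
# standing in for ParserGen's LexString.


class Scanner:
    def __init__(self, source, quotes=('"',)):
        self.source = source
        self.pos = 0
        self.quotes = quotes  # assumed: the characters given by STRING_QUOTE
        self.lex_string = ""

    def _current(self):
        return self.source[self.pos:self.pos + 1]  # "" at end of input

    def _take_while(self, pred):
        start = self.pos
        while self.pos < len(self.source) and pred(self.source[self.pos]):
            self.pos += 1
        self.lex_string = self.source[start:self.pos]

    def eat_ident(self):
        # ident: a letter or "_", then letters, digits or "_" (assumed rule)
        c = self._current()
        if not (c.isalpha() or c == "_"):
            raise SyntaxError(f"expected ident at position {self.pos}")
        self._take_while(lambda ch: ch.isalnum() or ch == "_")

    def eat_number(self):
        # number: one or more digits
        if not self._current().isdigit():
            raise SyntaxError(f"expected number at position {self.pos}")
        self._take_while(str.isdigit)

    def eat_string(self):
        # string: text between two matching quote characters
        quote = self._current()
        if not quote or quote not in self.quotes:
            raise SyntaxError(f"expected string at position {self.pos}")
        end = self.source.find(quote, self.pos + 1)
        if end == -1:
            raise SyntaxError("unterminated string")
        self.lex_string = self.source[self.pos + 1:end]  # quotes stripped (assumption)
        self.pos = end + 1

    def skip_to(self, end):
        # SKIP<END>: everything up to, not including, END (assumption)
        stop = self.source.find(end, self.pos)
        if stop == -1:
            stop = len(self.source)
        self.lex_string = self.source[self.pos:stop]
        self.pos = stop
```

After `s = Scanner('"hi" x'); s.eat_string()`, `s.lex_string` is `hi` and parsing continues at position 4.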

NOTE: since there is only 1 character of look-ahead, certain productions must be
used in a specific order. For example:

float before number (since both start with a digit 0..9).
The same applies to your own productions that start with similar input.
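To see why the order matters, here is a hypothetical Python sketch in which the first alternative that matches is taken. The float syntax (digits, a dot, digits) and the `primary` helper are assumptions; the point is only that trying number first on "3.5" consumes "3" and leaves ".5" for whatever comes next.

```python
import re

# Hypothetical sketch: the first alternative that matches is taken, so
# the order of the alternatives decides the result.
FLOAT = re.compile(r"\d+\.\d+")  # assumed float syntax: digits "." digits
NUMBER = re.compile(r"\d+")      # integer: one or more digits


def primary(text, pos, order):
    """Try each kind in `order` at `pos`; return (lexeme, new_pos)."""
    patterns = {"float": FLOAT, "number": NUMBER}
    for kind in order:
        m = patterns[kind].match(text, pos)
        if m:
            return m.group(), m.end()
    raise SyntaxError("expected float or number")


# float before number: the whole literal is consumed
print(primary("3.5", 0, ["float", "number"]))  # ('3.5', 3)
# number before float: only "3" is consumed, ".5" is left over
print(primary("3.5", 0, ["number", "float"]))  # ('3', 1)
```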

NOTE: the first production's name is also the name of the generated
parser, and it cannot have any input parameters.

Example grammar

/*
this grammar parses expressions
*/
PRODUCTIONS
Expression = AddExpr .
AddExpr = MulExpr { "+" MulExpr | "-" MulExpr } .
MulExpr = Primary { "*" Primary | "/" Primary } .
Primary = "(" Expression ")" | float | number .

Example grammar 2

/*
this is the same grammar as above, except now
we use the (. .) and < > to actually make it do something ;)
*/
SINGLE_COMMENT = "//"
MULTI_COMMENT = "/*" TO "*/"
STRING_QUOTE = '"' "'"
CASE_SENSITIVE = TRUE
CODE
    Field Result:Float ' this is where the final result ends up
END_CODE
PRODUCTIONS
Expression = AddExpr<Result> .
AddExpr<val:Float Var> (. Local b:Float .) =
    MulExpr<val>
    {
        "+" MulExpr<b> (. val = val + b .)
        | "-" MulExpr<b> (. val = val - b .)
    }
    .
MulExpr<val:Float Var> (. Local b:Float .) =
    Primary<val>
    {
        "*" Primary<b> (. val = val * b .)
        | "/" Primary<b> (. val = val / b .)
    }
    .
Primary<val:Float Var> =
    "(" AddExpr<val> ")"
    | ( float | number ) (. val = LexString.ToFloat() .)
    .
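For comparison, here is a hypothetical hand-written Python evaluator for the same grammar. Each production becomes a method, and the `val:Float Var` parameter (passed by reference in BlitzMax) becomes a return value instead. The class name and the number syntax are assumptions; this illustrates the technique and is not code produced by ParserGen.

```python
import re

# Hypothetical Python equivalent of Example grammar 2 (not ParserGen output).
NUMBER = re.compile(r"\d+(?:\.\d+)?")  # assumed syntax for float | number


class ExpressionParser:
    def __init__(self, source):
        self.src = source
        self.pos = 0

    def skip_white(self):
        # SPACE, TAB, CR and LF are skipped, as in ParserGen
        while self.pos < len(self.src) and self.src[self.pos] in " \t\r\n":
            self.pos += 1

    def peek(self):
        # one character of look-ahead after skipping white space; "" at end
        self.skip_white()
        return self.src[self.pos] if self.pos < len(self.src) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse(self):
        # Expression = AddExpr<Result> .
        result = self.add_expr()
        if self.peek():
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return result

    def add_expr(self):
        # AddExpr = MulExpr { "+" MulExpr | "-" MulExpr } .
        val = self.mul_expr()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.expect(op)
            b = self.mul_expr()
            val = val + b if op == "+" else val - b
        return val

    def mul_expr(self):
        # MulExpr = Primary { "*" Primary | "/" Primary } .
        val = self.primary()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.expect(op)
            b = self.primary()
            val = val * b if op == "*" else val / b
        return val

    def primary(self):
        # Primary = "(" AddExpr ")" | ( float | number ) .
        if self.peek() == "(":
            self.expect("(")
            val = self.add_expr()
            self.expect(")")
            return val
        m = NUMBER.match(self.src, self.pos)
        if not m:
            raise SyntaxError(f"expected '(' or float or number at position {self.pos}")
        self.pos = m.end()
        return float(m.group())  # like LexString.ToFloat()
```

Because each `while` loop applies its operators left to right, `ExpressionParser("8 - 2 - 1").parse()` evaluates as `(8 - 2) - 1`, matching the `{ ... }` loops in AddExpr and MulExpr.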

JavaScript grammar

/*
this is a grammar for JavaScript, done from memory
so it may not be correct.
*/
SINGLE_COMMENT = "//"
MULTI_COMMENT = "/*" TO "*/"
STRING_QUOTE = "~q" '"'
CASE_SENSITIVE = TRUE
PRODUCTIONS
JavaScript =
    { FunctionDef | VariableDef | FunctionCall | AssignStm }
    .
ParamList =
    ident { "," ident }
    .
ExprList =
    Expression { "," Expression }
    .
VarList =
    ident [ "=" Expression ]
    {
        "," ident [ "=" Expression ]
    }
    .
VariableDef =
    "var" VarList
    .
FunctionDef =
    "function" ident "(" [ ParamList ] ")" Block
    .
Block =
    "{" { Statement } "}"
    .
Statement =
    ( VariableDef | ReturnStm | IfStm | WhileStm | ForStm
      | FunctionCall | AssignStm ) ";"
    .
FunctionCall =
    ident "(" [ ExprList ] ")"
    .
AssignStm =
    ident [ "[" Expression "]" ] "=" Expression
    .
ReturnStm =
    "return" [ Expression ]
    .
IfStm =
    "if" "(" Expression ")" ( Block | Statement )
    [ "else" ( Block | Statement ) ]
    .
WhileStm =
    "while" "(" Expression ")" ( Block | Statement )
    .
ForStm =
    "for" "(" [ VariableDef | Expression ]
    ";" [ Expression ]
    ";" [ Expression ]
    ")" ( Block | Statement )
    .
Expression =
    BoolExpr
    .
BoolExpr =
    EqualExpr { "&&" EqualExpr | "||" EqualExpr }
    .
EqualExpr =
    AddExpr
    {
        "==" AddExpr
        | "!=" AddExpr
        | "<=" AddExpr
        | ">=" AddExpr
        | "<" AddExpr
        | ">" AddExpr
    }
    .
AddExpr =
    MulExpr { "+" MulExpr | "-" MulExpr }
    .
MulExpr =
    Primary { "*" Primary | "/" Primary }
    .
Primary =
    "(" Expression ")"
    | "[" ExprList "]"
    | "false"
    | "true"
    | ObjectValue
    | FunctionValue
    | number
    | float
    | string
    | ident
        [
            "(" [ ExprList ] ")"
            | "[" Expression "]"
        ]
    | "-" Expression
    | "!" Expression
    .
FunctionValue =
    "function" "(" [ ParamList ] ")" Block
    .
ObjectField =
    ident ":" Expression
    .
ObjectValue =
    "{" ObjectField { "," ObjectField } "}"
    .


Example 2 parser output

Rem
*** Expression Parser ***
This file is auto-generated by ParserGen, any changes made
	will be overwritten!
EndRem

SuperStrict

Import "ParserTemplate.bmx"

Type TExpressionParser Extends TParser
	'*** CODE BEGIN ***
	Field Result:Float ' this is where the final result ends up
	'*** CODE END ***
	Method New()
		CommentSingle = "//"
		CommentMulti[0] = "/*"
		CommentMulti[1] = "*/"
		StringQuote[0] = ""
		StringQuote[1] = ""
		CaseSensitive = True
	EndMethod

	Method Parse:Int( s:String)
		Initialize()
		Source = s + "~n"
		_Expression()
		Finalize()
		Return (Not Error)
	EndMethod

	Method _Expression()
		SkipWhite()
		If Error Then Return
		_AddExpr(Result)
		SkipWhite()
	EndMethod

	Method _AddExpr(val:Float Var)
		SkipWhite()
		If Error Then Return
		Local b:Float
		_MulExpr(val)
		While (Pos < Source.Length) And ((Source[Pos] = Asc("+")) Or ..
				(Source[Pos] = Asc("-")))
			If (Source[Pos] = Asc("+")) Then
				ExpectChars("+")
				_MulExpr(b)
				val = val + b
			ElseIf (Source[Pos] = Asc("-")) Then
				ExpectChars("-")
				_MulExpr(b)
				val = val - b
			Else
				ReportError( "expected ~q+~q Or ~q-~q")
			EndIf
		Wend
		SkipWhite()
	EndMethod

	Method _MulExpr(val:Float Var)
		SkipWhite()
		If Error Then Return
		Local b:Float
		_Primary(val)
		While (Pos < Source.Length) And ((Source[Pos] = Asc("*")) Or ..
				(Source[Pos] = Asc("/")))
			If (Source[Pos] = Asc("*")) Then
				ExpectChars("*")
				_Primary(b)
				val = val * b
			ElseIf (Source[Pos] = Asc("/")) Then
				ExpectChars("/")
				_Primary(b)
				val = val / b
			Else
				ReportError( "expected ~q*~q Or ~q/~q")
			EndIf
		Wend
		SkipWhite()
	EndMethod

	Method _Primary(val:Float Var)
		SkipWhite()
		If Error Then Return
		If (Source[Pos] = Asc("(")) Then
			ExpectChars("(")
			_AddExpr(val)
			ExpectChars(")")
		ElseIf (LookFloat() Or ((Source[Pos] >= Asc("0")) And ..
				(Source[Pos] <= Asc("9")))) Then
			If LookFloat() Then
				EatFloat()
			ElseIf ((Source[Pos] >= Asc("0")) And ..
					(Source[Pos] <= Asc("9"))) Then
				EatNumber()
			Else
				ReportError( "expected float Or number")
			EndIf
			val = LexString.ToFloat()
		Else
			ReportError( "expected ~q(~q Or float Or number")
		EndIf
		SkipWhite()
	EndMethod
EndType
