
Scanner

The Finite State Machine diagram below illustrates the states the scanner goes through while interpreting the ABC text.

[Figure: finite state machine diagram showing the states Limbo, Header, Tune, Lyrics, Symbols, Chord, Grace, History and Text]

 

   Dashed lines represent an implicit transition back to the previous state. For example, if the scanner goes from state TUNE to state CHORD, at the end of the chord it goes back to state TUNE.
   The actual parser currently also has a state TEXT_PS that is entered when the %%beginps field is found. It is not reported here as it is completely equivalent to the TEXT state.
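
The implicit "back to the previous state" transitions can be pictured as a small stack of saved states. The sketch below is only an illustration of that idea: the names `State`, `StateTracker`, `enter`, `leave` and `goto` are hypothetical and are not taken from the actual scanner, and the fallback to LIMBO when nothing is saved is an assumption.

```python
from enum import Enum, auto


class State(Enum):
    LIMBO = auto()
    HEADER = auto()
    TUNE = auto()
    LYRICS = auto()
    SYMBOLS = auto()
    CHORD = auto()
    GRACE = auto()
    HISTORY = auto()
    TEXT = auto()


class StateTracker:
    """Hypothetical helper that remembers which state to return to."""

    def __init__(self):
        self.state = State.LIMBO  # the scanner starts in LIMBO
        self._saved = []          # states waiting for an implicit return

    def enter(self, new_state):
        # Transition that will later be undone (dashed line in the diagram).
        self._saved.append(self.state)
        self.state = new_state

    def leave(self):
        # Implicit transition back to the previous state.
        # Assumption: with nothing saved, fall back to LIMBO.
        self.state = self._saved.pop() if self._saved else State.LIMBO

    def goto(self, new_state):
        # Explicit transition: any saved state is no longer relevant.
        self._saved.clear()
        self.state = new_state
```

For example, after `goto(State.TUNE)`, `enter(State.GRACE)` and `enter(State.CHORD)`, the first `leave()` brings the tracker back to GRACE and the second one back to TUNE.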

Limbo

The scanner starts in state LIMBO as it has not encountered any tune yet.
It goes back to state LIMBO as soon as an empty line is found and stays there until a new tune is found.
In other words, LIMBO represents any text before the first tune, between two tunes or after the last tune.


Token  Comment  Next State
T_TEXT  A line of text (including newline).
T_FIELD  A line starting with "x:" (where x is any character).  HEADER
T_INFIELD  An inline field "[x:....]" (where x is a letter different from 'r').
T_EXTFIELD  A line starting with "%%".  HEADER
T_EMPTYLINE  A line that contains a possibly empty sequence of spaces.
T_PRAGMA  A line starting with '#'.
T_BEGINTEXT  The field "%%begintext".  TEXT
T_BEGINHISTORY  The field "H:".  HISTORY
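
As a rough illustration of the LIMBO table, the following sketch classifies a single line and reports the next state. It is not the scanner's real code: the function name `classify_limbo_line`, the rule order (more specific patterns first), the regular expressions, and the reading of an empty "Next State" cell as "stay in LIMBO" are all assumptions made for the example.

```python
import re

# (token, pattern matched at the start of the line, next state)
# Assumption: more specific rules are tried before more general ones.
LIMBO_RULES = [
    ("T_EMPTYLINE", re.compile(r"[ \t]*$"), "LIMBO"),
    ("T_PRAGMA", re.compile(r"#"), "LIMBO"),
    ("T_BEGINTEXT", re.compile(r"%%begintext"), "TEXT"),
    ("T_EXTFIELD", re.compile(r"%%"), "HEADER"),
    ("T_BEGINHISTORY", re.compile(r"H:"), "HISTORY"),
    # Inline field: any letter except lowercase 'r' inside brackets.
    ("T_INFIELD", re.compile(r"\[[A-Za-qs-z]:[^\]]*\]"), "LIMBO"),
    ("T_FIELD", re.compile(r".:"), "HEADER"),
]


def classify_limbo_line(line):
    """Return (token, next_state) for one line read in state LIMBO."""
    line = line.rstrip("\r\n")
    for token, pattern, next_state in LIMBO_RULES:
        if pattern.match(line):
            return token, next_state
    # Anything else is free text and keeps the scanner in LIMBO.
    return "T_TEXT", "LIMBO"
```

With these rules, `classify_limbo_line("X:1")` yields `("T_FIELD", "HEADER")`, while a line of free text yields `("T_TEXT", "LIMBO")`.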



Header

The scanner is in the HEADER state once it has encountered at least one field (T_FIELD or T_EXTFIELD) while in state LIMBO.


Token  Comment  Next State
T_TEXT  A line of comment (including newline) or a "[r:...]" field.
T_FIELD  A line starting with "x:" (where x is any character).
T_INFIELD  An inline field "[x:....]" (where x is a letter different from 'r').
T_EXTFIELD  A line starting with "%%".
T_EMPTYLINE  A line that contains a possibly empty sequence of spaces.  LIMBO
T_PRAGMA  A line starting with '#'.
a TUNE token  Any of the tokens that would occur in the TUNE state.  TUNE
T_BEGINTEXT  The field "%%begintext".  TEXT
T_BEGINHISTORY  The field "H:".  HISTORY


Tune

This is where the music actually is.

Token  Comment  Next State
T_TEXT  A line of comment (including newline) or a "[r:...]" field.
T_FIELD  A line starting with "x:" (where x is any character).
T_INFIELD  An inline field "[x:....]" (where x is a letter different from 'r').
T_EXTFIELD  A line starting with "%%".
T_EMPTYLINE  A line that contains a possibly empty sequence of spaces.  LIMBO
T_PRAGMA  A line starting with '#'.
T_ENDLINE  The end of the text line, which also indicates the end of the musical line.
T_CONTINUE  The text line ended but it doesn't break the musical line. It is reported when the line ends with a backslash ('\').
T_WHITESPACE  A sequence of whitespace characters (spaces or tabs).
T_IGNORE  A sequence of backticks (ASCII 96), i.e. spaces that should be ignored, as mentioned in section 4.7 of the 2.0 draft.
T_NOTE  A note, including accidentals and duration. Microtonal accidentals in the form "^2/3C" are supported for compatibility with abcm2ps.
T_REST  A rest ("x", "z" or "Z"), including duration.
T_SPACER  A spacer ("y"), including an optional dimension (for compatibility with abcm2ps).
T_DECORATION  A decoration in the form "+...+" or "!...!". Note that rolls ("~") and staccato (".") are not reported as decorations but as T_USERSYMBOL, following the indications of section 4.14 of the 2.0 draft.
T_LYRICS_LINE  Signifies the beginning of a "w:" line.  LYRICS
T_SYMBOLS_LINE  Signifies the beginning of a "s:" line.  SYMBOLS
T_CHORD  Signifies the beginning of a chord "[...]".  CHORD
T_GRACE  Signifies the beginning of a set of grace notes "{...}" or "{/...}".  GRACE
T_BREAKLINE  A forced line break ("!").
T_BAR  One of the bars defined in section 4.8 of the 2.0 draft standard. Repetitions (i.e. a sequence of ':') are allowed before and after the combination of '[', '|' and ']'. Dotted bars are also allowed. As indicated in the draft, many sequences are accepted as a bar (for example, "[[|]]" is considered a single bar), but not all (for example, "[][|" is recognized as two bars).
T_ENDING  An indication of a variable ending ("[1" for example). For compatibility with abcm2ps these formats are also accepted: '[1-2,4' and '[ "infine"'.
T_BROKENRIGHT  A sequence of broken rhythm symbols (">").
T_BROKENLEFT  A sequence of broken rhythm symbols ("<").
T_SLURSTART  The beginning of a slur. It may be dotted (".("). Forced directions as defined by abcm2ps are supported ("('" or "(,").
T_SLUREND  The end point of a slur (")").
T_TIE  The beginning of a tie. It may be dotted (".-"). Forced directions as defined by abcm2ps are supported ("-'" or "-,").
T_TUPLET  A generic tuplet in the form "(p:q:r", as described in section 4.13.
T_USERSYMBOL  One of the symbols that can be redefined ".~H-wh-w" (see sections 4.16 and 4.14).
T_GCHORD  A "chord symbol" as described in section 4.18.
T_ANNOTATION  A string to be added above, below or on the staff. Any string that is not a chord is reported as an annotation, whether the position symbols ("^_<>@") are there or not.
T_OVLRESET  A sequence of '&' to signify the reset of voice overlay (see section 7.4). As an extension, more than one '&' is allowed.
T_OVLSTART  This is to support a proposal to explicitly mark the beginning of the voice overlay with "(&". It is also implemented in abcm2ps.
T_OVLEND  This is to support a proposal to explicitly mark the end of the voice overlay with "&)".
T_UNKNOWN  An unrecognized character. Note that this can be used to implement the recommendations in section 8.1.
T_BEGINTEXT  The field "%%begintext".  TEXT
T_BEGINHISTORY  The field "H:".  HISTORY
T_MEASUREREPEAT  The indication to repeat the previous measure ("/", "//", etc.).
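
To make the T_NOTE description more concrete, here is a simplified sketch of how a note with accidentals, octave marks and duration might be recognized, including the microtonal form "^2/3C". The regular expression and the name `match_note` are assumptions made for illustration; the real scanner may accept more forms (or fewer) than this pattern does.

```python
import re

# Simplified, assumed pattern for a T_NOTE token.
NOTE_RE = re.compile(
    r"""
    (?P<accidental>                # optional accidental
        \^\^ | __ | =              #   double sharp, double flat, natural
      | [\^_] (?P<micro>\d+/\d+)?  #   sharp or flat, optionally microtonal
    )?
    (?P<pitch>[A-Ga-g])            # pitch letter
    (?P<octave>[,']*)              # octave marks
    (?P<duration>\d*/*\d*)         # duration such as 2, /2, 3/2 or //
    """,
    re.VERBOSE,
)


def match_note(text, pos=0):
    """Return (fields, end_position) if a note starts at pos, else None."""
    m = NOTE_RE.match(text, pos)
    if m is None:
        return None
    return m.groupdict(), m.end()
```

Under these assumptions, `match_note("^2/3C d")` recognizes the first five characters as one note with the microtonal fraction "2/3", and `match_note("z2")` returns `None` because a rest is not a note.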
 


Lyrics

A sequence of syllables contained in a line preceded by "w:".

Token  Comment  Next State
T_TEXT  A line of comment (including newline) or a "[r:...]" field.
T_FIELD  A line starting with "x:" (where x is any character).
T_INFIELD  An inline field "[x:....]" (where x is a letter different from 'r').
T_EXTFIELD  A line starting with "%%".
T_EMPTYLINE  A line that contains a possibly empty sequence of spaces.  LIMBO
T_PRAGMA  A line starting with '#'.
T_ENDLINE  The end of the text line, which also indicates the end of the musical line.  TUNE
T_CONTINUE  The text line ended but it doesn't break the musical line. It is reported when the line ends with a backslash ('\').
T_WHITESPACE  A sequence of whitespace characters (spaces or tabs).
T_IGNORE  A sequence of backticks (ASCII 96), or the beginning of another "w:" field.
T_BREAKLINE  A forced line break ("!").
T_BAR  A single bar ("|").
T_SYLLABLE  A syllable, including any hold indication ("_"). The blank syllable ("*") is also reported here.
T_VERSE  A number specifying a verse ("1." or "1:").
T_OVLRESET  A sequence of '&' to signify the reset of voice overlay (see section 7.4). As an extension, more than one '&' is allowed.
T_OVLSTART  This is to support a proposal to explicitly mark the beginning of the voice overlay with "(&". It is also implemented in abcm2ps.
T_OVLEND  This is to support a proposal to explicitly mark the end of the voice overlay with "&)".
T_UNKNOWN  An unrecognized character. Note that this can be used to implement the recommendations in section 8.1.
T_BEGINTEXT  The field "%%begintext".  TEXT
T_BEGINHISTORY  The field "H:".  HISTORY

Symbols

A sequence of symbols contained in a line preceded by "s:".

Token  Comment  Next State
T_TEXT  A line of comment (including newline) or a "[r:...]" field.
T_FIELD  A line starting with "x:" (where x is any character).
T_INFIELD  An inline field "[x:....]" (where x is a letter different from 'r').
T_EXTFIELD  A line starting with "%%".
T_EMPTYLINE  A line that contains a possibly empty sequence of spaces.  LIMBO
T_PRAGMA  A line starting with '#'.
T_ENDLINE  The end of the text line, which also indicates the end of the musical line.  TUNE
T_CONTINUE  The text line ended but it doesn't break the musical line. It is reported when the line ends with a backslash ('\').
T_WHITESPACE  A sequence of whitespace characters (spaces or tabs).
T_IGNORE  A sequence of backticks (ASCII 96), or the beginning of another "w:" field.
T_BREAKLINE  A forced line break ("!").
T_BAR  A single bar ("|").
T_GCHORD  A "chord symbol" as described in section 4.18.
T_ANNOTATION  A string to be added above, below or on the staff. Any string that is not a chord is reported as an annotation, whether the position symbols ("^_<>@") are there or not.
T_DECORATION  A decoration in the form "+...+" or "!...!". Note that rolls ("~") and staccato (".") are not reported as decorations but as T_USERSYMBOL, following the indications of section 4.14 of the 2.0 draft.
T_USERSYMBOL  One of the symbols that can be redefined ".~H-wh-w" (see sections 4.16 and 4.14).
T_OVLRESET  A reset of voice overlay (see section 7.4). As an extension, more than one '&' is allowed.
T_OVLSTART  This is to support a proposal to explicitly mark the beginning of the voice overlay with "(&". It is also implemented in abcm2ps.
T_OVLEND  This is to support a proposal to explicitly mark the end of the voice overlay with "&)".
T_UNKNOWN  An unrecognized character. Note that this can be used to implement the recommendations in section 8.1.
T_BEGINTEXT  The field "%%begintext".  TEXT
T_BEGINHISTORY  The field "H:".  HISTORY

Chord


Token  Comment  Next State
T_ENDLINE  The end of the text line shouldn't occur within a chord!
T_CONTINUE  The text line ended but it doesn't break the musical line.
T_WHITESPACE  A sequence of whitespace characters (spaces or tabs).
T_IGNORE  A sequence of backticks (ASCII 96), i.e. spaces that should be ignored, as mentioned in section 4.7 of the 2.0 draft.
T_NOTE  A note, including accidentals and duration. Microtonal accidentals in the form "^2/3C" are supported for compatibility with abcm2ps.
T_DECORATION  A decoration in the form "+...+" or "!...!". This is for compatibility with abcm2ps.
T_CHORDEND  Signifies the end of a chord "[...]". The scanner returns to its previous state, which can be TUNE or GRACE.  TUNE or GRACE
T_USERSYMBOL  One of the symbols that can be redefined ".~H-wh-w" (see sections 4.16 and 4.14).
T_SLURSTART  The beginning of a slur. It may be dotted (".("). Forced directions as defined by abcm2ps are supported ("('" or "(,").
T_SLUREND  The end point of a slur (")").
T_UNKNOWN  An unrecognized character. Note that this can be used to implement the recommendations in section 8.1.



Grace



Token  Comment  Next State
T_ENDLINE  The end of the text line shouldn't occur within a chord!
T_CONTINUE  The text line ended but it doesn't break the musical line.
T_WHITESPACE  A sequence of whitespace characters (spaces or tabs).
T_IGNORE  A sequence of backticks (ASCII 96), i.e. spaces that should be ignored, as mentioned in section 4.7 of the 2.0 draft.
T_NOTE  A note, including accidentals and duration. Microtonal accidentals in the form "^2/3C" are supported for compatibility with abcm2ps.
T_DECORATION  A decoration in the form "+...+" or "!...!". This is for compatibility with abcm2ps.
T_GRACEEND  Signifies the end of grace notes.  TUNE
T_USERSYMBOL  One of the symbols that can be redefined ".~H-wh-w" (see sections 4.16 and 4.14).
T_SLURSTART  The beginning of a slur. It may be dotted (".("). Forced directions as defined by abcm2ps are supported ("('" or "(,").
T_SLUREND  The end point of a slur (")").
T_CHORD  Signifies the beginning of a chord "[...]".  CHORD
T_UNKNOWN  An unrecognized character. Note that this can be used to implement the recommendations in section 8.1.



History

This state handles the field "H:" as described in section 3.1.13. The "history" ends when a field is encountered. A field "H:" with no text can also be used to mark the end.

Token  Comment  Next State
T_ENDHISTORY  A line with only "H:". May contain spaces but no other text.  previous state
T_TEXT  A text line. Lines starting with "H:" but containing text are reported as T_TEXT tokens.
T_FIELD  A line starting with "x:" (where x is any character).  previous state
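
The three rows above can be read as a small decision procedure applied to each line while in state HISTORY. The sketch below is one possible reading, not the actual implementation: the function name `scan_history_line` and its boolean "leave the state" result are hypothetical, and treating tabs as spaces is an assumption.

```python
import re

# "H:" followed only by spaces (tabs are also accepted here, as an assumption).
END_HISTORY_RE = re.compile(r"H:[ \t]*$")
# Any other field: a single character followed by a colon.
FIELD_RE = re.compile(r".:")


def scan_history_line(line):
    """Return (token, leaves_history) for one line read in state HISTORY."""
    line = line.rstrip("\r\n")
    if END_HISTORY_RE.match(line):
        return "T_ENDHISTORY", True   # back to the previous state
    if line.startswith("H:"):
        return "T_TEXT", False        # "H:" with text is just more history
    if FIELD_RE.match(line):
        return "T_FIELD", True        # any other field ends the history
    return "T_TEXT", False
```

A caller would pop back to the previous state whenever the second element of the result is true.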


Text

This state handles the extended field "%%begintext" as described in section 11.4.5. The text ends when a "%%endtext" field is encountered. Note that lines of text do not necessarily start with "%%", as there is an explicit mark for the end of the text.


Token  Comment  Next State
T_ENDTEXT  A line starting with "%%endtext". May contain other text.  previous state
T_TEXT  A text line. May start with "%%" but it's not mandatory.
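
A minimal sketch of the TEXT state, assuming the lines that follow "%%begintext" are available as a list of strings. The function name `scan_text_block` is hypothetical, and the sketch deliberately leaves any leading "%%" in place since the description above does not say whether it is removed.

```python
def scan_text_block(lines):
    """Tokenize the lines that follow a "%%begintext" field.

    Scanning stops after the first line starting with "%%endtext";
    lines after it are left for the previous state to handle.
    """
    tokens = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if stripped.startswith("%%endtext"):
            tokens.append(("T_ENDTEXT", stripped))
            break
        # Every other line is text, with or without a leading "%%".
        tokens.append(("T_TEXT", stripped))
    return tokens
```

Because the end of the block is marked explicitly, a line such as "X:1" inside the block is reported as T_TEXT rather than as a field.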