In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers.
Token
A token is a categorized block of text. The block of text corresponding to the token is known as a lexeme. A lexical analyzer divides a sequence of characters into tokens (tokenization) and categorizes them according to function, giving them meaning. A token can be any category of text the language treats as a meaningful unit: a keyword, an identifier, an operator, a literal, a punctuation mark, and so on.
A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer separates out individual parentheses, but does nothing to ensure they are balanced, or to group the tokens within them into expressions.
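As a sketch of this division of labor, here is a minimal regular-expression lexer in Python; the category names and patterns are assumptions chosen for illustration, not part of any standard:

import re

# One named group per token category; categories and patterns here are
# illustrative assumptions, not a fixed standard.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"), # names such as variables
    ("OP",     r"[+\-*/=]"),     # a few operators
    ("LPAREN", r"\("),           # each parenthesis is its own token...
    ("RPAREN", r"\)"),           # ...balancing them is the parser's job
    ("SKIP",   r"\s+"),          # whitespace: matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (category, lexeme) pairs; anything unmatched is skipped."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

# An unbalanced '(' lexes without complaint; only a parser would object.
print(list(tokenize("(3 + 2")))
# [('LPAREN', '('), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '2')]

Note that the unbalanced input lexes cleanly: the lexer emits each parenthesis as its own token and leaves balancing to the parser.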
For example, in Visual Basic:
Dim sum As Integer
sum = 3 + 2
Dim, As, and Integer are all KEYWORDS
sum is an IDENTIFIER (here, a variable name)
= and + are OPERATORS (assignment and addition)
3 and 2 are NUMBERS (integer literals)
As the tokens are generated, whitespace is usually discarded, so sum = 3 + 2 tokenizes exactly the same as sum=3+2. The token stream is then ready to be parsed by the syntax analyzer.
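To make this concrete, here is a hedged Python sketch of a lexer for the two lines above; the keyword list and category names are illustrative assumptions:

import re

# Categories used for the VB snippet; the keyword list and names are
# illustrative assumptions.
KEYWORDS = {"dim", "as", "integer"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([+=]))")

def tokenize(line):
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            break  # sketch: stop at anything unrecognized
        pos = m.end()
        number, word, op = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            kind = "KEYWORD" if word.lower() in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        else:
            tokens.append(("OPERATOR", op))
    return tokens

print(tokenize("Dim sum As Integer"))
# [('KEYWORD', 'Dim'), ('IDENTIFIER', 'sum'), ('KEYWORD', 'As'), ('KEYWORD', 'Integer')]
print(tokenize("sum = 3 + 2"))  # whitespace is consumed by \s*
print(tokenize("sum=3+2"))      # ...so this yields the same token stream

The last two calls produce identical token streams, which is the sense in which whitespace does not matter to the later phases.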
Syntax Analysis
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
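To contrast parsing with lexing, the following is a minimal sketch (not a production design) of a recursive-descent parser for the tiny grammar expr ::= NUMBER ('+' NUMBER)*, consuming (category, lexeme) pairs like those produced by the lexer sketches above:

# A minimal recursive-descent parser for the grammar
#   expr ::= NUMBER ('+' NUMBER)*
# It consumes (category, lexeme) pairs and builds a nested structure,
# which is exactly the work the lexer never does.

def parse_expr(tokens):
    pos = 0

    def expect(category):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != category:
            raise SyntaxError(f"expected {category} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    node = expect("NUMBER")
    while pos < len(tokens) and tokens[pos] == ("OPERATOR", "+"):
        pos += 1
        right = expect("NUMBER")
        node = ("+", node, right)  # group the operands into a tree
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return node

tokens = [("NUMBER", "3"), ("OPERATOR", "+"), ("NUMBER", "2")]
print(parse_expr(tokens))  # ('+', ('NUMBER', '3'), ('NUMBER', '2'))

Grouping tokens into a nested structure, and rejecting token sequences the grammar does not allow, is precisely the work the lexical analyzer leaves to this phase.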