In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers.
Token
A token is a categorized block of text. The block of text corresponding to the token is known as a lexeme. A lexical analyzer divides a sequence of characters into tokens (tokenization) and categorizes them according to function, giving them meaning. A token can be any category of text the language treats as a meaningful unit: a keyword, an identifier, an operator, a literal, a punctuation mark, and so on.
A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer separates out individual parentheses, but does nothing to ensure they are balanced, or to group the tokens within them into expressions.
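As a sketch of this division of labor, here is a minimal regular-expression lexer in Python; the category names and patterns are assumptions chosen for illustration, not part of any standard:

import re

# One named group per token category; categories and patterns here are
# illustrative assumptions, not a fixed standard.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),          # integer literals
    ("IDENT",  r"[A-Za-z_]\w*"), # names such as variables
    ("OP",     r"[+\-*/=]"),     # a few operators
    ("LPAREN", r"\("),           # each parenthesis is its own token...
    ("RPAREN", r"\)"),           # ...balancing them is the parser's job
    ("SKIP",   r"\s+"),          # whitespace: matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (category, lexeme) pairs; anything unmatched is skipped."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

# An unbalanced '(' lexes without complaint; only a parser would object.
print(list(tokenize("(3 + 2")))
# [('LPAREN', '('), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '2')]

Note that the unbalanced input lexes cleanly: the lexer emits each parenthesis as its own token and leaves balancing to the parser.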
For example, in Visual Basic:
Dim sum As Integer
sum = 3 + 2
Dim, As, and Integer are all KEYWORDS
sum is an IDENTIFIER (here, a variable name)
= and + are OPERATORS (assignment and addition)
3 and 2 are NUMBERS (integer literals)
As the tokens are generated, whitespace is usually discarded, so sum = 3 + 2 tokenizes exactly the same as sum=3+2. The token stream is then ready to be parsed by the syntax analyzer.
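To make this concrete, here is a hedged Python sketch of a lexer for the two lines above; the keyword list and category names are illustrative assumptions:

import re

# Categories used for the VB snippet; the keyword list and names are
# illustrative assumptions.
KEYWORDS = {"dim", "as", "integer"}
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([+=]))")

def tokenize(line):
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if not m:
            break  # sketch: stop at anything unrecognized
        pos = m.end()
        number, word, op = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            kind = "KEYWORD" if word.lower() in KEYWORDS else "IDENTIFIER"
            tokens.append((kind, word))
        else:
            tokens.append(("OPERATOR", op))
    return tokens

print(tokenize("Dim sum As Integer"))
# [('KEYWORD', 'Dim'), ('IDENTIFIER', 'sum'), ('KEYWORD', 'As'), ('KEYWORD', 'Integer')]
print(tokenize("sum = 3 + 2"))  # whitespace is consumed by \s*
print(tokenize("sum=3+2"))      # ...so this yields the same token stream

The last two calls produce identical token streams, which is the sense in which whitespace does not matter to the later phases.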
Syntax Analysis
In computer science and linguistics, parsing, or, more formally, syntactic analysis, is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
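To contrast parsing with lexing, the following is a minimal sketch (not a production design) of a recursive-descent parser for the tiny grammar expr ::= NUMBER ('+' NUMBER)*, consuming (category, lexeme) pairs like those produced by the lexer sketches above:

# A minimal recursive-descent parser for the grammar
#   expr ::= NUMBER ('+' NUMBER)*
# It consumes (category, lexeme) pairs and builds a nested structure,
# which is exactly the work the lexer never does.

def parse_expr(tokens):
    pos = 0

    def expect(category):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != category:
            raise SyntaxError(f"expected {category} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    node = expect("NUMBER")
    while pos < len(tokens) and tokens[pos] == ("OPERATOR", "+"):
        pos += 1
        right = expect("NUMBER")
        node = ("+", node, right)  # group the operands into a tree
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return node

tokens = [("NUMBER", "3"), ("OPERATOR", "+"), ("NUMBER", "2")]
print(parse_expr(tokens))  # ('+', ('NUMBER', '3'), ('NUMBER', '2'))

Grouping tokens into a nested structure, and rejecting token sequences the grammar does not allow, is precisely the work the lexical analyzer leaves to this phase.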