Lexical Elements

At the lexical level, a program is a sequence of characters that is decomposed into a sequence of tokens. Each token is a finite sequence of characters, and tokens are delimited by white spaces and separators. Comments can be added to program code to provide additional information for human readers as well as for language processors.

White Spaces

White spaces include the space ⊔, carriage return \r, line feed \n, tab \t, and form feed \f. In EBNF, white spaces are defined as follows.

WhiteSpaces ::= { "⊔" | "\t" | "\n" | "\r" | "\f" }+

White spaces play no functional role in a program other than to separate tokens. Compilers skip and ignore them, but they are normally used in program text editors to present code in a more readable form.
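As an illustration of how a scanner might skip white spaces between tokens, the following is a minimal sketch in Python. It is not part of the CAOPLE implementation; the names WHITE_SPACE and skip_white_space are hypothetical and introduced only for this example.

```python
# The five white-space characters listed above: space, tab, line feed,
# carriage return, and form feed.
WHITE_SPACE = " \t\n\r\f"


def skip_white_space(text: str, pos: int = 0) -> int:
    """Return the index of the first non-white-space character at or after pos.

    Returns len(text) if only white space remains.
    """
    while pos < len(text) and text[pos] in WHITE_SPACE:
        pos += 1
    return pos
```

A scanner built this way would call skip_white_space before reading each token, so that any run of one or more white-space characters acts as a single token boundary.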

Tokens

There are five types of tokens in the CAOPLE language: identifiers, atomic literals, keywords, operators and separators.

Identifiers

An identifier is an arbitrarily long, non-empty sequence of letters, digits and underscores that must not begin with a digit. Upper-case and lower-case letters are distinct in identifiers. Keywords cannot be used as identifiers. In EBNF:

Identifier ::= Letter { Letter | Digit } ∼ Keyword

Letter ::= "a" - "z" | "A" - "Z" | "_"

Digit ::= "0" - "9"

For example, the following are valid identifiers.

Name, Location_1, City_Address, StoneAge, controlVariable, x1

Identifiers are used to name and refer to castes, actions, data types, fields in record data types, variables, constants and parameters.
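The identifier rule above can be sketched as a regular expression. The Python fragment below is only an illustration under stated assumptions: the names IDENTIFIER, KEYWORDS and is_identifier are hypothetical, and KEYWORDS is a placeholder that contains only the forms of the keyword "body"; the actual keyword set is the one listed in Table 2.2.

```python
import re

# Letter ::= "a"-"z" | "A"-"Z" | "_"   and   Digit ::= "0"-"9"
# An identifier is a Letter followed by any number of Letters or Digits.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Placeholder only (assumption): the real keyword set is given in Table 2.2.
KEYWORDS = {"body", "Body", "BODY"}


def is_identifier(text: str) -> bool:
    """Check the lexical shape of an identifier and exclude keywords."""
    return IDENTIFIER.fullmatch(text) is not None and text not in KEYWORDS
```

With this sketch, is_identifier("Location_1") and is_identifier("x1") hold, while is_identifier("1x") does not because the text begins with a digit.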

Atomic Literals

There are five types of atomic literals.

• Integer literals are numerical values of integer numbers.

• Real literals are numerical values of real numbers.

• String literals are values of string type. In particular, "" is the empty string. Within a string, the single quote and double quote symbols are represented using the escape symbol \. The escape symbol itself is represented by \\.

• Null indicates that an element of a structured data type, a variable, or a parameter has no value.

• True and False are literals of the Boolean data type.

The definition in EBNF is given below.

IntLiteral ::= ( "1" - "9" ) { "0" - "9" }

RealLiteral ::= { "0" - "9" } "." { "0" - "9" }

StringLiteral ::= """ { "\n" | "\t" | "\b" | "\r" | "\f" | "\\" | "\'" | "\""
                      | ("0"-"7")["0"-"7"] | ("0"-"3")("0"-"7")("0"-"7") } """

BoolLiteral ::= ⟨TRUE⟩ | ⟨FALSE⟩

Null ::= ⟨NULL⟩
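To make the numeric rules concrete, the following Python sketch transcribes IntLiteral and RealLiteral literally into regular expressions. The names INT_LITERAL, REAL_LITERAL and classify_number are hypothetical. Because the transcription is literal, it inherits the consequences of the rules as written: for example, a text that starts with "0" and has no decimal point is not accepted as an integer literal.

```python
import re
from typing import Optional

# IntLiteral ::= ( "1" - "9" ) { "0" - "9" }
INT_LITERAL = re.compile(r"[1-9][0-9]*")

# RealLiteral ::= { "0" - "9" } "." { "0" - "9" }
REAL_LITERAL = re.compile(r"[0-9]*\.[0-9]*")


def classify_number(text: str) -> Optional[str]:
    """Return "int" or "real" if text matches the rule as written, else None."""
    if INT_LITERAL.fullmatch(text):
        return "int"
    if REAL_LITERAL.fullmatch(text):
        return "real"
    return None
```

For instance, classify_number("42") yields "int" and classify_number("3.14") yields "real", whereas classify_number("12a") yields None.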

Operators

CAOPLE uses the operators listed in Table 2.1.

Table 2.1. List of Operators

Separators

Separators used in CAOPLE are given below:

( ) { } < > , ; . # \ |

Keywords

A keyword consists of a sequence of letters.

The CAOPLE language is case sensitive: the upper-case and lower-case forms of a letter are regarded as different. To relax this case sensitivity, and to accommodate different coding styles and naming conventions, CAOPLE allows a keyword to have a number of different appearances in a program.

For example, the following are regarded as the same keyword in CAOPLE:

Body, body, BODY

However, other capitalizations such as "boDy" and "bODy" are not keywords, and are therefore different from the keyword "body".
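The example above suggests three appearances per keyword: all lower case, first letter capitalized, and all upper case. The Python sketch below works under that assumption, which is inferred from the "body" example rather than stated as a rule; the names keyword_variants and matches_keyword are hypothetical.

```python
def keyword_variants(keyword: str) -> set:
    """Return the appearances assumed to be accepted for a keyword.

    Assumption: exactly three forms are allowed, as in Body, body, BODY.
    """
    lower = keyword.lower()
    return {lower, lower.capitalize(), lower.upper()}


def matches_keyword(text: str, keyword: str) -> bool:
    """Check whether text is one of the assumed appearances of keyword."""
    return text in keyword_variants(keyword)
```

Under this assumption, matches_keyword("BODY", "body") holds, while matches_keyword("boDy", "body") does not, so "boDy" remains available as an ordinary identifier.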

The keywords used in CAOPLE are listed in Table 2.2.

Table 2.2. List of Keywords

Comments

There are two ways to introduce comments into program code.

The characters "/*" introduce a comment, and the comment is terminated by the first following occurrence of the characters "*/". Comments do not nest.
The characters "//" also introduce a comment and that comment is terminated by the end of the line.

The syntax definition in EBNF is given below.

SingleLineComment ::= "//" ( ⟨String⟩ ∼ { "\n" | "\r" } ) ( "\n" | "\r" | "\r\n" )

FormalComment ::= "/**" ( ⟨String⟩ ∼ "*/" ) "*/"

MultiLineComment ::= "/*" ( ⟨String⟩ ∼ "*/" ) "*/"

Comments are ignored by the compiler, but they are useful to improve program readability. They are also often used by other language processors, for example for generating documents.
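A rough sketch of how a language processor might discard comments is given below in Python. It follows the EBNF above, in which a comment opened by "/*" or "/**" ends at the first "*/". The names COMMENT and strip_comments are hypothetical, and the sketch deliberately ignores one complication: comment markers that occur inside string literals would be treated as comments too.

```python
import re

# "//" up to (but not including) the end of the line, or
# "/*" ... "*/" ending at the first "*/" (this also covers "/**" ... "*/").
COMMENT = re.compile(r"//[^\n\r]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source: str) -> str:
    """Replace every comment with a single space.

    A space is left behind so that the tokens on either side stay separated.
    The line terminator after a "//" comment is kept; it is white space anyway.
    """
    return COMMENT.sub(" ", source)
```

For example, strip_comments("x // note\ny") leaves the line break in place, so "x" and "y" remain on separate lines.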