Encoding

The first concept we need to understand for writing a chess engine is ENCODING.

In fact, encoding is so basic that we need to understand it before we can program anything at all.

Encoding is simply the process of representing a piece of information in some agreed format.

The word also names the format itself: an encoding describes how the information is laid out.

This process of encoding is actually part of many of our day-to-day activities.

Every language we use for communication (speaking and listening) is nothing but an encoding.

E.g. in English we say "hello" to greet someone, while in Spanish we say "hola". This is nothing but encoding: we use a particular word to represent the feeling of happiness and welcome with which we greet the person.

Reading and writing rely on a symbolic encoding as well.

E.g. the symbol “1” represents the numerical value one, and the sound of the vowel “a” is represented by the symbol “a”.

Now let's come to the real point: how this concept is used in chess engine programming.

First, we should know that inside a computer, all information is ultimately encoded in binary numerical form.

E.g. the ASCII code assigns a number to every letter, digit and punctuation mark; the letter 'A', for instance, is stored as 65.

This is because, at the hardware level, a processor represents everything with electronic switches.

A switch can be in only 2 states, 'ON' or 'OFF'. One state stands for the binary digit '1' and the other for the binary digit '0'.
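To see this in practice, here is a small Python sketch (the helper name `char_to_bits` is made up for illustration). It prints the numeric code of a few characters and the same number written as 8 binary digits, i.e. as 8 ON/OFF switches. Note that the character “1” is stored as the number 49, not as the value one: the symbol and the value it stands for are two different encodings.

```python
def char_to_bits(ch: str) -> tuple[int, str]:
    """Return the numeric code of a character and that code as 8 binary digits."""
    code = ord(ch)                 # for plain English characters this is the ASCII code
    return code, format(code, "08b")


for ch in ("A", "a", "1"):
    code, bits = char_to_bits(ch)
    print(ch, code, bits)
# A 65 01000001
# a 97 01100001
# 1 49 00110001
```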

Higher-level languages keep this numerical encoding under the hood, so we rarely see it, but it is prominent in lower-level languages.

Thus, to write a chess-playing program, we have to encode all the data related to a game of chess in a format the computer can understand.

Programming languages basically translate what we want the computer to do into something the computer understands.

Thus we have to encode chess data using the data types and structures that our programming language provides.

What is this data related to a chess game?

1. Chess pieces: There are 6 different types of chess pieces:

a. Pawn

b. Knight

c. Bishop

d. Rook

e. Queen

f. King

Each piece type is to be encoded in some form for our chess engine.
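One simple way to encode the six piece types is to give each a small integer. The Python sketch below uses `IntEnum`; the name `PieceType` and the values 1 to 6 are an arbitrary choice for illustration, not a fixed standard. Starting at 1 leaves 0 free to mean "no piece", which is handy later for empty squares on the board.

```python
from enum import IntEnum


class PieceType(IntEnum):
    """Hypothetical encoding of the six piece types as small integers.

    0 is deliberately left unused so it can later mean "no piece".
    """
    PAWN = 1
    KNIGHT = 2
    BISHOP = 3
    ROOK = 4
    QUEEN = 5
    KING = 6


# Conventional one-letter symbols, as used in FEN and algebraic notation.
PIECE_LETTERS = {
    PieceType.PAWN: "P",
    PieceType.KNIGHT: "N",
    PieceType.BISHOP: "B",
    PieceType.ROOK: "R",
    PieceType.QUEEN: "Q",
    PieceType.KING: "K",
}

print(int(PieceType.KNIGHT), PIECE_LETTERS[PieceType.KNIGHT])  # 2 N
```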

2. Chess Colour: There are 2 colours in a game of chess:

a. White

b. Black

Each colour needs to be encoded.

Every piece has one of these two colours, and colour is also used to indicate the side_to_move.

Thus colour shows ownership.

E.g. all white pieces BELONG to the player who plays the white side. When it is his turn, he may move white pieces only.
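The two colours can be encoded as 0 and 1, and the same encoding then serves for side_to_move. The Python sketch below is only an illustration; the names `Colour`, `opponent` and `may_move` are hypothetical. Encoding the colours as 0 and 1 lets us switch the side to move with a single XOR, and checking ownership becomes a simple comparison.

```python
from enum import IntEnum


class Colour(IntEnum):
    """Hypothetical encoding of the two colours as 0 and 1."""
    WHITE = 0
    BLACK = 1


def opponent(colour: Colour) -> Colour:
    """Flip between the two colours (0 <-> 1) using XOR."""
    return Colour(colour ^ 1)


def may_move(piece_colour: Colour, side_to_move: Colour) -> bool:
    """A player may move only the pieces that belong to him."""
    return piece_colour == side_to_move


side_to_move = Colour.WHITE
print(may_move(Colour.WHITE, side_to_move))  # True
print(may_move(Colour.BLACK, side_to_move))  # False
side_to_move = opponent(side_to_move)        # White has moved, so it is Black's turn
print(side_to_move.name)                     # BLACK
```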

3. Chess Board: The chess board holds the information about piece placement on the 64-square board, i.e. which piece is placed on which square.

We can imagine it to be an 8x8 array of squares.

That is how we perceive the board: as a collection of 64 squares of alternating colours, each either empty or occupied by a particular type of piece.

Thus every square needs some form of representation (i.e. an encoding), and every square has a "status" as a property.

This "status" records the actual state of a particular square on the chessboard: empty, or occupied, and if so by which type of piece.

This is called a square-centric representation of the board (the approach followed in TSCP, Tom Kerrigan's Simple Chess Program).

4. Chess Position Parameters: A chess position has several other aspects besides "piece placement + side_to_move".

These include:

a. castling permissions - whether White / Black is still allowed to castle kingside / queenside

b. fifty-move rule tracker - a player can claim a draw when 50 consecutive moves by each side have been made without any pawn move or capture, so we count the half-moves since the last pawn move or capture

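c. full-move counter - the number of the current move, which starts at 1 and increases after each Black move (detecting a draw by repetition additionally requires a history of earlier positions, not just this counter)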

d. en passant target square - the square a pawn has just passed over with a two-square advance, if any

These other parameters of a chess position are also to be encoded.

A variable for each can be used.

Also a variable can be used to track whose turn it is to move.
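These variables can be grouped into one record. The Python sketch below is a minimal illustration, assuming a hypothetical `PositionState` dataclass, castling rights packed as four bit flags (a common technique, but the flag names and values here are assumptions), and the a1 = 0 ... h8 = 63 square numbering for the en passant square. It only updates the counters and side to move; updating the board, castling rights and en passant square is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Castling permissions as bit flags, so all four fit in one small integer.
# (Hypothetical names and values, chosen for this sketch.)
WHITE_KINGSIDE, WHITE_QUEENSIDE = 1, 2
BLACK_KINGSIDE, BLACK_QUEENSIDE = 4, 8
ALL_CASTLING = WHITE_KINGSIDE | WHITE_QUEENSIDE | BLACK_KINGSIDE | BLACK_QUEENSIDE


@dataclass
class PositionState:
    """Position parameters other than piece placement (hypothetical layout)."""
    side_to_move: str = "w"            # "w" or "b"
    castling: int = ALL_CASTLING       # which castling rights remain
    en_passant: Optional[int] = None   # en passant target square 0..63, or None
    halfmove_clock: int = 0            # half-moves since last pawn move or capture
    fullmove_number: int = 1           # starts at 1, incremented after Black moves

    def can_castle(self, flag: int) -> bool:
        return bool(self.castling & flag)

    def after_move(self, pawn_move_or_capture: bool) -> None:
        """Update the counters and side to move after a move.

        Updating the board, castling rights and en passant square is not shown.
        """
        self.halfmove_clock = 0 if pawn_move_or_capture else self.halfmove_clock + 1
        if self.side_to_move == "b":
            self.fullmove_number += 1
        self.side_to_move = "b" if self.side_to_move == "w" else "w"

    def fifty_move_draw_claimable(self) -> bool:
        # Simplified check: 50 moves by each side = 100 half-moves
        # without a pawn move or capture.
        return self.halfmove_clock >= 100


state = PositionState()
state.after_move(pawn_move_or_capture=True)   # 1. e4 (pawn move)
state.after_move(pawn_move_or_capture=False)  # 1... Nf6 (knight move)
print(state.side_to_move, state.halfmove_clock, state.fullmove_number)  # w 1 2
```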

While playing over the board, we simply keep these parameters in mind; they are not physically visible anywhere.

This entire encoding scheme of a chess position (which mainly comprises the piece placement and these other variables) is called the Board Representation.

To fully understand which parameters a chess position data set requires, we should understand FEN (Forsyth-Edwards Notation), a format for representing chess positions. A FEN record is nothing but a chess position written as a text string.

Refer to the Wikipedia article on FEN.
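A FEN record has six space-separated fields: piece placement, side to move, castling availability, en passant target square, halfmove clock and fullmove number. The placement field lists the ranks from 8 down to 1, separated by "/", with upper-case letters for White, lower-case for Black and a digit for a run of empty squares. A minimal Python reader, assuming a hypothetical `parse_fen` helper, the a1 = 0 ... h8 = 63 square numbering, and no validation of malformed input:

```python
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"


def parse_fen(fen: str) -> dict:
    """Split a FEN record into its six fields and expand the piece placement.

    The board is a 64-element list indexed a1 = 0 ... h8 = 63, holding FEN
    piece letters (upper case = White) or "." for an empty square.
    Minimal sketch: malformed input is not detected.
    """
    placement, side, castling, en_passant, halfmove, fullmove = fen.split()
    board = ["."] * 64
    # FEN lists ranks from 8 down to 1, separated by "/".
    for i, rank_text in enumerate(placement.split("/")):
        rank = 7 - i
        file = 0
        for ch in rank_text:
            if ch.isdigit():
                file += int(ch)            # a digit means that many empty squares
            else:
                board[rank * 8 + file] = ch
                file += 1
    return {
        "board": board,
        "side_to_move": side,              # "w" or "b"
        "castling": castling,              # subset of "KQkq", or "-"
        "en_passant": en_passant,          # e.g. "e3", or "-"
        "halfmove_clock": int(halfmove),
        "fullmove_number": int(fullmove),
    }


pos = parse_fen(START_FEN)
print(pos["board"][4], pos["board"][60], pos["side_to_move"])  # K k w
```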

Finally we need to have an encoding scheme to represent chess moves as well!

5. Chess Moves: A chess move should basically contain

a. 'from' square

b. 'to' square

However, these 2 parameters alone cannot represent every move uniquely. This is because of pawn promotions: a promotion move with given "from" and "to" squares can promote the pawn to any of 4 piece types (queen, rook, bishop or knight). Hence, for pawn promotions, the type of piece the pawn is promoted to is also required.
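Since a square number 0..63 fits in 6 bits, a move can be packed into a single small integer. The layout below (bits 0-5 = from square, bits 6-11 = to square, bits 12-14 = promotion piece, 0 meaning no promotion) is one hypothetical choice among many, as are the function names and the promotion values; the square numbering a1 = 0 ... h8 = 63 is also assumed. It shows why the promotion field matters: e7-e8 promoting to a queen and e7-e8 promoting to a knight get different codes.

```python
# Hypothetical move layout: bits 0-5 = from, bits 6-11 = to, bits 12-14 = promotion.
NO_PROMOTION, KNIGHT, BISHOP, ROOK, QUEEN = 0, 2, 3, 4, 5


def encode_move(from_sq: int, to_sq: int, promotion: int = NO_PROMOTION) -> int:
    """Pack a move into one integer (fits in 16 bits)."""
    return from_sq | (to_sq << 6) | (promotion << 12)


def decode_move(move: int) -> tuple[int, int, int]:
    """Unpack (from_sq, to_sq, promotion) from an encoded move."""
    return move & 0x3F, (move >> 6) & 0x3F, (move >> 12) & 0x7


e7, e8 = 52, 60                      # a1 = 0 ... h8 = 63 numbering
to_queen = encode_move(e7, e8, QUEEN)
to_knight = encode_move(e7, e8, KNIGHT)
print(to_queen != to_knight, decode_move(to_knight))  # True (52, 60, 2)
```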

To fully understand the chess move data set, refer to long algebraic notation for chess moves:

Algebraic notation from Wikipedia
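Long algebraic notation names both the starting and the destination square of a move. Engines that speak the UCI protocol use a compact coordinate form of it, e.g. "e2e4", with a lowercase promotion letter appended as in "e7e8q" and castling written as the king's move, e.g. "e1g1". A minimal Python sketch of printing moves in that form, assuming the a1 = 0 ... h8 = 63 square numbering (the function names are hypothetical):

```python
FILES = "abcdefgh"


def square_name(sq: int) -> str:
    """Square number 0..63 (a1 = 0, h8 = 63) to its algebraic name, e.g. 'e4'."""
    return FILES[sq % 8] + str(sq // 8 + 1)


def move_to_coordinate(from_sq: int, to_sq: int, promotion: str = "") -> str:
    """Coordinate form of a move, e.g. 'e2e4' or 'e7e8q' (promotion letter lowercased)."""
    return square_name(from_sq) + square_name(to_sq) + promotion.lower()


print(move_to_coordinate(12, 28))       # e2e4
print(move_to_coordinate(52, 60, "q"))  # e7e8q
```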

And all this is just the basics! Phew!

Only after encoding all this stuff can we go any further.

After reading this, we understand WHAT needs to be encoded. To answer the question of HOW it should be done, however, we first need to understand the algorithms a chess engine requires. The encoding is then chosen according to the basic principle of "Data Structures and Algorithms": the data structures should suit the algorithms that operate on them.