Coding Techniques

Before we start with the real Board Representation in TSCP, we have to know about some basic techniques used usually in C programming language for fast processing.

For a better understanding, I'll have to explain something off topic.

There are many ways to encode any type of data depending upon the nature of data.

Consider a chess piece as an example – its data type – namely what kind of information a chess engine needs to have about a piece?

Well, the engine needs to know everything – the colour of piece, whether the piece is a pawn or king etc, in how many directions it can move and what are those directions.

All this is the nature of data. This is a lot of information about a chess piece. What options do I have to encode it?

To answer this we need to think in this direction:

There are 2 types of information that can be contained by a variable –

1. Finite set

2. Infinite set

E.g. if we consider a variable like “Customer_name” for a business software, we would know that the customer name can be anything (infinite set). Therefore we cannot hard code any information about it. We have to process everything on-the-fly. Like e.g. if the business owner decides to give a reward to all his customers whose name starts with ‘a’ then we have to process each and every customer name to check if it is starting with an ‘a’ or not.

However that is not the case with chess pieces (finite set). (Variables that hold information from a finite set are called enums in programming) If I consider a chess piece as an occupancy state of a particular square on board then during a chess game any of the 64 squares on a chess board can exist in only 13 states (12 pieces + 1 empty). That means either the square is occupied by any of the 12 types of pieces or it’s empty. It can’t be anything other than that.

So what's the difference?

If it’s a finite set, then we can encode the data as indexes to be used in an array of different types of its nature.

This method is called as the Lookup Method and is described further.

Lookup Method

Please refer the wikipedia link first to understand this concept (Wiki page: http://en.wikipedia.org/wiki/Lookup) The idea is that suppose you as a software developer already know that the user of your software will need a functionality where he would want multiplication of 2 numbers. Also, you already know that those 2 numbers will only be Integers between 0 & 5 (including 0 & 5). Now rather than always multiplying "Number A" and "Number B" you can have a PRE-CALCULATED table which already has the answer ready for all numbers that can be asked (this is called as the look up table) For this case, it will be a multiplication table in the form of a 6x6 array called "MULTIPLY[6][6]". This array will hold all the pre-calculated multiplication values of the 2 numbers according to its index

for(int i = 0; i < 6; i++)

{

for(int j = 0; j < 6; j++)

{

MULTIPLY[i][j] = i * j ;

}

}

Once we have this pre-calculated table ready, now if the user sends us Number A and Number B, then rather then multiplying the numbers

int answer = Number A * Number B ;

we just refer our "Lookup Table" and give him the answer as

int answer = MULTIPLY[Number A][Number B] ;

This might sound trivial for a simple multiplication operation. But suppose it involves time consuming operations like exponent, factorial, log etc then this might be actually worth considering!

Thus, In this method, all elements of the finite set are encoded as indexes to lookup arrays / tables.

We just hard-code all information in lookup arrays and all we need to do is "look up" in them to get any data using the index.

All lookup arrays are thus READ ONLY arrays.

They may be hardcoded initially or calculated only once in the "initialise" function.

Thus, for different types of chess pieces we will make arrays of different data types –all arrays having the same number of elements.

This method is actually used in TSCP for piece encoding.

Real example from code is given to explain.

In the source code (file defs.h), piece encoding and colour encoding is done based on this concept:

Colour Encoding:

#define LIGHT 0

#define DARK 1

Piece Encoding:

#define PAWN 0

#define KNIGHT 1

#define BISHOP 2

#define ROOK 3

#define QUEEN 4

#define KING 5

#define EMPTY 6

Refer to file data.c

There is a lookup array for the char values of the pieces.

/* the piece letters, for print_board() */

char piece_char[6] = {

'P', 'N', 'B', 'R', 'Q', 'K'

};

If the char value is to be accessed, then it is just looked up into that array with the index of piece.

E.g. The Rook is represented by the letter 'R'. Piece index of Rook is 3. So piece_char[3] = 'R'

Note: Common beginner practice is to encode chess piece as positive and negative numbers to distinguish between colour i.e. -1 to -6 for all black pieces, 1 to 6 for all white pieces and 0 for empty. If you want to use the lookup technique then you have to provide an extra padding equal to the most negative number - here it will be '6'.

This is because an array cannot have negative indexes.

There is another basic coding technique that is usually used in C programming language for fast processing. It is called bit-field method and it is described further.

Bit-field Method

The lookup method is good but memory intensive. It would not be good to hard-code EVERYTHING in memory. This is because if, and only if, the system takes less time to process the answer than to refer a particular memory location for the answer, then it would degrade the performance of our software (for a better understanding of this read something about CPU cache http://en.wikipedia.org/wiki/L1_cache ). Thus, we can use bit-field method in which the answer is processed on-the-fly and we can assume that the system will take negligible time to process data since the operations would only involve bit operations like ANDing, ORing, X-ORing etc which are the fastest for a processor

Refer the Wikipedia link about the method - Wikipedia page: http://en.wikipedia.org/wiki/Bit_field

The idea in this method is that we should already know that the basic int data type is a collection of 32 bits.

There are 32 bits from least significant bit (LSB - 1st bit) to the most significant bit (MSB - 32nd bit) from right to left.

LSB is the rightmost bit and MSB is the leftmost bit.

Thus e.g. the integer number 9 is actually 00000000000000000000000000001001 in binary.

But now we will look at this integer with a different perspective. We can basically see that it is nothing but a collection of 32 different binary digits each can have value '0' or '1'. Thus, it becomes like a collection of 32 boolean variables that can hold value 'true' or 'false'. Each bit is called as a flag which is either '1' (set) or '0' (reset).

It is better that we name such variables as flags so that the integer value is not confusing and we know that we are supposed to use bit operands on those.

int data types usually involve mathematical operands like '+','-','*' etc.

We can perform following bit operations on each status flag bit:

1. Status flag bit can be set by bit-wise ORing

2. Status flag bit can be reset by bitwise ANDing (also called masking)

3. Status flag bit can be checked by bit wise ANDing

4. Status flag bit can be toggled by bit wise XORing

The advantage of this method are:

1. More than 1 condition can be checked by ANDing in a single 'if' statement

2. More than 1 flag can be set with a single ORing statement

3. More than 1 flag can be reset with a single ANDing statement

The open source engine Fruit uses this method for almost all encoding.

TSCP uses this encoding for 2 data-sets:

1. Move Type

2. Castle flag

I'll give 1 example and explain Move Type variable.

Explanation for code Castle Flag will be done in further pages.

Move Type

TSCP uses a structure to represent a move.

Refer file defs.h. There is structure called move_bytes.

It has 4 char variables.

A char variable is 1 byte (8 bits) long and 4 char variables would be 4 bytes long = 32 bits which is the size of an int

The first 2 char variables 'from' and 'to' are the squares of a move - the from square and the to square.

The third char variable is 'promote'. It would be used for moves involving pawn promotions and it is the piece to which a pawn gets promoted.

The fourth char variable is 'bits'. This is a bitfield. It is used to store the type of move.

A chess move can be classified into many types depending upon the situation.

e.g. there can be moves that give check and there can be check evading moves.

In any program, it is always good to classify and sort information whenever it is possible to do so without investing more processing effort.

This is because later it can be used for other functions and they can check it by just a reference rather than processing out.

In TSCP, the author brilliantly stores the type of move during move generation process itself.

Thus making the move or unmaking the move takes less processing time and increases the speed of the engine.

E.g. while generating a castling move I'll update the move type as castle.

Now my make_move function will already know that the move is of type castle and hence 2 pieces will have to move - king + rook.

The move type data (called bits in TSCP) has following configuration.

It is mentioned in defs.h file

1. Capture - 1st bit set = 1

2. Castle - 2nd bit set = 2

3. En-Passant - 3rd bit set = 4

4. Pawn double move - 4th bit set = 8

5. Pawn move - 5th bit set = 16

6. Promotion - 6th bit set = 32

The advantage of this is seen in file board.c.

The functions makemove and takeback use this data for move making and unmaking.

Notice specifically in the makemove function how the author uses this to update the fifty move counter.

The fifty move counter is used to track information for the fifty move draw rule.

If there is no pawn move or a capture for 50 consecutive moves by both sides, the game is a draw.

Thus, there are 2 conditions to be checked - pawn move or capture.

The fifty move counter is reset to 0 (zero) if the move is either a pawn move or a capture.

So, we can have code which says:

if (move_type == PAWN_MOVE || move_type == CAPTURE)

fifty = 0;

By just one bit-wise ANDing, both flags are checked in TSCP:

if (m.bits & 17)

fifty = 0;

else

++fifty;