The Principal Variation ( pv for short ) is a series of best moves for both players in a search of depth > 1.
E.g. for the starting position, it maybe evaluated at depth = 4 that 1. e4 e5 2. Nf3 Nc6 is the best line for both players.
This entire line is the principal variation.
It is always in the order of occurrence. If the first move is made, then the second move can be made in the new position.
Note that the order is sequential with respect to depth - i.e. e4 is the best move at the root. Then e5 is the best move at next depth and so on.
Collecting the Principal Variation requires a little deeper understanding of DFS algorithm.
The concept of collecting the PV is simple:
If best move is found at father then append the son's principal variation to the best move of father. This is the principal variation at father.
Thus, pv at depth d = best_move at d + pv at d -1.
pv at depth 0 is never updated because at depth = 0, eval is returned.
pv at depth = 1 is best move at depth = 1.
pv at depth = 2 is best move at depth = 2 + pv at depth = 1.
pv at depth = 3 is best move at depth = 3 + pv at depth = 2 and so on...
This is because - information about the bestness of a move propagates upwards.
The father is good only because his son's are good for him (even though father and son both have exactly opposite intentions).
Check the above diagram as an example.
The grandfather is the root position from where search is called with depth = 2.
Therefore at grandfather, depth = 2
He has 2 sons - father1 and father2.
Whenever, the position at father1 or father2 is called, depth = 1.
In a DFS father1 and father2 will not be compared directly till depth = 0 is reached.
Hence, first father1 is picked. To evaluate him completely, his sons are picked.
When the positions at sons are reached, then depth = 0.
There is no further move generation and eval is returned.
son1 is the best move at depth = 1.
The value of father1 = value of son1 (2) at depth = 1.
father1 returns this value to grandfather.
Now grandfather finds out that at depth = 2, father1 is the best as father2 is not yet evaluated (He compares father1 with -infinity)
He updates his score to that of father1 (2).
Also, he records that best move at depth = 2 is father1.
Hence pv at depth = 2 is best move at depth = 2 + pv at depth = 1
pv at depth = 2 is father1,son1.
Then father2 is evaluated.
son2 is the best move for father2.
He updates his score = -4.
He updates that the best move at depth = 1 is son2.
He returns this information to grandfather at depth = 2.
Grandfather finds out that he is not better than father1 and rejects him (does nothing or he will go for father3).
If son2 would have score > 2 (say 5) and then father2 would have score = 5.
Grandfather would find that 5 > 2.
He would update his score = 5.
He would update that best move at depth = 2 is father2.
pv at depth = 2 is best move at depth = 2 + pv at depth = 1.
But now, pv at depth = 1 would have been son2.
Hence, pv at depth = 2 would be father2,son2
Also notice that finally, the score of grandfather will be equal to score of son1 or son2 whosoever is best.
Implementation in TSCP:
In TSCP, there are 2 data sets used to collect pv (refer data.c and search.c).
move pv[MAX_PLY][MAX_PLY];
int pv_length[MAX_PLY];
Also pv is collected not using depth variable but using ply variable.
ply will inversely vary with depth i.e. depth will decrease by 1 and ply will increase by 1.
At root position (grandfather), depth = 2 but ply = 0.
At the sons, depth = 0 and ply = 2.
The best move at a particular ply will always be stored as the diagonal element of the pv[ ][ ] array.
pv[ply][ply] = gen_dat[i].m;
E.g. here, first, whenever min or max would be called, pv_length[ply] = ply will be updated.
Only then it will be checked, whether depth == 0 and eval() will be returned.
Initially when father1 is evaluated fully,
pv_length[2] = 2,
pv_length[1] = 1
and pv[1][1] = son1
ply = 1
The code (search.c) is
for (j = ply + 1; j < pv_length[ply + 1]; ++j)
pv[ply][j] = pv[ply + 1][j];
pv_length[ply] = pv_length[ply + 1];
The for loop will do nothing and pv_length[1] = pv_length[2] = 2.
Then at grandfather,
ply = 0
pv_length[0] = 0
pv_length[1] = 2 (updated at father1)
pv[0][0] = father1
The for loop will update
pv[0][1] = pv[1][1].
And finally pv_length[0] will be equal to pv_length[1] which will be 2.
Same thing will be done for deeper searches.
Basically the upper triangular matrix will be populated.
The diagonal elements will be the best moves at each ply and their pv's will be appended to the lesser plies upwards.
Notice that the author has a separate variable called pv_length even though we already know that the pv length will always be equal to the depth called in search.
But later when we understand "search extensions", we can realise that it may not always be the case.
We can modify TSCP's pv collection method in our own 3 modified functions like this:
int max(int depth)
{
pv_length[ply] = ply;
if(depth == 0)
return eval();
int score = -100000;
int i , value;
gen();
for (i = first_move[ply]; i < first_move[ply + 1]; ++i) {
if (!makemove(gen_dat[i].m.b))
continue;
value = min (depth - 1);
takeback();
if(value > score){
score = value;
pv[ply][ply] = gen_dat[i].m;
for ( int j = ply + 1; j < pv_length[ply + 1]; ++j)
pv[ply][j] = pv[ply + 1][j];
pv_length[ply] = pv_length[ply + 1];
}
}
return score;
}
int min(int depth)
{
pv_length[ply] = ply;
if(depth == 0)
return -eval();
int score = 100000;
int i , value;
gen();
for (i = first_move[ply]; i < first_move[ply + 1]; ++i) {
if (!makemove(gen_dat[i].m.b))
continue;
value = max (depth - 1);
takeback();
if(value < score){
score = value;
pv[ply][ply] = gen_dat[i].m;
for ( int j = ply + 1; j < pv_length[ply + 1]; ++j)
pv[ply][j] = pv[ply + 1][j];
pv_length[ply] = pv_length[ply + 1];
}
}
return score;
}
And the main search function would be:
void search(){
int score;
int depth = 3;
if (side == LIGHT)
score = max(depth);
else
score = min(depth);
printf("\nScore is : %d\n",score);
for (int j = 0; j < pv_length[0]; ++j)
printf(" %s", move_str(pv[0][j].b));
}