The Ghost-Ratings

How exactly does it work?

This detailed explanation of the Ghost-rating system was published in Diplomacy World 105. The version below has been edited down to the areas relevant to webDiplomacy:

In his article on Internet Diplomacy in Diplomacy World 103, Jason Koelewyn offered the insight that “…most of us are geeks of one flavour or another, and geeks love numbers and rankings.” Even the briefest glance at the statistics shows that this holds true at phpdiplomacy.net, the website where I play my Diplomacy. It hosts 380 active games at the time of writing and boasts over 5,000 completed games. Like diplomaticorp.com, it has passed the 100-player ‘barrier’ for active members, and it has over 13,000 registered members.

In August 2007 Kestas Kuliukas, the developer of phpdiplomacy, introduced the points system, after the number of players had dropped because Civil Disorder had been ravaging the community for the previous six months. He can give a far better account than I can, but suffice it to say that a sudden boom in players at the beginning of 2007 had swamped a small community, changing it from one where you knew every player to one where you knew few. The site was in trouble because the social responsibility that had once been there was lost.

Since the points system was introduced, the growth of phpdiplomacy has been dramatic: in just one year the number of unique ‘hits’ increased more than sevenfold. Clearly, then, ranking players is of the utmost importance for successful Internet Diplomacy. The reason is simple: there was a number, a badge that said whether you were a good player or a bad one. If you went into CD or joined too many games, you never got more points, so if you actually wanted to play, you simply didn’t go into CD or join too many games.

There is one thing points don’t do, however: tell you with any accuracy how good the various players are, and it would be much better to have an accurate rating system for Diplomacy. I recognised this at once, and in fact left the site for a short while at about the time the points were introduced. To this end I have developed Ghost-rating, a system designed for rating Internet Diplomacy rather than for tournament scoring, in that it is meant to rate a large group of people.

There were two major aims for this system:

1. To promote desirable behaviour

2. To be an accurate rating system

Sadly, these two aims may well be antagonistic. What we really want, though, is for people to play to the best of their ability: not playing so few games that they never get a feel for how to play Diplomacy, nor so many that they cannot concentrate properly on each game, and never entering CD. The traits of a good player are the traits we wish to encourage, so if we rate players properly, in theory everything should fall into place.

The single inspiration for my system is the work of Prof. Arpad Elo, who developed the Elo rating system, since adopted by FIDE (the Fédération Internationale des Échecs, or World Chess Federation). His work underpins mine through the formula:

New Rating = Old Rating + V * (Result - Expected Result)

Here, Result is some method of scoring the game, defined so that the sum of all players’ results always equals one. (It must always sum to the same value, otherwise the system stops being zero-sum, which would be silly; equalling one is simply a nice convention.) Expected Result is defined as a function of the seven players’ starting ratings, and what that function is depends on how Result is defined. Clearly this too has to sum to 1 (you cannot expect anything else). V determines how quickly ratings change. It is desirable to choose V so that a player’s rating changes by about the same amount no matter whom they play.

This formula makes the rating system zero-sum, so a given rating means the same thing over time unless the average standard rises or falls. It is always hard to compare across time, but this system gives us our best shot at it. Each player starts on the average rating, which is 100. (This was chosen because, firstly, it seems natural to start on a power of ten, and secondly, 10 is too low to avoid using decimal places; 1,000 is plausible, but 10,000 is too high for ratings to be memorable.)
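The update rule New Rating = Old Rating + V * (Result - Expected Result) can be sketched in Python. This is a minimal illustration rather than the author's implementation: the function `update_ratings` and the value `v=14.0` are hypothetical, and a single V is shared by all seven players purely to show why the total rating is conserved when results and expected results both sum to 1.

```python
STARTING_RATING = 100.0  # the average (and starting) rating from the article


def update_ratings(ratings, results, expected, v):
    """Apply New Rating = Old Rating + V * (Result - Expected Result).

    Illustrative simplification: one V is shared by every player. Because
    results and expected results each sum to 1, the total is conserved.
    """
    if abs(sum(results) - 1.0) > 1e-9 or abs(sum(expected) - 1.0) > 1e-9:
        raise ValueError("results and expected results must each sum to 1")
    return [r + v * (s - e) for r, s, e in zip(ratings, results, expected)]


# Seven newcomers: everyone is expected to score 1/7, and player 0 wins.
ratings = [STARTING_RATING] * 7
expected = [1 / 7] * 7
results = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
new = update_ratings(ratings, results, expected, v=14.0)  # v is hypothetical
print([round(x, 2) for x in new])  # the winner gains 12, the others lose 2 each
print(round(sum(new), 6))          # the total rating is still 700
```

Whatever V is chosen, the winner's gain is exactly balanced by the others' losses as long as every player shares it.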

This formula is all well and good, but we obviously need to define V, Result and Expected Result. I have done this for two different rating systems. The first, and simplest, is Winner Takes All. Basically, winning gives you a score of 1 and anything else gives you 0, except for an n-way draw, which gives 1/n to each player in the draw and 0 to everyone else.
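A minimal sketch of the Winner Takes All result follows, using exact fractions so the scores visibly sum to 1. The function name `wta_results` and its arguments are hypothetical.

```python
from fractions import Fraction


def wta_results(num_players, winners):
    """Winner Takes All: an outright winner scores 1, each member of an
    n-way draw scores 1/n, and everyone else scores 0.

    `winners` holds the indices of the players sharing the win.
    """
    if not winners:
        raise ValueError("someone must win or share in a draw")
    share = Fraction(1, len(winners))
    return [share if i in winners else Fraction(0) for i in range(num_players)]


solo = wta_results(7, {2})        # outright win for player 2
draw = wta_results(7, {0, 3, 5})  # three-way draw
print([str(x) for x in solo])
print([str(x) for x in draw], sum(draw))
```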

Now, for this, we can require ratings to obey a particular rule relating them to expected result, or rather take that rule as an axiom. I used the idea that ratings could be a win ratio. So if player A has a rating of 120 and player B has a rating of 60, then in a game in which both of them play, player A is twice as likely to win as player B. Writing $r_i$ for the rating of player $i$, that gives us the following formula for the Expected Result of player $i$:

$$E_i = \frac{r_i}{r_1 + r_2 + \cdots + r_7}$$
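Reading ratings as win ratios means each player's Expected Result is his rating divided by the sum of all seven ratings. The sketch below (the function name `wta_expected` is hypothetical) checks the 120-versus-60 example from the text.

```python
def wta_expected(ratings):
    """Expected Winner-Takes-All result with ratings read as win ratios:
    each player's rating divided by the sum of all seven ratings."""
    total = sum(ratings)
    return [r / total for r in ratings]


# Player A (rated 120) and player B (rated 60) with five 100-rated players.
ratings = [120, 60, 100, 100, 100, 100, 100]
e = wta_expected(ratings)
print(round(e[0] / e[1], 6))  # A is twice as likely to win as B
print(round(sum(e), 6))       # the expected results sum to 1
```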

Now we need to work out how to get V. Clearly it has to be some function of the ratings of the players involved, so let V be a function of the seven players’ ratings.

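Consider the new rating of player 1, given that his real rating should be $r$, and assuming that all other players are accurately rated, with the sum of their ratings equal to $k$. His Result, on average, should then be

$$\frac{r}{r + k}$$

by virtue of the expected result formula, whereas his Expected Result, calculated from his current rating $r_1$, is $r_1/(r_1 + k)$. For the update to take him to his real rating, then, on average:

$$r = r_1 + V\left(\frac{r}{r + k} - \frac{r_1}{r_1 + k}\right)$$

The bracket simplifies to $k(r - r_1)/\bigl((r + k)(r_1 + k)\bigr)$, and so it works to have

$$V = \frac{(r + k)(r_1 + k)}{k}$$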

but only on average. If we were actually to do that, a single defeat would be taken to reflect precisely your average skill and your rating would plummet, while a single win would see it skyrocket, so we have to divide V by some constant to keep ratings from boomeranging around. If we set the average (and starting) rating at 100, a suitable choice of this constant gives a variance of 40, which seems about right from my models, although discretion can be used.
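The derivation and the need for damping can be checked numerically. Solving the averaged update r = r_1 + V(r/(r + k) - r_1/(r_1 + k)) for V gives V = (r + k)(r_1 + k)/k, and the sketch first confirms that this V moves a player exactly to a hypothetical true rating on average. It then makes an assumption not stated in the text: since r is unknown in practice, it is replaced by the current rating r_1, giving V = (r_1 + k)^2/k. Under that assumption a single win lifts a 100-rated player among six 100-rated opponents to 2r_1 + k = 800, while a single loss drops him to -r_1^2/k, below zero. The damping constant of 35 and all function names are hypothetical, chosen only for illustration.

```python
def expected_result(rating, others_total):
    # Win-ratio expectation against opponents whose ratings sum to k.
    return rating / (rating + others_total)


def ideal_v(true_rating, current_rating, others_total):
    # V = (r + k)(r_1 + k) / k, from solving
    # r = r_1 + V * (r/(r + k) - r_1/(r_1 + k)) for V.
    return (true_rating + others_total) * (current_rating + others_total) / others_total


def update_after(current_rating, others_total, result, damping=1.0):
    # Assumption: the unknown true rating is replaced by the current one,
    # so V = (r_1 + k)**2 / k, then divided by a hypothetical damping constant.
    v = (current_rating + others_total) ** 2 / others_total / damping
    return current_rating + v * (result - expected_result(current_rating, others_total))


# 1. On average, the ideal V lands a player exactly on his true rating.
r, r1, k = 130.0, 100.0, 600.0  # hypothetical true rating, current rating, opponents' total
on_average = r1 + ideal_v(r, r1, k) * (expected_result(r, k) - expected_result(r1, k))
print(round(on_average, 6))  # the true rating, 130

# 2. Undamped, a single game sends the rating to an extreme.
print(round(update_after(r1, k, 1.0), 2))  # one win: 2*r_1 + k
print(round(update_after(r1, k, 0.0), 2))  # one loss: -r_1**2 / k, below zero

# 3. Dividing V by a constant keeps the swings moderate.
DAMPING = 35.0  # hypothetical value, chosen only for illustration
print(round(update_after(r1, k, 1.0, DAMPING), 2), round(update_after(r1, k, 0.0, DAMPING), 2))
```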

Hence, for Winner Takes All systems, you just combine the formulae above to get the new ratings.
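Putting the pieces together, here is a minimal sketch of one Winner Takes All update for a whole game. Two assumptions are not from the article: the unknown true rating in V = (r + k)(r_1 + k)/k is replaced by each player's current rating, and V is divided by a hypothetical damping constant of 35. The function name `ghost_wta_update` is likewise hypothetical.

```python
DAMPING = 35.0  # hypothetical constant; tune it against your own data


def ghost_wta_update(ratings, winners, damping=DAMPING):
    """Sketch of one Winner-Takes-All rating update for a seven-player game.

    Assumptions not taken from the article: the unknown true rating in
    V = (r + k)(r_1 + k) / k is replaced by each player's current rating,
    and V is then divided by the hypothetical `damping` constant.
    `winners` holds the index of the sole winner or of every drawer.
    """
    total = sum(ratings)
    share = 1.0 / len(winners)
    new = []
    for i, rating in enumerate(ratings):
        k = total - rating                   # opponents' combined rating
        expected = rating / total            # win-ratio expectation
        result = share if i in winners else 0.0
        v = (rating + k) ** 2 / k / damping  # (r_1 + k)^2 / k, damped
        new.append(rating + v * (result - expected))
    return new


after_win = ghost_wta_update([100.0] * 7, {0})       # player 0 wins outright
after_draw = ghost_wta_update([100.0] * 7, {0, 1})   # two-way draw
print([round(x, 2) for x in after_win])
print([round(x, 2) for x in after_draw])
```

Because V depends on each player's own rating in this sketch, the total rating is exactly conserved only when, as here, all seven players start level.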

The second scoring system I shall look at is “Points Per Supply Centre” (PPSC). Basically, Result = SCs owned/34. This is rather more complicated in terms of expected result, because a victory counts for at most 18 centres. The reasons for imposing this maximum are twofold. Firstly, it is not desirable for players to draw out a game in an attempt to gain extra centres, and secondly, it would be impossible to quantify how likely a player is to get 19 centres rather than 18, for instance. (It should be noted that this scoring system means every game must be played to the end, with no concessions, although this isn’t an article about different scoring systems.)
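A minimal sketch of the PPSC result follows, using exact fractions so the results visibly sum to 1. It assumes, for illustration, that the final count accounts for all 34 centres and credits nobody with more than 18; the names `ppsc_results`, `TOTAL_CENTRES` and `MAX_CENTRES` are hypothetical.

```python
from fractions import Fraction

TOTAL_CENTRES = 34
MAX_CENTRES = 18  # a victory counts for at most 18 centres


def ppsc_results(centres):
    """Points Per Supply Centre: Result = SCs owned / 34.

    Assumes (for this sketch) that the final count lists all 34 centres
    and nobody is credited with more than 18.
    """
    if sum(centres) != TOTAL_CENTRES:
        raise ValueError("expected all 34 centres to be accounted for")
    if max(centres) > MAX_CENTRES:
        raise ValueError("no player may be credited with more than 18 centres")
    return [Fraction(c, TOTAL_CENTRES) for c in centres]


final = [18, 7, 5, 4, 0, 0, 0]  # a victory for player 0
results = ppsc_results(final)
print([str(x) for x in results])
print(sum(results))  # 1
```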

Because of the complication this maximum creates, it is necessary to treat the outcome as having two possibilities: winning, with 18 SCs, and not winning, with 16 or fewer SCs. You then need to look at both of these and calculate the Expected Result from them. In essence, a player’s Expected Result combines the 18/34 he scores for a victory, weighted by his chance of winning, with his expected success in non-victory.

So all we need to find is the expected success in non-victory. Herein lies a problem: that depends on who the victor is. If player 1 is the victor, there clearly cannot be any success in non-victory for player 1, and if player 2 is the victor, the chances of success are different from those when player 3 is the victor, because the people you are competing against are different. In fact, if player j is the victor, with j not 1, the success for player 1 in non-victory is given by:

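The chance of that actually happening is the same as player j’s chance of victory, $r_j/(r_1 + \cdots + r_7)$, so we must multiply by that. Summing this over every possible victor j other than player 1 (for j = 1 there is no success in non-victory, because player 1 has won) gives us the expected success in non-victory, Dr₁:

$$\mathrm{Dr}_1 = \sum_{j \neq 1} \frac{r_j}{r_1 + \cdots + r_7} \times \bigl(\text{success for player 1 in non-victory when } j \text{ wins}\bigr)$$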
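The structure above can be sketched in Python, but two ingredients have to be assumed. This sketch supposes, hypothetically, that when player j wins, the six non-victors split the remaining 16 of the 34 centres in proportion to their ratings, so player i's success is (16/34) * r_i / (R - r_j), with R the sum of all seven ratings and r_j / R as j's chance of victory; it also takes the Expected Result to be 18/34 times the player's own chance of winning plus his Dr. The function name `ppsc_expected` is hypothetical.

```python
WIN_SHARE = 18 / 34   # result for a victory
REST_SHARE = 16 / 34  # combined result of the six non-victors


def ppsc_expected(ratings):
    """Sketch of the PPSC Expected Result for each player.

    Hypothetical assumption: if player j wins, every other player's
    success in non-victory is his share of the remaining 16 centres,
    in proportion to his rating among the six non-victors.
    """
    total = sum(ratings)
    win_chance = [r / total for r in ratings]  # win-ratio axiom
    expected = []
    for i, r_i in enumerate(ratings):
        # Dr_i: success in non-victory, weighted by each rival j's chance of
        # winning; j == i is skipped because then player i has won.
        dr_i = sum(
            win_chance[j] * REST_SHARE * r_i / (total - r_j)
            for j, r_j in enumerate(ratings)
            if j != i
        )
        expected.append(WIN_SHARE * win_chance[i] + dr_i)
    return expected


ratings = [150, 120, 100, 100, 90, 80, 60]
e = ppsc_expected(ratings)
print([round(x, 4) for x in e])
print(round(sum(e), 9))  # the expected results sum to 1
```

Under this assumption the expected results always sum to 1: for each possible victor the non-victors' shares add up to 16/34, which joins the 18/34 of the victory, and the victory chances themselves sum to 1.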

Now, for V, it doesn’t make much difference if you use the same formula as for Winner Takes All. It would clearly be possible to find a V that works for PPSC in the same way that the Winner Takes All one works for Winner Takes All, but the exercise is probably pointless, given the approximation at the end and the inherent problems with rating a game such as Diplomacy. That, and the fact that the formula would no doubt be hideous, is why I have not created a V formula specific to PPSC.

So, once again, you can combine all these formulae and carry out the necessary calculations.
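A minimal end-to-end PPSC sketch follows, reusing the Winner Takes All V as the text suggests. Its assumptions are not from the article: non-victors split the remaining 16 centres in proportion to rating, the true rating in V is replaced by the current rating, and the damping constant of 35 is hypothetical, as are all names.

```python
WIN_SHARE, REST_SHARE = 18 / 34, 16 / 34


def ppsc_expected(ratings):
    # Hypothetical assumption: when player j wins, the other players share
    # the remaining 16 centres in proportion to their ratings.
    total = sum(ratings)
    win = [r / total for r in ratings]
    return [
        WIN_SHARE * win[i]
        + sum(win[j] * REST_SHARE * r_i / (total - r_j)
              for j, r_j in enumerate(ratings) if j != i)
        for i, r_i in enumerate(ratings)
    ]


def ghost_ppsc_update(ratings, centres, damping=35.0):
    """Sketch of a PPSC update that reuses the Winner-Takes-All V.

    Assumed, not from the article: the true rating is replaced by the
    current one, and `damping` is a hypothetical constant.
    """
    if sum(centres) != 34 or max(centres) > 18:
        raise ValueError("expected 34 centres in total and at most 18 each")
    total = sum(ratings)
    new = []
    for r_i, c_i, e_i in zip(ratings, centres, ppsc_expected(ratings)):
        k = total - r_i
        v = total ** 2 / k / damping  # (r_1 + k)^2 / k, since r_1 + k = total
        new.append(r_i + v * (c_i / 34 - e_i))
    return new


ratings = [100.0] * 7
print([round(x, 2) for x in ghost_ppsc_update(ratings, [18, 7, 5, 4, 0, 0, 0])])
```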
