Prisoner's Dilemma
Prisoners' Dilemma is a famous problem in Games Theory, in which two prisoners A and B, in adjacent cells but unable to communicate, are each invited to inform on the other, in which case they will be freed but the other's sentence increased. The sentencing options are as follows:
If A and B each betray the other, each will serve two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years, and vice versa
If A and B both remain silent, they will both only serve one year on a lesser charge
Pursuing freedom may lead both to betray and thus serve two years, when by staying silent they could both have had only one, but they only have one chance, and no opportunity for retraction or revenge. However the game becomes more interesting when interated, that is, when prisoners have many similar encounters, in which case they can retaliate for previous betrayals and learn an optimum strategy from the experiences. It appears that the best long-term strategy is 'tit-for-tat', where each prisoner punishes the other for a betrayal only once, then returning to co-operation. You can test this by running the following simulation written in Python, choosing prisoners with various different strategies and numbers of encounters:
############ Iterated Prisoner's Dilemma Simulation ########################
from math import *
from random import randrange, seed, shuffle
## Types of prisoner and strategy
class Prisoner: # base class
def __init__(self, name):
self.name = name
self.last = 0
self.score = 0
def inc(self, n):
self.score += n
class Coop(Prisoner): # always co-operate, ie. remain silent
def say(self, opp):
self.last = 0
return self.last
class Betray(Prisoner): # always betray
def say(self, opp):
self.last = 1
return self.last
class Altern(Prisoner): # alternately co-operate and betray
def say(self, opp):
self.last = not(self.last)
return self.last
class TitTat(Prisoner): # tit-for-tat, betray once for each betrayal
def say(self, opp):
self.last = opp.last
return self.last
class Oppo(Prisoner): # do the opposite of opponent's last action
def say(self, opp):
self.last = not(opp.last)
return self.last
class Rand(Prisoner): # betray at random on average once in n times
def __init__(self, name, n):
self.n = n
Prisoner.__init__(self, name)
def say(self, opp):
self.last = (randrange(0,self.n) == 0)
return self.last
## Perform n confrontations between prisoners a and b
def go(a,b,n):
seed()
ascore = bscore = 0
for i in range(0,n):
x = a.say(b), b.say(a)
if x == (1,1): # both co-operate, 1 year each
a.inc(1); b.inc(1)
elif x == (0,1): # b betrays,is freed, a gets 3 years
a.inc(3); b.inc(0)
elif x == (1,0): # a betrays is freed, b gets 3 years
a.inc(0); b.inc(3)
elif x == (1,1): # both betray, 2 years each
a.inc(2); b.inc(2)
## Create some perpetrators
ni1 = Coop('Nice1' ); ni2 = Coop('Nice2' )
na1 = Betray('Nasty1'); na2 = Betray('Nasty2');
al1 = Altern('Alter1'); al2 = Altern('Alter2');
r3 = Rand('Random3',3); r16 = Rand('Random16', 16);
pp1 = Oppo('OppLast1'); pp2 = Oppo('OppLast2');
tit1 = TitTat('TitTat1'); tit2 = TitTat('TitTat2');
## Perform iter confrontations and record the penalties
def test(iter):
perps = [tit1, ni1, ni2, al1, r3, tit2, na1] # list of perps
seed()
totalpen = 0
for i in range(0, iter): # perform the confrontations
shuffle(perps)
go(perps[0], perps[1], 6)
results = {} # dictionary of perp names and scores
for i in perps:
results[i.score] = i.name
totalpen += i.score
k = results.keys() # print in ascending order of penalties
print('\n')
for i in sorted(k):
print('%8s' % results[i], '\t', '%0i' % i)
print('\n\nTotal Penalties Suffered: ',totalpen)
test(1000)