Prisoner's Dilemma

Prisoners' Dilemma is a famous problem in Games Theory, in which two prisoners A and B, in adjacent cells but unable to communicate, are each invited to inform on the other, in which case they will be freed but the other's sentence increased. The sentencing options are as follows:

If A and B each betray the other, each will serve two years in prison
If A betrays B but B remains silent, A will be set free and B will serve three years, and vice versa
If A and B both remain silent, they will both only serve one year on a lesser charge

Pursuing freedom may lead both to betray and thus serve two years, when by staying silent they could both have had only one, but they only have one chance, and no opportunity for retraction or revenge. However the game becomes more interesting when interated, that is, when prisoners have many similar encounters, in which case they can retaliate for previous betrayals and learn an optimum strategy from the experiences. It appears that the best long-term strategy is 'tit-for-tat', where each prisoner punishes the other for a betrayal only once, then returning to co-operation. You can test this by running the following simulation written in Python, choosing prisoners with various different strategies and numbers of encounters:

############ Iterated Prisoner's Dilemma Simulation ########################

from math import *

from random import randrange, seed, shuffle

## Types of prisoner and strategy

class Prisoner: # base class

def __init__(self, name):

self.name = name

self.last = 0

self.score = 0

def inc(self, n):

self.score += n

class Coop(Prisoner): # always co-operate, ie. remain silent

def say(self, opp):

self.last = 0

return self.last

class Betray(Prisoner): # always betray

def say(self, opp):

self.last = 1

return self.last

class Altern(Prisoner): # alternately co-operate and betray

def say(self, opp):

self.last = not(self.last)

return self.last

class TitTat(Prisoner): # tit-for-tat, betray once for each betrayal

def say(self, opp):

self.last = opp.last

return self.last

class Oppo(Prisoner): # do the opposite of opponent's last action

def say(self, opp):

self.last = not(opp.last)

return self.last

class Rand(Prisoner): # betray at random on average once in n times

def __init__(self, name, n):

self.n = n

Prisoner.__init__(self, name)

def say(self, opp):

self.last = (randrange(0,self.n) == 0)

return self.last

## Perform n confrontations between prisoners a and b

def go(a,b,n):

seed()

ascore = bscore = 0

for i in range(0,n):

x = a.say(b), b.say(a)

if x == (1,1): # both co-operate, 1 year each

a.inc(1); b.inc(1)

elif x == (0,1): # b betrays,is freed, a gets 3 years

a.inc(3); b.inc(0)

elif x == (1,0): # a betrays is freed, b gets 3 years

a.inc(0); b.inc(3)

elif x == (1,1): # both betray, 2 years each

a.inc(2); b.inc(2)

## Create some perpetrators

ni1 = Coop('Nice1' ); ni2 = Coop('Nice2' )

na1 = Betray('Nasty1'); na2 = Betray('Nasty2');

al1 = Altern('Alter1'); al2 = Altern('Alter2');

r3 = Rand('Random3',3); r16 = Rand('Random16', 16);

pp1 = Oppo('OppLast1'); pp2 = Oppo('OppLast2');

tit1 = TitTat('TitTat1'); tit2 = TitTat('TitTat2');

## Perform iter confrontations and record the penalties

def test(iter):

perps = [tit1, ni1, ni2, al1, r3, tit2, na1] # list of perps

seed()

totalpen = 0

for i in range(0, iter): # perform the confrontations

shuffle(perps)

go(perps[0], perps[1], 6)

results = {} # dictionary of perp names and scores

for i in perps:

results[i.score] = i.name

totalpen += i.score

k = results.keys() # print in ascending order of penalties

print('\n')

for i in sorted(k):

print('%8s' % results[i], '\t', '%0i' % i)

print('\n\nTotal Penalties Suffered: ',totalpen)

test(1000)