Prisoner's Dilemma

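The Prisoner's Dilemma is a famous problem in game theory, in which two prisoners A and B, held in adjacent cells but unable to communicate, are each invited to inform on the other. A prisoner who informs while the other stays silent is freed, and the other's sentence is increased. The sentencing options are as follows:

    Both stay silent:                     1 year each
    One betrays, the other stays silent:  the betrayer goes free, the other serves 3 years
    Both betray:                          2 years each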
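
The dilemma can be checked directly from the penalties. The following is a minimal sketch, assuming the sentences used in the comments of the simulation in this article (1 year each for mutual silence, 2 years each for mutual betrayal, and 0 or 3 years when only one betrays). The names PENALTY and best_reply are illustrative and not part of the simulation.

```python
# Years in prison for (my_move, other_move); 0 = stay silent, 1 = betray.
# Assumed penalties, taken from the comments in this article's simulation.
PENALTY = {
    (0, 0): 1,   # both stay silent: 1 year each
    (0, 1): 3,   # I stay silent, the other betrays: I serve 3 years
    (1, 0): 0,   # I betray, the other stays silent: I go free
    (1, 1): 2,   # both betray: 2 years each
}

def best_reply(other_move):
    """Return the move (0 or 1) that minimises my own sentence."""
    return min((0, 1), key=lambda my_move: PENALTY[(my_move, other_move)])

for other in (0, 1):
    print("other plays", other, "-> best reply", best_reply(other))

# Betraying is the best reply either way, yet mutual betrayal (2 years each)
# is worse for both prisoners than mutual silence (1 year each).
print(PENALTY[(1, 1)], ">", PENALTY[(0, 0)])
```

Whatever the other prisoner does, betrayal gives the shorter sentence, which is why two prisoners reasoning separately can both end up worse off.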

Pursuing freedom may lead both prisoners to betray and thus serve two years each, when by staying silent they could both have served only one. In a single encounter they have only one chance, with no opportunity for retraction or revenge. The game becomes more interesting when iterated, that is, when the prisoners have many similar encounters, so that they can retaliate for previous betrayals and learn a better strategy from experience. It appears that the best long-term strategy is 'tit-for-tat', in which a prisoner co-operates at first and thereafter copies the other's previous action, so that each betrayal is punished only once before co-operation resumes. You can test this by running the following simulation written in Python, choosing prisoners with various strategies and numbers of encounters:


############ Iterated Prisoner's Dilemma Simulation ########################

from random import randrange, seed, shuffle


## Types of prisoner and strategy

class Prisoner:                # base class

    def __init__(self, name):

        self.name = name

        self.last = 0

        self.score = 0

    def inc(self, n):

        self.score += n

    

class Coop(Prisoner):          # always co-operate, ie. remain silent

    def say(self, opp):

        self.last = 0

        return self.last


class Betray(Prisoner):        # always betray

    def say(self, opp):

        self.last = 1

        return self.last        


class Altern(Prisoner):        # alternately co-operate and betray

    def say(self, opp):

        self.last = not(self.last)

        return self.last


class TitTat(Prisoner):        # tit-for-tat: copy the opponent's last action

    def say(self, opp):

        self.last = opp.last

        return self.last


class Oppo(Prisoner):          # do the opposite of opponent's last action

    def say(self, opp):

        self.last = not(opp.last)

        return self.last


class Rand(Prisoner):          # betray at random on average once in n times

    def __init__(self, name, n):

        self.n = n

        Prisoner.__init__(self, name)

    def say(self, opp):

        self.last = (randrange(0,self.n) == 0)

        return self.last


## Perform n confrontations between prisoners a and b

def go(a,b,n):

    for i in range(0,n):

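        prev_a = a.last                  # remember a's previous action

        move_a = a.say(b)                # a decides from b's previous action

        a.last = prev_a                  # hide a's new action while b decides

        move_b = b.say(a)                # b decides from a's previous action

        a.last = move_a                  # now record a's new action

        x = move_a, move_b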

        if x == (0,0):                   # both co-operate, 1 year each

            a.inc(1); b.inc(1)

        elif x == (0,1):                 # b betrays and is freed, a gets 3 years

            a.inc(3); b.inc(0)

        elif x == (1,0):                 # a betrays and is freed, b gets 3 years

            a.inc(0); b.inc(3)

        elif x == (1,1):                 # both betray, 2 years each

            a.inc(2); b.inc(2)


## Create some perpetrators  

ni1  = Coop('Nice1' );    ni2 = Coop('Nice2' ) 

na1  = Betray('Nasty1');  na2 = Betray('Nasty2');

al1  = Altern('Alter1');  al2  = Altern('Alter2');

r3   = Rand('Random3',3); r16 = Rand('Random16', 16); 

pp1  = Oppo('OppLast1');  pp2 = Oppo('OppLast2');

tit1 = TitTat('TitTat1'); tit2 = TitTat('TitTat2');


##  Perform the given number of confrontations and record the penalties

def test(rounds):

    perps = [tit1, ni1, ni2, al1, r3, tit2, na1]     # list of perps

    seed()

    totalpen = 0

    

    for i in range(0, rounds):          # perform the confrontations

        shuffle(perps)           

        go(perps[0], perps[1], 6) 

        

    for p in perps:                     # add up the total penalty

        totalpen += p.score

    

    print('\n')

    # print in ascending order of penalties; sorting the perps themselves
    # (rather than a dictionary keyed by score) keeps perps with equal scores

    for p in sorted(perps, key=lambda p: p.score):

        print('%8s' % p.name, '\t', '%i' % p.score)

        

    print('\n\nTotal Penalties Suffered: ',totalpen)

        

    

test(1000)
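
The effect of tit-for-tat is easier to see in a single head-to-head match than in the shuffled tournament above. The following is a minimal sketch, not part of the simulation: it assumes the same penalties (1, 2 and 3 years, and 0 for going free), represents each strategy as a plain function of the opponent's previous move, and lets both prisoners decide simultaneously. The names match, YEARS, always_coop, always_betray and tit_for_tat are illustrative.

```python
# Strategies are functions of the opponent's previous move (None on round one).
# 0 = co-operate (stay silent), 1 = betray.
def always_coop(opp_prev):
    return 0

def always_betray(opp_prev):
    return 1

def tit_for_tat(opp_prev):
    # co-operate first, then copy whatever the opponent did last time
    return 0 if opp_prev is None else opp_prev

# Years served by (first, second) prisoner for each pair of moves.
YEARS = {(0, 0): (1, 1), (0, 1): (3, 0), (1, 0): (0, 3), (1, 1): (2, 2)}

def match(strat_a, strat_b, rounds):
    """Play `rounds` simultaneous encounters; return total years for each."""
    prev_a = prev_b = None
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strat_a(prev_b)      # both decide from the previous round only
        move_b = strat_b(prev_a)
        years_a, years_b = YEARS[(move_a, move_b)]
        total_a += years_a
        total_b += years_b
        prev_a, prev_b = move_a, move_b
    return total_a, total_b

print(match(tit_for_tat, tit_for_tat, 6))      # (6, 6)
print(match(tit_for_tat, always_betray, 6))    # (13, 10)
print(match(always_coop, always_betray, 6))    # (18, 0)
```

In this sketch tit-for-tat is exploited only in the first round by a constant betrayer, whereas an unconditional co-operator is exploited in every round, and two tit-for-tat prisoners do as well as two co-operators.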