Ooooooh shamelessly clickbaited. Ok now read.
Suppose you and a stranger are in separate rooms. In front of each of you are two buttons, one labeled “Split” and the other “Steal”. You can’t see or talk to the stranger in any way. You are told that in each round you will both have to choose one option, and you will be paid accordingly:
If you both split, you each get $3.
If one steals and the other splits, the person who steals gets $5 and the other person gets none.
If you both steal, you each get $1.
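To make the rules concrete, here is a minimal sketch of the payoffs as a lookup table. The names `PAYOFFS` and `round_payoff` are my own for illustration and are not taken from the actual simulation code.

```python
# Payoffs as (my_dollars, their_dollars), keyed by (my_move, their_move).
# The names here are illustrative, not from the real simulation.
SPLIT, STEAL = "split", "steal"

PAYOFFS = {
    (SPLIT, SPLIT): (3, 3),  # both split
    (SPLIT, STEAL): (0, 5),  # I split, they steal
    (STEAL, SPLIT): (5, 0),  # I steal, they split
    (STEAL, STEAL): (1, 1),  # both steal
}

def round_payoff(my_move, their_move):
    """Return (my dollars, their dollars) for a single round."""
    return PAYOFFS[(my_move, their_move)]
```

Notice that whatever the other player picks, stealing pays you more in that single round (5 vs 3, or 1 vs 0), which is exactly what makes the game tricky.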
There will be 100 rounds. What should you do? In order to answer this question, I did what I do best: overcomplicate, vibe code a simulation, and overanalyze everything.
For context, this game has already been analyzed by multiple professors in the past. One of the best strategies out there is called “tit for tat”. It’s a very simple strategy: split on the first round, then copy what the other player did in the previous round. If they stole, steal back. If they split, split with them. What makes this strategy good is that it will get a lot of points with any nice strategy (a strategy that is never the first to steal) and won’t lose many points to a malicious strategy (a strategy that tries to be the first to steal). I wanted to see if my simulation would reach this type of player or potentially an even better strategy.
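Tit for tat fits in a few lines. This is a sketch of the textbook version, which opens with a split; the function name and the scripted opponent are made up for the example.

```python
def tit_for_tat(opponent_history):
    """Split on the first round, then copy the opponent's previous move."""
    if not opponent_history:
        return "split"
    return opponent_history[-1]

# Replay a scripted (hypothetical) opponent to see how tit for tat responds.
opponent_moves = ["split", "steal", "steal", "split", "split"]
responses = [tit_for_tat(opponent_moves[:i]) for i in range(len(opponent_moves))]
print(responses)
# ['split', 'split', 'steal', 'steal', 'split']
```

Each response is just the opponent's move from one round earlier, so the strategy retaliates immediately and forgives as soon as the opponent splits again.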
The simulation itself uses 50 AI players that each have a neural network. They receive inputs of:
What happened in the past 5 rounds
Round number
Their score and their opponent’s score
These inputs are passed through a few neural network layers, after which each AI chooses either split or steal. To train the AI players, they are put in a round-robin tournament; the players with the highest average scores make more copies of themselves, while the low performers are eliminated. This tournament is repeated 200 times to give the AIs time to improve.
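Here is a rough sketch of what one player's network and one generation of selection could look like. Almost everything in it is an assumption made for illustration: the 13 inputs (5 rounds × 2 moves, the round number, and two scores), the single hidden layer of 8 units, the mutation noise, and the "keep the top half, refill with mutated copies" rule. The real simulation may differ on all of these.

```python
import math
import random

def make_network(n_inputs=13, n_hidden=8, rng=random):
    """Random weights for a one-hidden-layer network (sizes are assumptions)."""
    return {
        "w1": [[rng.gauss(0, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "w2": [rng.gauss(0, 1) for _ in range(n_hidden)],
    }

def split_probability(net, inputs):
    """Forward pass: tanh hidden layer, then a sigmoid giving P(split)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in net["w1"]]
    z = sum(w * h for w, h in zip(net["w2"], hidden))
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))

def mutate(net, rng=random, sigma=0.1):
    """Copy a network with small Gaussian noise added to every weight."""
    return {
        "w1": [[w + rng.gauss(0, sigma) for w in row] for row in net["w1"]],
        "w2": [w + rng.gauss(0, sigma) for w in net["w2"]],
    }

def next_generation(population, fitness, rng=random, survivors=0.5):
    """Keep the top fraction by fitness and refill with mutated copies of them."""
    ranked = [net for _, net in
              sorted(zip(fitness, population), key=lambda pair: pair[0], reverse=True)]
    keep = ranked[: max(1, int(len(ranked) * survivors))]
    children = [mutate(rng.choice(keep), rng) for _ in range(len(population) - len(keep))]
    return keep + children
```

In a full run, `fitness` would be each player's average score from the round-robin tournament, and `next_generation` would be called once per generation, 200 times in total.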
At first, I tried starting all the players off with a 50/50 chance of splitting or stealing. The cooperation rate rose at the start but then suddenly fell and never climbed back up.
Likely, what happened was that the AIs first learned that splitting was beneficial, so more and more of them began to split. However, at a certain point, a few AIs learned that they could take advantage of this cooperation by always stealing. That strategy spread around, and after it became prevalent enough, any player trying to cooperate would be immediately eliminated. We can see evidence of this theory in another graph. In this graph, the points are clustered in the top left, meaning that low cooperation goes together with high fitness (and higher fitness means the AI makes more copies of itself). You can also notice the sole point in the bottom right, which has higher cooperation but way lower fitness.
At the end of this simulation, the best player the code came up with was simply a bot that always stole. However, we definitely have NOT found the optimal strategy. The average score fell significantly when cooperation declined, meaning that the population as a whole got worse. This is because when two stealers play each other, they each get only $1 per round.
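The trap is easy to see with a simplified model. The sketch below assumes a population made up only of always-split and always-steal players (the real AIs are neural networks, so this is an idealization), and the function name is my own.

```python
ROUNDS = 100

def average_scores(n_split, n_steal, rounds=ROUNDS):
    """Mean per-match score for each type in a round robin made up only of
    always-split and always-steal players (no self-play).
    Returns (split_avg, steal_avg); a type with no players gets None."""
    opponents = n_split + n_steal - 1
    split_avg = steal_avg = None
    if n_split:
        # A splitter earns 3 per round against other splitters, 0 against stealers.
        split_avg = rounds * (3 * (n_split - 1) + 0 * n_steal) / opponents
    if n_steal:
        # A stealer earns 5 per round against splitters, 1 against other stealers.
        steal_avg = rounds * (5 * n_split + 1 * (n_steal - 1)) / opponents
    return split_avg, steal_avg

print(average_scores(50, 0))   # everyone splits
print(average_scores(25, 25))  # half and half
print(average_scores(0, 50))   # everyone steals
```

With everyone splitting, each player averages 300 per match. At half and half, stealers average about 304 against about 147 for splitters, so stealing spreads. Once everyone steals, the average is down to 100.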
Because this didn’t work, I decided to start the population with 40% of players always cooperating and the remaining 60% starting random. This averages out to a 70% cooperation rate (0.4 × 100% + 0.6 × 50% = 70%). Surprisingly, this time, the splitters managed to overwhelm the stealers.
My guess for why this happened is that the split players learned to retaliate when they were stolen from. Now, when malicious players play them, they get barely any points, but when they play another good player, they get a full 300 points (100 rounds × $3). This time, the graph of fitness against cooperation rate goes in the other direction: more cooperation = more fitness. This encourages more cooperation, causing a positive feedback loop.
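To check that this guess is at least plausible, here is a small round-robin sketch with hand-written strategies standing in for the evolved AIs. The 49-to-1 population mix and all function names are assumptions chosen for the example.

```python
from itertools import combinations

PAYOFFS = {
    ("split", "split"): (3, 3),
    ("split", "steal"): (0, 5),
    ("steal", "split"): (5, 0),
    ("steal", "steal"): (1, 1),
}

def tit_for_tat(own_history, other_history):
    # Split first, then copy the opponent's previous move.
    return other_history[-1] if other_history else "split"

def always_steal(own_history, other_history):
    return "steal"

def match(a, b, rounds=100):
    """Play two strategies against each other and return their total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a, score_b = score_a + pay_a, score_b + pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

def round_robin(players, rounds=100):
    """Average per-match score for each player after everyone plays everyone once."""
    totals = [0] * len(players)
    for i, j in combinations(range(len(players)), 2):
        score_i, score_j = match(players[i], players[j], rounds)
        totals[i] += score_i
        totals[j] += score_j
    return [total / (len(players) - 1) for total in totals]

# Hypothetical population: 49 retaliators and a single always-steal invader.
players = [tit_for_tat] * 49 + [always_steal]
averages = round_robin(players)
print(averages[0], averages[-1])
```

The lone stealer wins every individual match 104 to 99, yet averages only 104 per match, while each retaliator averages about 295.9 because it collects the full 300 from every other retaliator. Under a "highest average score reproduces" rule, the stealer is the one that gets eliminated.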
This time, the personality of the final AI was very much like Tit for Tat. If you defect, it will defect back. It seems that Tit for Tat is optimal, or at least close to it, in this setup. The average score also behaved very differently this time. It initially fell but quickly recovered, climbing back up close to 300 points, showing that the new strategy was very close to optimal.
Of course, this leaves me with many questions: can I do better, and what happens if I let the simulation run for longer? I did end up running simulations to attempt to answer these questions, but there’s too much to fit in this article. If anyone wants to see that data, there is a link at the end of the article.
This game actually does teach us a few lessons about real life. Although this exact circumstance will likely never happen to you, there are many moments where you have to choose between cooperating and defecting. When Professor Axelrod analyzed this game, he came up with four main requirements for a good strategy:
Nice: the player will not defect unprovoked
Retaliating: when attacked, the player will (at least sometimes) attack back
Forgiving: the player will stop defecting after a while
Non-envious: the player doesn’t aim to score more than the opponent
Although these requirements were meant for the game, we can easily apply them to real life. Be nice, don’t let others control you, forgive them, and don’t envy others. That’s why you should choose kindness.
EXTRA DATA (Warning: there is a LOT)