Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

The Intrinsic Bias in Pure LLM-Based Agents

Consider a simple example where we use GPT-4 to play the Rock-Paper-Scissors game.

To generate the optimal strategy, we prompt the LLM to first reason about the Nash equilibrium of this game and then choose an action based on the reasoning result. We let this pure LLM-based agent play 100 independent games and profile its reasoning result and action distribution in the figure on the right.

In all 100 games, the agent successfully identifies the optimal strategy as randomly choosing an action. However, even with this perfect reasoning result, the agent's actual action distribution shows a clear bias toward choosing Rock, which can be easily exploited by an opponent who always plays Paper.

This intrinsic bias is inevitably inherited from the model's pretraining data and hinders pure LLM-based agents from strong strategic play in more complex scenarios.
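
To make the exploitation argument concrete, the sketch below computes the best pure-strategy reply against a given Rock-Paper-Scissors action distribution and the expected payoff that reply earns. The `biased` distribution is a hypothetical Rock-heavy example chosen for illustration, not the distribution measured in our experiment, and the function names are our own for this sketch.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")
# BEATS[a] is the action that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Return +1 for a win, -1 for a loss, and 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def exploitability(policy):
    """Return the best pure reply to `policy` and its expected payoff."""
    values = {
        reply: sum(p * payoff(reply, action) for action, p in policy.items())
        for reply in ACTIONS
    }
    best = max(values, key=values.get)
    return best, values[best]


uniform = {a: Fraction(1, 3) for a in ACTIONS}
# Hypothetical Rock-heavy distribution (an assumption, not measured data).
biased = {
    "rock": Fraction(1, 2),
    "paper": Fraction(1, 4),
    "scissors": Fraction(1, 4),
}

print(exploitability(uniform))
print(exploitability(biased))
```

Against the uniform policy no reply earns more than 0, whereas against this hypothetical biased policy an opponent who always plays Paper earns 1/4 per game on average.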

The Werewolf Game

Strategic Language Agents

Case Studies

Werewolf First Night Action

On the first night, the Werewolves need to choose a player to kill without any information. Their optimal policy is to randomly choose a player other than themselves so that the Doctor cannot predict and save the victim.

Pure LLM-based agent: has a clear bias toward killing Player 0, which could be exploited by a Doctor who always saves Player 0.
Strategic language agent: produces an almost uniform distribution that is close to the optimal policy.
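
The same reasoning can be sketched for the first-night kill. If the Doctor knows the Werewolves' kill distribution, the best response is to save the most likely target, so the chance that the kill is blocked equals the largest probability in that distribution. The seat numbers, the set of targets, and the biased distribution below are hypothetical values for illustration only, not measured results.

```python
from fractions import Fraction


def best_save(kill_policy):
    """Return the player a Doctor should save and the chance the kill is blocked."""
    target = max(kill_policy, key=kill_policy.get)
    return target, kill_policy[target]


def uniform_policy(targets):
    """Spread the kill probability evenly over all candidate targets."""
    return {t: Fraction(1, len(targets)) for t in targets}


# Hypothetical setup: players 0-4 are the candidate victims.
targets = [0, 1, 2, 3, 4]
uniform = uniform_policy(targets)

# Hypothetical biased policy that kills Player 0 half of the time.
biased = {0: Fraction(1, 2), 1: Fraction(1, 8), 2: Fraction(1, 8),
          3: Fraction(1, 8), 4: Fraction(1, 8)}

print(best_save(uniform))
print(best_save(biased))
```

With five candidate targets, a uniform kill policy caps the Doctor's block chance at 1/5, while the hypothetical biased policy lets a Doctor who always saves Player 0 block the kill half of the time.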

Doctor First Night Action

The Doctor also needs to choose a player to save on the first night without knowing who the victim is. From the Doctor's perspective, any player other than themselves could be a Werewolf, and saving others may waste the action on a Werewolf. Therefore, the optimal policy for the Doctor is to always save themselves on the first night.

Pure LLM-based agent: exhibits a suboptimal policy that saves others with a probability of 0.38.
Strategic language agent: saves themselves with a high probability of 0.94.
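
The argument for self-saving can be checked with a small enumeration. This is a sketch under stated assumptions: a 7-player game with 2 Werewolves (the setup in the game logs on this page), Werewolves who kill uniformly at random among the 5 non-Werewolves, and a Doctor who considers every pair of other players equally likely to be the Werewolves. The seat number `DOCTOR = 0` and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

NUM_PLAYERS = 7  # as in the game logs on this page
DOCTOR = 0       # hypothetical seat for the Doctor


def block_probability(saved):
    """Chance that saving `saved` blocks the first-night kill."""
    others = [p for p in range(NUM_PLAYERS) if p != DOCTOR]
    wolf_pairs = list(combinations(others, 2))
    total = Fraction(0)
    for wolves in wolf_pairs:
        # Assumption: the victim is uniform over the non-Werewolves.
        victims = [p for p in range(NUM_PLAYERS) if p not in wolves]
        if saved in victims:
            total += Fraction(1, len(victims))
    return total / len(wolf_pairs)


print(block_probability(DOCTOR))  # Doctor saves themselves
print(block_probability(1))       # Doctor saves another player
```

Under these assumptions, self-saving blocks the kill with probability 1/5, while saving any other single player blocks it with probability 2/15, because that player is a Werewolf in 5 of the 15 possible Werewolf pairs.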

Villager Voting Action with Two Self-Proclaimed Seers

Another common situation is when two players claim to be the Seer and the Villagers need to choose their actions in the voting phase. One of the self-proclaimed Seers must be a Werewolf, and not voting for any player only makes it easier for the Werewolves to control the voting result and eliminate the real Seer.

Pure LLM-based agent: tends to make conservative decisions and chooses not to vote with a high probability of 0.69.
Strategic language agent: has a much lower probability of choosing not to vote and learns to identify and vote out the Werewolf who pretends to be the Seer.
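
The cost of abstaining can be illustrated with a simple vote tally. The sketch below uses the day 1 votes from the first game log on this page, where two Villagers chose not to vote, and compares them with a counterfactual in which those two Villagers vote for the false Seer. Treating a tie as "no elimination" is an assumption of this sketch, since the tie rule is not stated here, and the function name is our own.

```python
from collections import Counter


def eliminated(votes):
    """Return the player with the most votes, or None on a tie or no votes."""
    tally = Counter(target for target in votes.values() if target is not None)
    if not tally:
        return None
    ranked = tally.most_common()
    # Assumption: a tie for first place eliminates nobody.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Day 1 votes from the first game log; None means "chose not to vote".
day1 = {
    "player_0": "player_4",
    "player_1": "player_4",
    "player_2": None,
    "player_3": "player_0",
    "player_4": "player_0",
    "player_5": "player_0",
    "player_6": None,
}

# Counterfactual: the two abstaining Villagers vote for player_4.
counterfactual = {**day1, "player_2": "player_4", "player_6": "player_4"}

print(eliminated(day1))
print(eliminated(counterfactual))
```

In the logged vote the real Seer (player_0) is eliminated 3 to 2, whereas in the counterfactual the Werewolf claiming to be the Seer (player_4) is eliminated 4 to 3.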

Emergent Behaviors

Concealment

Our agent as the Seer in the discussion phase.

Reasoning: Since I'm the Seer, I know player_0 was not a Werewolf. However, he was killed, so there are still two Werewolves among us. I need to be careful not to reveal my role too early, as Werewolves might target me next. Since I don't have any more information right now, I should not make any specific accusations, but encourage discussion and observe the reactions of others.
Discussion: It's unfortunate that player_0 is dead. We need to work together to find out who is responsible for this. Pay attention to who is too eager to accuse or too quick to defend. Let's discuss.

Our agent as a Werewolf in the voting phase.

Reasoning: Since everyone is advocating for unity and careful observation, accusing someone outright could backfire and make me a target. Therefore, it might be best to go along with the majority and not vote in this round.
Action: Do not vote.

Cooperation

The Seer (player_2) revealed their identity and asked the Doctor for protection.

Our agent as the Doctor in the night round.

Reasoning: In the first day discussion, player_2 claimed to be the Seer and identified player_0 as a Werewolf. Player_0 is indeed very suspicious because they accused me without valid reasons. Therefore, I think player_2 is probably the true Seer and is very likely to be the Werewolves' target. It would be best to save player_2 instead of myself tonight.
Action: save player_2.

Bluffing

Our agent as a Werewolf in the discussion phase.

Reasoning: Since I'm the last player to speak before voting and no one has come forward as the Seer, I should claim to be the Seer. This could take the heat off of me, additionally, it might guide other players to suspect and possibly vote out whoever I indicate as a Werewolf.
Discussion: I believe it's important for us to become more open; I am the Seer. I checked player_4 last night, and he is a Werewolf. I understand it's a risky move to reveal my identity. But we need to unite and use all the resources we have to identify the Werewolves. I hope my move encourages the Doctor to protect me tonight, and we should vote player_4 out today.

Sacrificing

Our agent's Werewolf teammate (player_6) has been accused by the Seer.

Our agent as a Werewolf in the voting phase.

Reasoning: While it would indeed be disadvantageous to lose player_6, as they are my Werewolf teammate, at this point in the game, the suspicion against them is quite high. Therefore, voting for player_6 may contribute to maintaining my disguise and trust among other players, which are crucial for us to win.
Action: vote for player_6.

Game Logs of Human Evaluation

1 Human Plays with 6 AI Agents

role assignments:

player_0: Human as Seer.
player_1: AI as Villager.
player_2: AI as Villager.
player_3: AI as Werewolf.
player_4: AI as Werewolf.
player_5: AI as Doctor.
player_6: AI as Villager.

night 1:

Werewolves: player_3 and player_4 chose to kill player_5.
Seer: player_0 saw player_1 is not a Werewolf.
Doctor: player_5 chose to save player_5.

day 1 announcement: no player was killed last night.

remaining players: player_0 (Human as Seer), player_1 (AI as Villager), player_2 (AI as Villager), player_3 (AI as Werewolf), player_4 (AI as Werewolf), player_5 (AI as Doctor), player_6 (AI as Villager).

day 1 discussion:

player_0 (Human as Seer) said: "I'm the Seer. I saw player_1 is not a Werewolf so player_1 and I are good. We should find the Werewolves from the remaining players and vote them out."
player_1 (AI as Villager) said: "Thank you, player_0, for sharing your information. I can confirm that I am not a Werewolf. I agree that we need to focus on finding the actual Werewolves. Let's all share our suspicions and observations. Player_2, do you have any insights that you would like to share?"
player_2 (AI as Villager) said: "Thank you, player_0, for sharing your information, and player_1 for confirming your innocence. It's crucial that we focus on finding the actual Werewolves. I urge everyone to share any suspicions or observations you have about the remaining players. It's important for us to collaborate and discuss our thoughts openly to increase our chances of identifying the Werewolves."
player_3 (AI as Werewolf) said: "Hello everyone! I have been listening to your discussions, and I agree that we should focus on finding the Werewolves. However, I think it's important for us to consider all possibilities. While player_0 claimed to be the Seer and player_1 confirmed their innocence, we should be cautious and not blindly trust such claims without proper evidence. It's possible that they are Werewolves trying to mislead our discussion. I'm interested to hear the other players' thoughts on the matter. player_4, what do you think?"
player_4 (AI as Werewolf) said: "Hello everyone, I have important information to share. I am the true Seer and in night 1, I saw that player_5 is not a Werewolf. It's concerning that player_0 is claiming to be the Seer and player_1 is supporting them so strongly. I believe player_0 and player_1 are the Werewolves trying to manipulate our decisions. We should vote out player_0 to ensure the safety of the village. Let's work together to uncover the true identities of the Werewolves."
player_5 (AI as Doctor) said: "I appreciate player_4 sharing their information about me not being a Werewolf. This confirms my innocence and validates player_0's claim as false. It's clear that player_0 and player_1 are likely the Werewolves trying to mislead us. We should vote out player_0 to increase our chances of winning. Player_3's point about not trusting claims without evidence is valid, and we should be cautious in our decisions."
player_6 (AI as Villager) said: "I appreciate the insights shared by all players. It's clear that there are conflicting claims about who the Seer is and who is innocent. We should consider all possibilities and not rush to conclusions. Player_0 and player_1's claims should be further investigated to ensure we make an informed decision. Let's continue discussing and sharing our suspicions to identify the true Werewolves among us."

day 1 voting: player_0 had the most votes and was eliminated.

voted for player_0: player_3, player_4, player_5.
voted for player_4: player_0, player_1.
chose not to vote: player_2, player_6.

remaining players: player_1 (AI as Villager), player_2 (AI as Villager), player_3 (AI as Werewolf), player_4 (AI as Werewolf), player_5 (AI as Doctor), player_6 (AI as Villager).

night 2:

Werewolves: player_3 and player_4 chose to kill player_6.
Doctor: player_5 chose to save player_4.

day 2 announcement: player_6 was killed last night.

remaining players: player_1 (AI as Villager), player_2 (AI as Villager), player_3 (AI as Werewolf), player_4 (AI as Werewolf), player_5 (AI as Doctor).

day 2 discussion:

player_1 (AI as Villager) said: "Hello everyone, in light of player_6 being killed last night, we now know that player_6 was not a Werewolf. This means the Werewolves are still among us. Looking back at our previous discussions and the voting pattern from yesterday, it's worth noting that player_3 and player_4 both voted for player_0, who claimed to be the Seer. This raises suspicion towards player_3 and player_4. Player_2 chose not to vote, which could be a strategic move. I think we should consider these points as we continue our discussion today. Let's share our thoughts and observations openly to try and uncover the Werewolves among us."
player_2 (AI as Villager) said: "Hello everyone, as we gather on day 2, we must keep in mind that the Werewolves are still hiding among us. Player_1 made a valid point about the voting pattern from yesterday, which raises suspicion towards player_3 and player_4. I agree that we should consider this information carefully. As player_2, I chose not to vote yesterday as a strategic move to gather more information. Let's work together to analyze all the clues we have and make an informed decision to uncover the true Werewolves among us."
player_3 (AI as Werewolf) said: "Hello everyone, as we continue our discussion today, I would like to draw attention to player_1's consistent efforts to shift suspicion towards player_3 and player_4. It seems player_1 is trying to manipulate the group's focus and protect themselves by framing others. On the other hand, I believe player_4's claim as the Seer should be taken seriously. Their information about player_5 not being a Werewolf aligns with the outcome of last night's kill. I urge everyone to consider these points and reevaluate our suspicions carefully."
player_4 (AI as Werewolf) said: "Hello everyone, as the Seer, I have crucial information to share. Last night, I used my ability and discovered that player_1 is a Werewolf. This confirms the suspicions raised by player_3 and player_2 about player_1's behavior and motives. It's clear that player_1 has been trying to manipulate the group and shift blame onto others. I urge you all to reconsider your trust in player_1 and focus our efforts on eliminating the true Werewolves among us. Let's work together to uncover the remaining evil players and secure a victory for the villagers."
player_5 (AI as Doctor) said: "Hello everyone, as the Doctor, my priority is to ensure the safety of the villagers. Player_4 has consistently provided accurate information as the Seer, and their revelation about player_1 being a Werewolf is crucial. Player_3's accusations towards player_1 also hold weight. I believe it's in our best interest to vote for player_1 today to eliminate one of the Werewolves among us. Let's focus on making a united decision based on the evidence presented so far."

day 2 voting: player_1 had the most votes and was eliminated.

voted for player_1: player_2, player_3, player_4, player_5.
voted for player_4: player_1.

remaining players: player_2 (AI as Villager), player_3 (AI as Werewolf), player_4 (AI as Werewolf), player_5 (AI as Doctor).

game result: the Werewolves win the game.

6 Humans Play with 1 AI Agent

role assignments:

player_0: Human as Werewolf.
player_1: Human as Seer.
player_2: Human as Villager.
player_3: Human as Villager.
player_4: Human as Werewolf.
player_5: AI as Doctor.
player_6: Human as Villager.

night 1:

Werewolves: player_0 and player_4 chose to kill player_3.
Seer: player_1 saw player_0 is a Werewolf.
Doctor: player_5 chose to save player_5.

day 1 announcement: player_3 was killed last night.

remaining players: player_0 (Human as Werewolf), player_1 (Human as Seer), player_2 (Human as Villager), player_4 (Human as Werewolf), player_5 (AI as Doctor), player_6 (Human as Villager).

day 1 discussion:

player_0 (Human as Werewolf) said: "I'm just a Villager and I have no information, the Seer should come out and give us information."
player_1 (Human as Seer) said: "hhh I' the Seer, and I saw player_0 is a Werewolf. We have lost player_3 and must vote out a Werewolf today, or the Werewolf will kill another player and we will lose."
player_2 (Human as Villager) said: "Case 1: player 1 is the Seer and player 0 is the Werewolf. Case 2: player 1 is a Werewolf pretending to be the Seer and frames player 0. We need more information and vote out one of them today."
player_4 (Human as Werewolf) said: "Nonono, only case 2 is true because I'm the Seer, I saw player_5 is not a Werewolf. player_1 is a Werewolf eager to vote out good players and we should all vote player_1. I'll keep an eye on the voting result and check identity tomorrow night."
player_5 (AI as Doctor) said: "I agree with player_4 that we should vote out player_1. Player_1's claim of being the Seer is suspicious, especially since player_4 has confirmed being the actual Seer and saw that I am not a Werewolf. Let's eliminate player_1 and ensure the safety of our village."
player_6 (Human as Villager) said: "1 say he is the Seer and 0 is a Werewolf, 4 then said he is the Seer and 5 is not a Werewolf. I don't know why 5 think 4 is true and 1 is suspicious, 4 can be a Werewolf pretending Seer and give you good credit. It seems that 5 and 4 is on the same side or just stupid. I think we should agree on the vote. If you believe 1, vote 0. If you believe 4, vote 1. I'll vote 1."

day 1 voting: player_0 had the most votes and was eliminated.

voted for player_0: player_1, player_2, player_6.
voted for player_1: player_0, player_4.
voted for player_6: player_5.

remaining players: player_1 (Human as Seer), player_2 (Human as Villager), player_4 (Human as Werewolf), player_5 (AI as Doctor), player_6 (Human as Villager).

night 2:

Werewolves: player_4 chose to kill player_6.
Seer: player_1 saw player_4 is a Werewolf.
Doctor: player_5 chose to save player_4.

day 2 announcement: player_6 was killed last night.

remaining players: player_1 (Human as Seer), player_2 (Human as Villager), player_4 (Human as Werewolf), player_5 (AI as Doctor).

day 2 discussion:

player_1 (Human as Seer) said: "I think the situation is very clear. If I'm not a Seer, then 3 good players are dead and we have 2 Werewolves among the remaining 4 players and the game is over. But the game is not over, so player_0 must be a dead Werewolf and I'm the true Seer. Everyone should believe my words now. I checked player_4 and he is a Werewolf. Let's vote him and win the game."
player_2 (Human as Villager) said: "Agree. player 1 is a confirmed Seer. We don't need to listen to player 4's lie. Just vote him."
player_4 (Human as Werewolf) said: "" (empty)
player_5 (AI as Doctor) said: "I agree with player_1 being the Seer. Player_4's confirmation of my identity as the Villager is misleading and player_1's correct identification of player_0 as a Werewolf on day 1 provide strong evidence. We should vote to eliminate player_4 as he claimed to be the Seer but provided false information about player_1 being a Werewolf."

day 2 voting: player_4 had the most votes and was eliminated.

voted for player_4: player_1, player_2, player_5.
chose not to vote: player_4.

remaining players: player_1 (Human as Seer), player_2 (Human as Villager), player_5 (AI as Doctor).

game result: the Villagers win the game.
