Learning Homophilic Incentives in Sequential Social Dilemmas

Evolution of Cooperation in Cleanup

Exploring Incentives

Phase 1. Agents learn the basic dynamics of the environment. (Sec. 5.4)

Second-Order Social Dilemma

Phase 2. Some agents learn to give positive rewards to cleaning agents, while others enjoy the benefit. (Sec. 5.4)

Homophily Solves 2nd-SD

Phase 3. Agents close to the apple-spawning region simultaneously reward the cleaning agents and punish those who are next to the waste but do not clean it. (Sec. 5.4)

Stabilized Cooperation

Phase 4. There are no second-order free-riders. (Sec. 5.4)

Gradient Field

Unexploitable Punishers

Introducing unexploitable altruistic incentives creates a "safe region" near P. (Sec. 3, Part B)

Exploitable Punishers

The "safe region" no longer exists. (Sec. 3, Part C)

Homophily

The "safe region" reappears and is larger. (Sec. 3, Part D)

Other Replays

Cleanup (n=3)

Division of labor. (Sec. 5.2)

Ablation w/ inc.

Agents learn to give each other excessive positive incentives, regardless of their observations. (Sec. 5.3)