Learning Homophilic Incentives in Sequential Social Dilemmas
Evolution of Cooperation in Cleanup
Exploring Incentives
Phase 1. Agents learn the basic dynamics of the environment. (Sec. 5.4)
Second-Order Social Dilemma
Phase 2. Some agents learn to give positive rewards to cleaning agents, while others enjoy the benefit without contributing incentives themselves. (Sec. 5.4)
Homophily Solves 2nd-SD
Phase 3. Agents close to the apple-spawning region simultaneously reward the cleaning agents and punish agents who are next to the waste but do not clean it. (Sec. 5.4)
Stabilized Cooperation
Phase 4. There are no second-order free-riders. (Sec. 5.4)
Gradient Field
Unexploitable Punishers
Introducing unexploitable altruistic incentives creates a "safe region" near P. (Sec. 3, Part B)
Exploitable Punishers
The "safe region" no longer exists. (Sec. 3, Part C)
Homophily
The "safe region" reappears and is larger. (Sec. 3, Part D)
Other Replays
Cleanup (n=3)
Division of labor. (Sec. 5.2)
Ablation with Incentives
Agents learn to give each other excessive positive incentives regardless of their observations. (Sec. 5.3)