Phase 1. Agents learn the basic dynamics of the environment. (Sec. 5.4)
Phase 2. Some agents learn to give positive rewards to cleaning agents, while others enjoy the benefit. (Sec. 5.4)
Phase 3. Agents close to the apple-spawning region simultaneously reward cleaning agents and punish those who are next to the waste but do not clean it. (Sec. 5.4)
Phase 4. There are no second-order free-riders. (Sec. 5.4)
Introducing unexploitable altruistic incentives creates a "safe region" near P. (Sec. 3, Part B)
The "safe region" no longer exists. (Sec. 3, Part C)
The "safe region" reappears and is larger. (Sec. 3, Part D)
Division of labor. (Sec. 5.2)
Agents learn to give each other excessive positive incentives regardless of their observations. (Sec. 5.3)