Learning to Incentivize Other Learning Agents
The following are examples of learned behavior by LIO, actor-critic (AC) and inequity aversion (IA) agents in the 7x7 and 10x10 Cleanup maps.
7x7 Cleanup map
LIO
LIO agents find a division of labor: the purple agent specializes as a "river cleaner", while the blue agent becomes an "apple harvester".
Actor-Critic
Actor-critic (AC) agents sometimes perform cleaning, but both immediately compete for apples.
Inequity Aversion
Inequity aversion (IA) agents cooperate to some extent, but neither fully specializes as a "cleaner" or a "harvester".
10x10 Cleanup map
LIO
Actor-Critic
Inequity Aversion