UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Training with p = 0.012

Example Episode #1: UneVEn-Greedy-GPI with p = 0.012

  • Blue entities represent the agents (predators) and orange entities represent the prey.

  • # Prey Captured (Max=3): 3

  • # of Miscoordinated Capture Attempts: 0

  • Return (Max=3): 3.0

  • The agents learn to minimize the number of miscoordinated capture attempts: each agent waits for the other agents to perfectly surround the prey before taking the capture action. They make minimal mistakes and thereby achieve a higher return.

Example Episode #2: UneVEn-Greedy-GPI with p = 0.012

  • # Prey Captured (Max=3): 3

  • # of Miscoordinated Capture Attempts: 2

  • Return (Max=3): 2.97

Example Episode #3: UneVEn-Uniform-GPI with p = 0.012

  • # Prey Captured (Max=3): 3

  • # of Miscoordinated Capture Attempts: 8

  • Return (Max=3): 2.904
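
The returns above are consistent with a reward of 1 per captured prey minus a penalty of p for each miscoordinated capture attempt, e.g. 3 - 8 * 0.012 = 2.904 for Episode #3 (this decomposition is inferred from the reported numbers, not stated on this page). A minimal sketch under that assumption:

```python
# Hypothetical helper reproducing the reported returns, assuming a reward of
# 1 per captured prey and a penalty of p per miscoordinated capture attempt.
# The function name and exact reward scheme are assumptions, not from the paper.
def episode_return(prey_captured, miscoordinated_attempts, p=0.012):
    return prey_captured * 1.0 - miscoordinated_attempts * p

print(episode_return(3, 0))  # Episode #1: 3.0
print(episode_return(3, 2))  # Episode #2: 2.976 (reported as 2.97)
print(episode_return(3, 8))  # Episode #3: 2.904
```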

Example Episode #4: Other MARL Baselines with p = 0.012

  • # Prey Captured (Max=3): 0

  • # of Miscoordinated Capture Attempts: 0

  • Return (Max=3): 0.0

  • The agents learn to minimize the number of miscoordinated capture attempts by completely avoiding the prey, as learning is stuck in a sub-optimal Nash equilibrium due to relative overgeneralization (illustrated by the sketch after this list).

  • This video shows a learned IQL policy with p = 0.012, but other MARL methods such as QMIX, WQMIX, VDN, QTRAN, QPLEX, and MAVEN also tend to learn a similar strategy and get stuck in sub-optimal solutions.
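
To illustrate relative overgeneralization, the sketch below runs independent Q-learning on a climbing-game style matrix game (the payoff numbers are standard illustrative values from the cooperative multi-agent literature, not taken from this task). Under exploration, the optimal joint action's average payoff looks worse than a safe action's, so both independent learners settle in a sub-optimal equilibrium, mirroring the prey-avoidance behavior described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Climbing-game style shared payoff matrix (illustrative numbers only):
# the joint action (0, 0) is optimal, but a unilateral attempt at it is
# punished hard, much like a miscoordinated capture attempt.
payoff = np.array([[ 11., -30.,   0.],
                   [-30.,   7.,   6.],
                   [  0.,   0.,   5.]])

q = np.zeros((2, 3))  # one independent Q-table per agent (stateless IQL)
alpha = 0.05

for t in range(60000):
    eps = max(0.05, 1.0 - t / 30000)  # anneal exploration from 1.0 to 0.05
    a = [rng.integers(3) if rng.random() < eps else int(np.argmax(q[i]))
         for i in range(2)]
    r = payoff[a[0], a[1]]            # shared reward for the joint action
    for i in range(2):
        q[i, a[i]] += alpha * (r - q[i, a[i]])  # bandit-style Q update

print(np.round(q, 2))
# Under near-uniform exploration, action 0 averages (11 - 30 + 0)/3 = -6.33,
# the worst of the three, so neither agent ever commits to it; both settle
# in a sub-optimal equilibrium (payoff 5 or 7) instead of the optimal 11.
```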

Please note that the playback speed of the videos can be adjusted for a better viewing experience.