The ability of agents to work together is desirable in the real world: it reduces the time needed to complete tasks and makes possible tasks that require multiple agents, such as soccer and multi-robot exploration. This has been achieved within single teams, but real-world settings can require agents to work flexibly across multiple teams. To train agents for this, both local (individual) and global (group) rewards are used. However, because an agent still receives the global reward, it can be rewarded without actually contributing to its team.
To combat this issue, neural-network-based reward methods have been developed that capture and isolate an agent's contribution across teams, focusing specifically on its trajectories. Although these methods lead to better performance, the quality of the local rewards they produce has not yet been quantified.
To test and visualize the quality of the local rewards, the alignment metric [A. Agogino and K. Tumer, "Efficient Evaluation Functions for Evolving Coordination"] was used to measure the correlation between an agent's actions and the resulting change in the team reward. Alignment is the mathematical correlation between local and global rewards: positive alignment means an increase in the local reward corresponds to an increase in the global reward, while negative alignment means a change in the local reward decreases the global reward.
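As a rough illustration only (this is a sign-agreement approximation, not the exact formula from Agogino and Tumer, and the function and variable names below are placeholders of my own), percent alignment can be estimated by checking how often a change in an agent's local reward and the corresponding change in the global reward move in the same direction:

```python
import numpy as np

def percent_alignment(delta_local, delta_global):
    """Fraction of sampled points where the change in an agent's local
    reward and the change in the global reward share the same sign.

    delta_local, delta_global: 1-D arrays of reward differences collected
    by perturbing states (or comparing states) and recording how each
    reward responds.
    """
    delta_local = np.asarray(delta_local, dtype=float)
    delta_global = np.asarray(delta_global, dtype=float)

    # A sample is "aligned" when both rewards increase or both decrease.
    aligned = (delta_local * delta_global) > 0
    return 100.0 * aligned.mean()


# Example with made-up reward differences for one agent.
if __name__ == "__main__":
    d_local = np.array([0.2, -0.1, 0.4, 0.05, -0.3])
    d_global = np.array([0.5, -0.2, -0.1, 0.3, -0.6])
    print(f"percent alignment: {percent_alignment(d_local, d_global):.1f}%")
```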
In addition to testing the learning model on single states, the quality of learned trajectories was also tested. For each trajectory, the max fitness critic evaluation [G. Rockefeller, S. Khadka, and K. Tumer, "Multi-level Fitness Critics for Cooperative Coevolution"] was computed and substituted into the alignment formula in place of the single-state local reward. Trajectories were collected across teams and across the fifty generations of each neural network run, and the average over runs was plotted to visualize performance. We found that the final percent alignment for single states was roughly 80% per agent, while the trajectory alignment of each agent was lower, at approximately 55%.
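Here is a hedged sketch of how that trajectory-level check and the per-generation averaging could be wired up, reusing the sign-agreement approximation from above. The names (`trajectory_alignment`, `alignment_per_generation`) and the synthetic data are my own placeholders, not values or code from the cited work:

```python
import numpy as np
import matplotlib.pyplot as plt

def trajectory_alignment(critic_values, global_rewards):
    """Percent alignment when each trajectory's max fitness critic score
    stands in for the local reward (sign-agreement approximation)."""
    d_local = np.diff(np.asarray(critic_values, dtype=float))
    d_global = np.diff(np.asarray(global_rewards, dtype=float))
    return 100.0 * ((d_local * d_global) > 0).mean()


# Placeholder data: one alignment value per generation for several runs.
# Purely synthetic; not the experimental results described above.
rng = np.random.default_rng(seed=0)
num_runs, num_generations = 5, 50
alignment_per_generation = rng.uniform(40, 90, size=(num_runs, num_generations))

# Average over runs, then plot the curve across the fifty generations.
mean_curve = alignment_per_generation.mean(axis=0)
plt.plot(np.arange(1, num_generations + 1), mean_curve)
plt.xlabel("Generation")
plt.ylabel("Percent alignment")
plt.show()
```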
Feel free to check out my code by clicking the GitHub button below!