Clean Up is a multiplayer game in which players earn a reward of +1 for each apple they collect. Apples grow in an orchard, and their regrowth depends on the cleanliness of a nearby river. Pollution accumulates in the river at a constant rate, and once it surpasses a certain threshold, the apple growth rate drops to zero. Players can perform a cleaning action that removes a small amount of pollution from the river. To sustain apple growth, the group must keep river pollution consistently low over time.
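The pollution dynamics described above can be sketched as follows. This is a minimal illustration, not the environment's implementation: the inflow rate, cleaning amount, threshold, base growth rate, and the linear decay of growth below the threshold are all assumed values chosen for the example.

```python
def step_river(pollution: float, n_cleaners: int,
               inflow: float = 0.05, clean_amount: float = 0.02) -> float:
    """Advance river pollution by one step (all constants are hypothetical)."""
    # Pollution grows at a constant rate; each cleaning action removes a little.
    pollution = pollution + inflow - n_cleaners * clean_amount
    # Keep the pollution level inside [0, 1].
    return min(1.0, max(0.0, pollution))


def apple_growth_rate(pollution: float, threshold: float = 0.4,
                      base_rate: float = 0.05) -> float:
    """Apple growth rate as a function of pollution (assumed linear form)."""
    if pollution > threshold:
        return 0.0  # above the threshold, apples stop growing
    # Assumption: growth falls off linearly as pollution approaches the threshold.
    return base_rate * (1.0 - pollution / threshold)
```

With these assumed constants, a river that nobody cleans crosses the threshold within ten steps, after which `apple_growth_rate` returns zero until enough players clean.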
Two players collect coins in a shared room, where each coin is assigned to one player's color. When a coin appears, it is assigned to the first or the second player's color with equal probability. A player receives a reward of 1 for collecting any coin, regardless of its color. If a player collects a coin assigned to the other player’s color, the other player receives a reward of -2.
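The reward rule for a single coin pickup can be written down directly. The sketch below uses a hypothetical function name and indexes the two players as 0 and 1; it covers only the reward logic, not movement or coin spawning.

```python
from typing import Tuple


def coin_rewards(collector: int, coin_owner: int) -> Tuple[int, int]:
    """Return (reward for player 0, reward for player 1) for one pickup.

    collector:  index (0 or 1) of the player who picked up the coin.
    coin_owner: index (0 or 1) of the player whose color the coin has.
    """
    rewards = [0, 0]
    rewards[collector] += 1       # +1 for collecting any coin
    if collector != coin_owner:
        rewards[coin_owner] -= 2  # the owner is penalized when the other player takes it
    return rewards[0], rewards[1]
```

Collecting the other player's coin yields a joint reward of -1, which is what makes the greedy strategy collectively harmful.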
Several patches of apples are scattered around the room. A player receives a reward of 1 for each apple collected. A collected apple regrows with a probability determined by the number of apples remaining within a radius of 2 of its location. The regrowth probability decreases as nearby apples are depleted, and an apple never regrows once no apples remain in its neighborhood.
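A density-dependent regrowth rule of this kind might look like the sketch below. The probability table and the use of Euclidean distance are assumptions made for illustration; the text only specifies that the probability depends on the number of apples within radius 2 and is zero when there are none.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]

# Hypothetical regrowth probabilities for 0, 1, 2, and 3+ nearby apples.
REGROWTH_PROBS = (0.0, 0.0025, 0.005, 0.025)


def count_neighbors(apples: Set[Cell], cell: Cell, radius: int = 2) -> int:
    """Count apples within `radius` of `cell` (Euclidean distance, assumed)."""
    r, c = cell
    return sum(
        1
        for (ar, ac) in apples
        if (ar, ac) != cell and (ar - r) ** 2 + (ac - c) ** 2 <= radius ** 2
    )


def regrowth_probability(apples: Set[Cell], cell: Cell, radius: int = 2) -> float:
    """Probability that the empty `cell` regrows an apple this step."""
    n = count_neighbors(apples, cell, radius)
    # Three or more neighbors all share the last table entry.
    return REGROWTH_PROBS[min(n, len(REGROWTH_PROBS) - 1)]
```

Because the first table entry is zero, a patch that has been fully harvested can never recover, which is the source of the dilemma.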
This environment shares the mechanics of Common Harvest: Open. The difference is that the apples are located in two rooms, each containing multiple patches of apples and a single entrance. Players can defend a room to prevent others from accessing its apples.
In this environment, two types of ore spawn randomly in empty spaces. Players are equipped with a mining beam to extract the ore. Iron ore (gray) can be mined individually and provides a reward of +1 upon extraction. In contrast, gold ore (yellow) requires coordinated mining by exactly two players within a 3-step window, granting a reward of +8 to each participant. When a player begins to mine for gold ore, it flashes to indicate readiness for another player to assist. If no other player cooperates or too many players attempt to mine simultaneously, the ore reverts to its original state, and no reward is granted.
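The gold ore rule can be illustrated with a small resolution function. This is a sketch under stated assumptions: the function name and input format are hypothetical, and the window is taken to start at the first hit and to cover the `window` steps from that hit onward (the first hit's step included).

```python
from typing import Dict


def resolve_gold(hits: Dict[str, int], window: int = 3,
                 reward: int = 8) -> Dict[str, int]:
    """Resolve one gold ore.

    hits maps a player id to the step at which that player's beam hit the ore.
    Returns the reward for each rewarded player (empty if mining failed).
    """
    if not hits:
        return {}
    first = min(hits.values())  # the ore starts flashing at the first hit
    # Assumption: a hit counts if it lands fewer than `window` steps after the first.
    miners = [p for p, step in hits.items() if step - first < window]
    if len(miners) == 2:
        return {p: reward for p in miners}  # exactly two miners: +8 each
    return {}  # too few or too many miners: the ore reverts, no reward
```

For example, `resolve_gold({"a": 0, "b": 2})` rewards both players, while a lone miner or three simultaneous miners receive nothing.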
Territory is a competitive multi-agent game where players paint walls to capture territory and earn rewards. Players claim walls by painting them in their unique color, either by touching them or by flinging paint at them. Once the paint has dried for 25 steps, the wall brightens and rewards are given. The more walls a player has painted and dried, the greater the reward that player can receive. A wall that is zapped twice is permanently destroyed, which makes it unclaimable but traversable. Players can also zap each other, and a player is permanently removed after being hit twice.
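The life cycle of a single wall can be sketched as a small state machine. The class and field names are hypothetical, and the sketch leaves out details the text does not specify, such as how a single zap affects wet or dry paint and how rewards are paid out.

```python
from dataclasses import dataclass
from typing import Optional

DRY_STEPS = 25       # paint dries after 25 steps (from the description)
ZAPS_TO_DESTROY = 2  # a wall is destroyed after two zaps (from the description)


@dataclass
class Wall:
    owner: Optional[int] = None  # player id of the current claimant, if any
    wet_for: int = 0             # steps since the wall was last painted
    dry: bool = False            # True once the paint has dried
    zaps: int = 0                # number of zaps received so far

    @property
    def destroyed(self) -> bool:
        return self.zaps >= ZAPS_TO_DESTROY

    def paint(self, player: int) -> None:
        if self.destroyed:
            return  # destroyed walls can no longer be claimed
        self.owner, self.wet_for, self.dry = player, 0, False

    def zap(self) -> None:
        self.zaps += 1
        if self.destroyed:
            self.owner, self.dry = None, False

    def tick(self) -> None:
        if self.owner is None or self.dry or self.destroyed:
            return
        self.wet_for += 1
        if self.wet_for >= DRY_STEPS:
            self.dry = True  # only dry walls count toward rewards
```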
Nine players compete to claim resources by painting them in their unique colors. Players spawn in an open area near each other, away from the resource walls. Resource distribution is uneven, with some areas richer in walls than others.
Table 1. Speed test of the training pipeline with IPPO
We also evaluate the speed of SocialJax across various environments and environment counts using random actions. We observe a significant speedup when running environments in parallel on a GPU compared to running a single environment. The original environments run on a single CPU, and achieving the parallel efficiency of SocialJax would require scaling to hundreds of CPUs.
Table 2. Speed test using random actions
To validate the social dilemma properties of SocialJax, we plot a Schelling diagram for each environment. Cooperative policies are sampled from agents trained with a common reward, while defector policies come from agents trained with independent rewards. We evaluate each environment over 30 episodes and compute the average rewards of the cooperative and defector agents.
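The data behind a Schelling diagram could be gathered along the following lines. This is a sketch, not the evaluation code used here: `run_episode` is a hypothetical callback that plays one episode with the given mix of cooperator and defector policies and returns one episode return per player, and the curves are indexed by the total number of cooperators in the group.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def schelling_curves(
    num_players: int,
    run_episode: Callable[[Sequence[bool]], Sequence[float]],
    episodes: int = 30,
) -> Tuple[Dict[int, float], Dict[int, float]]:
    """Average cooperator and defector returns for each number of cooperators.

    run_episode(roles) is a hypothetical callback: roles[i] is True if player i
    uses a cooperative policy, and the result is one return per player.
    """
    coop_curve: Dict[int, float] = {}
    defect_curve: Dict[int, float] = {}
    for k in range(num_players + 1):
        roles = [True] * k + [False] * (num_players - k)
        coop_returns: List[float] = []
        defect_returns: List[float] = []
        for _ in range(episodes):
            returns = run_episode(roles)
            coop_returns += [r for r, c in zip(returns, roles) if c]
            defect_returns += [r for r, c in zip(returns, roles) if not c]
        # A curve has no point where its group is empty (k = 0 or k = num_players).
        if coop_returns:
            coop_curve[k] = mean(coop_returns)
        if defect_returns:
            defect_curve[k] = mean(defect_returns)
    return coop_curve, defect_curve
```

Plotting both curves against the number of cooperators shows whether defecting pays more than cooperating at a given group composition, and whether full cooperation beats full defection.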