Cultural Transmission Videos

Agent paired with an expert visits goal locations in the correct order

We show randomly picked videos of an agent paired with an expert bot. The agent does about as well as the expert.

Agent paired with an anti-expert visits goal locations in a pessimal order

We show randomly picked videos of an agent paired with an anti-expert bot, which visits goal locations in a pessimal order. The agent continues to follow its partner and performs far worse than a random policy. Note that because the agent also receives its rewards as observations, it could in principle easily recognize that it is visiting the wrong locations.
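To make the gap between the expert and the anti-expert concrete, here is a minimal toy sketch. It assumes a scoring rule that this page does not specify: +1 for each move to the goal that comes next in the correct cycle and -1 for any other move. The goal names and the function `score_visits` are hypothetical and used only for illustration.

```python
def score_visits(visits, correct_cycle):
    """Score a sequence of goal visits under an ASSUMED toy rule:
    +1 for each transition that follows the correct cycle, -1 otherwise."""
    n = len(correct_cycle)
    position = {goal: i for i, goal in enumerate(correct_cycle)}
    total = 0
    for prev, cur in zip(visits, visits[1:]):
        # The goal that should follow `prev` in the correct cyclic order.
        expected = correct_cycle[(position[prev] + 1) % n]
        total += 1 if cur == expected else -1
    return total


# Hypothetical four-goal layout.
correct_cycle = ["A", "B", "C", "D"]

# An expert walks the cycle forwards; one possible anti-expert walks it
# backwards, so every transition is wrong when there are at least 3 goals.
expert_visits = ["A", "B", "C", "D", "A"]
anti_expert_visits = ["A", "D", "C", "B", "A"]

expert_score = score_visits(expert_visits, correct_cycle)
anti_expert_score = score_visits(anti_expert_visits, correct_cycle)
print(expert_score, anti_expert_score)
```

Under this toy rule, a partner that picks uniformly among the other three goals is right one time in three, so it averages -1/3 per transition. That is why an agent that faithfully follows the anti-expert can score worse than random movement.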

Agent paired with a randomly moving bot does not visit goal locations

We show randomly picked videos of an agent paired with a randomly moving bot. The agent continues to follow the partner, achieving about the same performance as random movement.

Agent alone in the environment does not visit goal locations

We show randomly picked videos of an agent with no partner bot present. The agent performs about the same as when paired with a random bot. This is because, during training, an expert bot is always present for at least some fraction of the episode before dropping out, so the agent has learned to expect a partner bot at the beginning of the episode.
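The training condition described above can be sketched as a per-step presence mask for the expert. This is only an illustration under stated assumptions: the function name, the uniform sampling of the dropout step, and the values for episode length and minimum fraction are all hypothetical and are not taken from the actual training setup.

```python
import random


def expert_presence_mask(episode_length, min_present_fraction, rng):
    """Return one boolean per step: True while the expert is still present.

    ASSUMPTION: the dropout step is drawn uniformly between the minimum
    required presence and the end of the episode (no dropout at all).
    """
    min_steps = max(1, int(min_present_fraction * episode_length))
    # random.Random.randint is inclusive at both ends.
    dropout_step = rng.randint(min_steps, episode_length)
    return [t < dropout_step for t in range(episode_length)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
masks = [expert_presence_mask(20, 0.25, rng) for _ in range(100)]

# In every sampled episode the expert is present at step 0, so an agent
# trained this way never sees an episode that starts without a partner.
print(all(mask[0] for mask in masks))
```

Because every mask starts with the expert present, the partner-free evaluation shown in these videos falls outside anything such a training distribution would cover.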