RPM: Generalizable Behaviors in Multi-Agent Reinforcement Learning