Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization