Talk Date and Time: October 3, 2023, 1:00 pm - 1:45 pm ET, followed by 15 minutes of Q&A on Google Meet
Topic: Learning Curricula in Open-Ended Worlds
Abstract:
Deep reinforcement learning (RL) agents commonly overfit to their training environments, performing poorly when the environment is even mildly perturbed. Such overfitting can be mitigated by conducting domain randomization (DR) over various aspects of the training environment in simulation. However, depending on its implementation, DR makes potentially arbitrary assumptions about the distribution over environment instances. In larger environment design spaces, DR becomes combinatorially less likely to sample the specific environment instances that may be especially useful for learning. Unsupervised Environment Design (UED) addresses these shortcomings by directly considering the problem of automatically generating a sequence, or curriculum, of environment instances presented to the agent for training, in order to maximize the agent's final robustness and generality. UED methods have been shown, in both theory and practice, to produce emergent training curricula that result in deep RL agents with improved transfer performance on out-of-distribution environment instances. Such autocurricula offer a promising path toward open-ended learning systems that become increasingly capable by continually generating and mastering additional challenges of their own design. This talk provides a tour of recent algorithmic developments leading to successively more powerful UED methods, followed by a discussion of key challenges and potential paths to unlocking their full potential in practice.
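The contrast drawn above between plain domain randomization and a learned curriculum can be illustrated with a toy sketch. The class below is a hypothetical, heavily simplified rendition of a Prioritized Level Replay-style buffer (names, parameters, and the TD-error scoring proxy here are illustrative assumptions, not the published algorithm): with some probability it samples a fresh randomized level, as DR would, and otherwise replays a previously seen level, rank-prioritized by an estimate of its learning potential.

```python
import random


class PrioritizedLevelReplay:
    """Toy sketch of a PLR-style level buffer (hypothetical simplification).

    Levels are scored by a proxy for learning potential -- here, the mean
    absolute TD-error from the agent's last rollout on that level -- and
    replay sampling is rank-prioritized toward high-scoring levels.
    """

    def __init__(self, replay_prob=0.5, temperature=0.3, seed=0):
        self.replay_prob = replay_prob    # chance of replaying a seen level
        self.temperature = temperature    # flattens/sharpens rank priorities
        self.scores = {}                  # level_id -> latest score
        self.rng = random.Random(seed)

    def sample_level(self, new_level_fn):
        # With probability 1 - replay_prob (or while the buffer is empty),
        # draw a freshly randomized level, as plain DR would.
        if not self.scores or self.rng.random() > self.replay_prob:
            return new_level_fn()
        # Otherwise replay: rank levels by score and sample each with
        # probability proportional to rank ** (-1 / temperature).
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        weights = [(i + 1) ** (-1.0 / self.temperature)
                   for i in range(len(ranked))]
        return self.rng.choices(ranked, weights=weights, k=1)[0]

    def update_score(self, level_id, td_errors):
        # Score a level by the mean absolute TD-error of the latest rollout.
        self.scores[level_id] = sum(abs(e) for e in td_errors) / len(td_errors)
```

In this sketch the curriculum emerges implicitly: levels where the agent's value estimates are still inaccurate (high TD-error) are revisited more often, while mastered levels decay in priority, without any hand-designed ordering over the environment design space.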
Bio:
Minqi is a researcher at Meta AI and recently received his PhD from University College London, where he was advised by Tim Rocktäschel and Edward Grefenstette. He is especially interested in problems at the intersection of generalization, human-AI coordination, and open-ended systems. Minqi has developed or contributed to most of the seminal methods in Unsupervised Environment Design, and his curriculum learning algorithm, Prioritized Level Replay, was used to train adaptive agents in DeepMind's XLand 2 environment. Minqi has also organized multiple workshops focused on Agent Learning in Open-Endedness, including the upcoming workshop at NeurIPS.