Offline Meta-Reinforcement Learning with Advantage Weighting (MACAW)

arXiv | paper code | minimal code | video

ICML 2021

TL;DR: We introduce the offline meta-RL setting and propose MACAW, a gradient-based algorithm that learns to adapt to new tasks using purely offline data.
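
As a rough illustration of the "advantage weighting" named in the title, the sketch below shows a generic advantage-weighted regression loss: each logged action's log-likelihood is weighted by its exponentiated advantage, so better-than-expected actions count more. This is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the `temperature` and `max_weight` parameters, and the toy numbers are all hypothetical.

```python
import math


def advantage_weights(returns, values, temperature=1.0, max_weight=20.0):
    """Exponentiated-advantage weights, clipped for numerical stability.

    Hypothetical helper: advantage = observed return - value estimate.
    """
    weights = []
    for ret, val in zip(returns, values):
        advantage = ret - val
        weights.append(min(math.exp(advantage / temperature), max_weight))
    return weights


def weighted_nll(log_probs, weights):
    """Advantage-weighted negative log-likelihood, averaged over samples."""
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


# Toy offline batch (made-up numbers): returns observed in the logged data,
# value estimates for the same states, and the policy's log-probabilities
# of the logged actions.
returns = [1.0, 0.0, 2.0]
values = [1.0, 1.0, 1.0]
log_probs = [-0.5, -0.5, -0.5]

weights = advantage_weights(returns, values)
loss = weighted_nll(log_probs, weights)
```

Because the loss only needs logged returns, value estimates, and action log-probabilities, it can be computed from a fixed dataset without further environment interaction, which is the property the offline setting requires.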