Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
Johannes Ackermann, Takayuki Osa, Masashi Sugiyama
The University of Tokyo, RIKEN AIP
TL;DR: We investigate Offline RL with datasets that include a gradually evolving non-stationarity.
Update: Accepted for publication at the Reinforcement Learning Conference 2024!
Paper: https://arxiv.org/abs/2405.14114
Code: https://github.com/JohannesAck/OfflineRLStructuredNonstationarity
Personal Website: https://johannesack.github.io/
We investigate an Offline RL setting in which the environment changes during dataset collection. This can occur, for example, due to wear and tear or evolving preferences.
To tackle this setting, we make the structural assumption that the environment is stationary within each episode but changes between episodes. We can then formulate our setting as a Dynamic-Parameter MDP (DP-MDP), a Hidden-Parameter MDP (HiP-MDP) in which each new hidden parameter (HiP) depends on the previous ones.
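As a toy illustration of this assumption (not taken from the paper), consider an environment whose hidden parameter is fixed within an episode but follows a slow random walk across episodes. The sketch below uses the Gymnasium API; the environment and its names (e.g. `DriftingGoalEnv`) are hypothetical.

```python
# Hypothetical toy example of the structural assumption: the hidden parameter z
# (here, a goal position) is constant within an episode and drifts between episodes.
import numpy as np
import gymnasium as gym


class DriftingGoalEnv(gym.Env):
    """1-D point mass whose goal position drifts slowly between episodes."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-10.0, 10.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.z = 0.0  # hidden parameter, never shown to the agent
        self.pos = np.zeros(1, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # z_{i+1} depends on z_i: a slow random walk from episode to episode
        self.z = float(np.clip(self.z + self.np_random.normal(0.0, 0.1), -5.0, 5.0))
        self.pos = np.zeros(1, dtype=np.float32)
        return self.pos.copy(), {}

    def step(self, action):
        self.pos = self.pos + np.asarray(action, dtype=np.float32)
        reward = -abs(float(self.pos[0]) - self.z)  # reward depends on the hidden parameter
        return self.pos.copy(), reward, False, False, {}
```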
We then evaluate multiple ways to infer this hidden parameter and find that methods based on dynamic VAEs do not perform sufficiently well. We therefore develop a method based on contrastive predictive coding that infers the hidden parameter from the dataset.
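Below is a minimal sketch of what such a contrastive objective can look like: a PyTorch encoder maps a set of transitions from one episode to a latent, and an InfoNCE loss treats the other episodes in the batch as negatives (e.g. `z_query` and `z_positive` encoded from two disjoint transition subsets of the same episode). The names and architecture are assumptions for illustration; the actual model is in the linked repository.

```python
# Minimal sketch of an InfoNCE-style contrastive loss for inferring a per-episode latent.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EpisodeEncoder(nn.Module):
    """Encodes a batch of transitions (s, a, r, s') from one episode into a latent z."""

    def __init__(self, obs_dim, act_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, obs, act, rew, next_obs):
        x = torch.cat([obs, act, rew, next_obs], dim=-1)
        return self.net(x).mean(dim=-2)  # average over the transitions in the episode


def info_nce(z_query, z_positive, temperature=0.1):
    """InfoNCE: each query should match its own positive; other episodes act as negatives."""
    z_query = F.normalize(z_query, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    logits = z_query @ z_positive.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(z_query.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```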
We use this model to relabel our dataset, on which we then train a latent-conditioned policy and a latent predictor.
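The sketch below illustrates this stage under the same hypothetical setup: the encoder from the previous sketch relabels each episode with its inferred latent, a small predictor is trained to forecast the next episode's latent, and the policy is conditioned on the latent. The data layout and names are assumptions, not the paper's exact pipeline.

```python
# Sketch of the relabeling and training stage, reusing EpisodeEncoder from the sketch above.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim, n_episodes, ep_len = 3, 1, 8, 16, 50
encoder = EpisodeEncoder(obs_dim, act_dim, latent_dim)

# Toy stand-in for the offline dataset: one dict of tensors per episode.
dataset = [{"obs": torch.randn(ep_len, obs_dim), "act": torch.randn(ep_len, act_dim),
            "rew": torch.randn(ep_len, 1), "next_obs": torch.randn(ep_len, obs_dim)}
           for _ in range(n_episodes)]

# 1) Relabel: attach the inferred latent to every transition of its episode.
with torch.no_grad():
    latents = torch.stack([encoder(ep["obs"], ep["act"], ep["rew"], ep["next_obs"])
                           for ep in dataset])

# 2) Latent predictor: estimate the next episode's latent from the current one,
#    so the policy can be conditioned on a predicted latent at test time.
predictor = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
pred_loss = F.mse_loss(predictor(latents[:-1]), latents[1:])

# 3) Policy: any offline RL algorithm on the augmented input [s, z],
#    here a stand-in latent-conditioned actor pi(a | s, z).
policy = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
```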
Finally, we evaluate our method on a set of continuous control tasks, visualizing the learned latent representation and measuring the resulting policy's reward.
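For the latent visualization, one simple option (illustrative only, not the paper's plotting code) is to project the per-episode latents from the sketch above to two dimensions and color them by episode index; a smoothly drifting hidden parameter should then appear as a smooth color gradient.

```python
# Project per-episode latents to 2-D and color by episode index (illustrative sketch).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

z_2d = PCA(n_components=2).fit_transform(latents.numpy())
plt.scatter(z_2d[:, 0], z_2d[:, 1], c=range(len(z_2d)), cmap="viridis")
plt.colorbar(label="episode index")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```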