Offline Reinforcement Learning from Datasets with Structured Non-Stationarity
Johannes Ackermann, Takayuki Osa, Masashi Sugiyama
The University of Tokyo, RIKEN AIP
TL;DR: We investigate Offline RL with datasets that include a gradually evolving non-stationarity.
Update: Accepted for publication at the Reinforcement Learning Conference 2024!
Paper: https://arxiv.org/abs/2405.14114
Code: https://github.com/JohannesAck/OfflineRLStructuredNonstationarity
Personal Website: https://johannesack.github.io/
We investigate an Offline RL setting in which the environment changes during dataset collection. This can occur, for example, due to wear and tear or evolving preferences.
To tackle this setting, we make the structural assumption that the environment is stationary within each episode but changes between episodes. We can then formulate our setting as a Dynamic-Parameter MDP (DP-MDP), a Hidden-Parameter MDP (HiP-MDP) in which each new hidden parameter (HiP) depends on the previous ones.
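As a toy illustration of this assumption (not taken from the paper), consider an environment whose hidden parameter is fixed within an episode but follows a slow random walk across episodes. The sketch below uses the Gymnasium API; the environment and its names (e.g. `DriftingGoalEnv`) are hypothetical.

```python
# Hypothetical toy example of the structural assumption: the hidden parameter z
# (here, a goal position) is constant within an episode and drifts between episodes.
import numpy as np
import gymnasium as gym


class DriftingGoalEnv(gym.Env):
    """1-D point mass whose goal position drifts slowly between episodes."""

    def __init__(self):
        self.observation_space = gym.spaces.Box(-10.0, 10.0, shape=(1,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.z = 0.0  # hidden parameter, never shown to the agent
        self.pos = np.zeros(1, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # z_{i+1} depends on z_i: a slow random walk from episode to episode
        self.z = float(np.clip(self.z + self.np_random.normal(0.0, 0.1), -5.0, 5.0))
        self.pos = np.zeros(1, dtype=np.float32)
        return self.pos.copy(), {}

    def step(self, action):
        self.pos = self.pos + np.asarray(action, dtype=np.float32)
        reward = -abs(float(self.pos[0]) - self.z)  # reward depends on the hidden parameter
        return self.pos.copy(), reward, False, False, {}
```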
We then evaluate multiple ways to infer this hidden parameter and find that methods based on dynamic VAEs do not perform sufficiently well. We therefore develop a method based on contrastive predictive coding that infers the hidden parameter from the dataset.
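Below is a minimal sketch of what such a contrastive objective can look like: a PyTorch encoder maps a set of transitions from one episode to a latent, and an InfoNCE loss treats the other episodes in the batch as negatives (e.g. `z_query` and `z_positive` encoded from two disjoint transition subsets of the same episode). The names and architecture are assumptions for illustration; the actual model is in the linked repository.

```python
# Minimal sketch of an InfoNCE-style contrastive loss for inferring a per-episode latent.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EpisodeEncoder(nn.Module):
    """Encodes a batch of transitions (s, a, r, s') from one episode into a latent z."""

    def __init__(self, obs_dim, act_dim, latent_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, obs, act, rew, next_obs):
        x = torch.cat([obs, act, rew, next_obs], dim=-1)
        return self.net(x).mean(dim=-2)  # average over the transitions in the episode


def info_nce(z_query, z_positive, temperature=0.1):
    """InfoNCE: each query should match its own positive; other episodes act as negatives."""
    z_query = F.normalize(z_query, dim=-1)
    z_positive = F.normalize(z_positive, dim=-1)
    logits = z_query @ z_positive.T / temperature  # (B, B) similarity matrix
    labels = torch.arange(z_query.shape[0], device=logits.device)
    return F.cross_entropy(logits, labels)
```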
We use this model to relabel our dataset, on which we then train a latent-conditioned policy and a latent predictor.
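The sketch below illustrates this stage under the same hypothetical setup: the encoder from the previous sketch relabels each episode with its inferred latent, a small predictor is trained to forecast the next episode's latent, and the policy is conditioned on the latent. The data layout and names are assumptions, not the paper's exact pipeline.

```python
# Sketch of the relabeling and training stage, reusing EpisodeEncoder from the sketch above.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, latent_dim, n_episodes, ep_len = 3, 1, 8, 16, 50
encoder = EpisodeEncoder(obs_dim, act_dim, latent_dim)

# Toy stand-in for the offline dataset: one dict of tensors per episode.
dataset = [{"obs": torch.randn(ep_len, obs_dim), "act": torch.randn(ep_len, act_dim),
            "rew": torch.randn(ep_len, 1), "next_obs": torch.randn(ep_len, obs_dim)}
           for _ in range(n_episodes)]

# 1) Relabel: attach the inferred latent to every transition of its episode.
with torch.no_grad():
    latents = torch.stack([encoder(ep["obs"], ep["act"], ep["rew"], ep["next_obs"])
                           for ep in dataset])

# 2) Latent predictor: estimate the next episode's latent from the current one,
#    so the policy can be conditioned on a predicted latent at test time.
predictor = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
pred_loss = F.mse_loss(predictor(latents[:-1]), latents[1:])

# 3) Policy: any offline RL algorithm on the augmented input [s, z],
#    here a stand-in latent-conditioned actor pi(a | s, z).
policy = nn.Sequential(nn.Linear(obs_dim + latent_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
```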
Finally, we evaluate our method on a set of continuous control tasks, visualizing the learned latent representation and measuring the resulting policy's reward.
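For the latent visualization, one simple option (illustrative only, not the paper's plotting code) is to project the per-episode latents from the sketch above to two dimensions and color them by episode index; a smoothly drifting hidden parameter should then appear as a smooth color gradient.

```python
# Project per-episode latents to 2-D and color by episode index (illustrative sketch).
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

z_2d = PCA(n_components=2).fit_transform(latents.numpy())
plt.scatter(z_2d[:, 0], z_2d[:, 1], c=range(len(z_2d)), cmap="viridis")
plt.colorbar(label="episode index")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.show()
```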