XIPER enables RL agents to learn from unlabeled cross-domain videos, without access to ground-truth task rewards or low-dimensional state information. XIPER works by learning a cross-domain video prediction model that consists of two parts: (i) an expert video prediction model, trained to model expert behaviors, and (ii) a domain translation model, trained to map agent-domain observations into the expert domain. The likelihood that the expert prediction model assigns to the agent's translated observations is then used as the reward signal to train RL agents.
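The reward computation described above can be sketched as follows. This is a minimal toy illustration, not the actual XIPER implementation: the function names (`translate`, `expert_log_likelihood`, `xiper_reward`), the linear translation map, and the Gaussian next-frame model are all simplifying assumptions chosen to make the likelihood-as-reward idea concrete.

```python
import numpy as np

def translate(obs, W):
    # Hypothetical domain translation model: maps an agent-domain
    # observation into the expert domain (a linear map for illustration;
    # in practice this would be a learned network).
    return obs @ W

def expert_log_likelihood(prev, nxt, A, sigma=1.0):
    # Hypothetical expert video prediction model: predicts the next
    # expert-domain frame from the previous one (linear dynamics here),
    # then scores the observed next frame under a Gaussian.
    pred = prev @ A
    d = nxt - pred
    k = d.size
    return -0.5 * (d @ d) / sigma**2 - 0.5 * k * np.log(2 * np.pi * sigma**2)

def xiper_reward(obs_t, obs_t1, W, A):
    # Reward for the transition (obs_t -> obs_t1): the log-likelihood
    # of the translated transition under the expert prediction model,
    # so agent behavior that looks expert-like earns higher reward.
    z_t, z_t1 = translate(obs_t, W), translate(obs_t1, W)
    return expert_log_likelihood(z_t, z_t1, A)
```

Under this sketch, transitions that match the expert model's prediction score higher than those that deviate from it, which is exactly the signal the RL agent maximizes.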