Bohan Zhou, Ke Li, Jiechuan Jiang, Zongqing Lu
PKU BAAI
Learning from visual observation (LfVO), which aims to recover policies from visual observation data alone, is a promising yet challenging problem. Existing LfVO approaches either adopt inefficient online learning schemes or require additional task-specific information such as goal states, making them ill-suited for open-ended tasks.
STG for LfVO
We propose a two-stage framework for learning from visual observation. The first stage pretrains three components concurrently. A feature encoder is trained in a self-supervised manner to provide easily predictable, temporally aligned representations of stacked-image states. The State-to-Go (STG) Transformer is trained adversarially to accurately predict transitions in latent space. A discriminator is updated simultaneously to distinguish predicted state transitions from expert transitions, providing high-quality intrinsic rewards for downstream online reinforcement learning in the second stage.
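The first-stage pretraining can be sketched as a minimal adversarial loop: the STG model predicts the next latent state, while the discriminator learns to tell expert transitions from predicted ones. All module sizes, names, and architectures below are illustrative placeholders (e.g. a linear layer stands in for the convolutional encoder and the GPT-style transformer), not the paper's exact implementation, and the TDR auxiliary loss and WGAN gradient penalty are omitted for brevity.

```python
import torch
import torch.nn as nn

LATENT = 32  # illustrative latent dimension

# Feature encoder: maps flattened stacked-image states to latent embeddings.
encoder = nn.Linear(3 * 84 * 84, LATENT)

# Stand-in for the STG Transformer: predicts the next latent from the current one.
stg = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, LATENT))

# Discriminator: scores a latent transition (z_t, z_{t+1}).
disc = nn.Sequential(nn.Linear(2 * LATENT, 64), nn.ReLU(), nn.Linear(64, 1))

opt_gen = torch.optim.Adam(list(encoder.parameters()) + list(stg.parameters()), lr=1e-4)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(obs_t, obs_t1):
    """One adversarial update on a batch of expert (o_t, o_{t+1}) pairs."""
    z_t, z_t1 = encoder(obs_t), encoder(obs_t1)
    z_pred = stg(z_t)

    # Critic objective (WGAN-style): expert transitions score high,
    # predicted transitions score low.
    d_loss = disc(torch.cat([z_t, z_pred], -1).detach()).mean() \
           - disc(torch.cat([z_t, z_t1], -1).detach()).mean()
    opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

    # Generator (encoder + STG): make predicted transitions look expert-like.
    g_loss = -disc(torch.cat([encoder(obs_t), stg(encoder(obs_t))], -1)).mean()
    opt_gen.zero_grad(); g_loss.backward(); opt_gen.step()
    return d_loss.item(), g_loss.item()

# Dummy expert batch of 8 consecutive observation pairs.
obs_t = torch.randn(8, 3 * 84 * 84)
obs_t1 = torch.randn(8, 3 * 84 * 84)
d, g = train_step(obs_t, obs_t1)
```

In the full method this loop would run over sequences from expert videos, with the self-supervised encoder objective and TDR trained alongside.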
State-To-Go Transformer
Built upon GPT, the State-To-Go (STG) Transformer predicts the next state embedding given a sequence of states. An additional self-supervised auxiliary module with 1D attention, the temporal distance regressor (TDR), is devised to ensure temporally aligned visual embeddings. A learned WGAN-based discriminator distinguishes between expert and non-expert transitions without collecting online negative samples, providing an offline way to generate intrinsic rewards for a PPO agent on downstream reinforcement learning tasks.
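At a high level, the second stage only needs the frozen pretrained modules to score the agent's transitions. The sketch below shows one plausible way to turn the discriminator's score into a per-step intrinsic reward for PPO; the module shapes, names, and the use of a raw critic score as the reward are assumptions for illustration, not the paper's exact reward formulation.

```python
import torch
import torch.nn as nn

LATENT = 32  # illustrative latent dimension

# Frozen pretrained components (placeholders for the real encoder/discriminator).
encoder = nn.Linear(3 * 84 * 84, LATENT)
disc = nn.Sequential(nn.Linear(2 * LATENT, 64), nn.Tanh(), nn.Linear(64, 1))

@torch.no_grad()
def intrinsic_reward(obs_t, obs_t1):
    """Score how expert-like the agent's observed transition (o_t, o_{t+1}) looks."""
    z = torch.cat([encoder(obs_t), encoder(obs_t1)], dim=-1)
    return disc(z).squeeze(-1)  # one scalar reward per transition in the batch

# A batch of 4 transitions collected by the online agent.
r = intrinsic_reward(torch.randn(4, 3 * 84 * 84), torch.randn(4, 3 * 84 * 84))
```

These rewards would replace (or supplement) environment rewards in the PPO rollout buffer, so the agent needs no access to expert actions or task rewards.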
STG for Atari Visual Control Tasks
Breakout
Freeway
Qbert
SpaceInvaders
STG for Open-Ended Minecraft Tasks
Pick a flower
Milk a cow
Harvest tallgrass
Gather wool
Empirical results on Atari and Minecraft demonstrate strong performance on LfVO problems, shedding light on the potential of using video-only data, rather than complete offline datasets containing states, actions, and rewards, to solve difficult visual reinforcement learning tasks.