Learning from Visual Observation via Offline Pretrained State-to-Go Transformer