CtrlFormer: Learning Transferable State Representation for Visual Control via Transformer