Reinforcement Learning (RL)-based methods have garnered significant attention in the field of robot learning, and efficient exploration of the state space is a key factor in their success. However, many recent RL approaches suffer from poor sample and learning efficiency, often struggling with insufficient exploration in environments with large and complex state spaces. Additionally, reward engineering remains a pervasive issue, particularly in goal-oriented tasks with sparse external rewards.
To address these challenges, we propose a novel exploration framework called Latent State Predictive Exploration (LSPE). To efficiently handle high-dimensional visual observations in complex environments, we introduce a state encoder that learns a compact representation within the latent space, effectively filtering out irrelevant or noisy information from the observations.
Moreover, we incorporate a self-predictive network that injects temporal information into the state encoder, further stabilizing and enriching the learned representation and improving predictive control for the robot during the exploration phase.
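The following is a minimal PyTorch sketch of the two components described above, an image-based state encoder and a self-predictive head that regresses the next latent state. The specific architectures, latent dimension, and use of a slowly updated (EMA) target encoder are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps high-dimensional image observations to a compact latent state."""
    def __init__(self, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

class SelfPredictiveHead(nn.Module):
    """Predicts the next latent state from the current latent and the action,
    injecting temporal structure into the encoder's representation."""
    def __init__(self, latent_dim=50, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

def self_predictive_loss(encoder, target_encoder, head, obs, action, next_obs):
    """Regresses the predicted next latent onto a stop-gradient target produced
    by a slowly updated copy of the encoder (an assumed design choice)."""
    z = encoder(obs)
    with torch.no_grad():
        z_next_target = target_encoder(next_obs)
    z_next_pred = head(z, action)
    return F.mse_loss(z_next_pred, z_next_target)
```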
Furthermore, we introduce an Exploration Reward Function (ERF) that encourages the robot agent to explore the latent space, thereby improving state-space exploration and enabling scalability to high-dimensional environments. Through experiments on eight challenging navigation and manipulation tasks, we demonstrate that LSPE is both effective and scalable in complex, high-dimensional environments. Notably, our approach discovers a variety of useful behaviors even in unsupervised settings.
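As a hedged illustration of one plausible instantiation of the ERF, the sketch below (reusing the encoder, target encoder, and self-predictive head from the previous example) assigns an intrinsic reward proportional to the latent prediction error, so poorly modelled transitions receive higher reward. The exact form of the ERF used by LSPE may differ.

```python
import torch

@torch.no_grad()
def exploration_reward(encoder, target_encoder, head, obs, action, next_obs, scale=1.0):
    """Per-sample intrinsic reward: squared prediction error in latent space."""
    z = encoder(obs)
    z_next_pred = head(z, action)
    z_next = target_encoder(next_obs)
    return scale * (z_next_pred - z_next).pow(2).mean(dim=-1)
```

In a sparse-reward setting, such an intrinsic term would typically be added to (or, in fully unsupervised exploration, used in place of) the external reward during the exploration phase.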