Reinforcement Learning (RL)-based methods have garnered significant attention in the field of robot learning, and efficient exploration of the state space is a key factor in their success. However, many recent RL approaches suffer from poor sample and learning efficiency, often struggling with insufficient exploration in environments with large and complex state spaces. Additionally, reward engineering remains a pervasive issue, particularly in goal-oriented tasks with sparse external rewards.
To address these challenges, we propose a novel exploration framework called Latent State Predictive Exploration (LSPE). To efficiently handle high-dimensional visual observations in complex environments, we introduce a state encoder that learns a compact representation within the latent space, effectively filtering out irrelevant or noisy information from the observations.
Moreover, we incorporate a self-predictive network that injects temporal information into the state encoder, further stabilizing and enriching the learned representation and improving predictive control for the robot during the exploration phase.
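The following is a minimal PyTorch sketch of the two components described above, an image-based state encoder and a self-predictive head that regresses the next latent state. The specific architectures, latent dimension, and use of a slowly updated (EMA) target encoder are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateEncoder(nn.Module):
    """Maps high-dimensional image observations to a compact latent state."""
    def __init__(self, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

class SelfPredictiveHead(nn.Module):
    """Predicts the next latent state from the current latent and the action,
    injecting temporal structure into the encoder's representation."""
    def __init__(self, latent_dim=50, action_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

def self_predictive_loss(encoder, target_encoder, head, obs, action, next_obs):
    """Regresses the predicted next latent onto a stop-gradient target produced
    by a slowly updated copy of the encoder (an assumed design choice)."""
    z = encoder(obs)
    with torch.no_grad():
        z_next_target = target_encoder(next_obs)
    z_next_pred = head(z, action)
    return F.mse_loss(z_next_pred, z_next_target)
```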
Furthermore, we introduce an Exploration Reward Function (ERF) that encourages the robot agent to explore the latent space, thereby improving state-space exploration and enabling scalability to high-dimensional environments. Through experiments on eight challenging navigation and manipulation tasks, we demonstrate that LSPE is both effective and scalable in complex, high-dimensional environments. Notably, our approach discovers a variety of useful behaviors even in unsupervised settings.
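As a hedged illustration of one plausible instantiation of the ERF, the sketch below (reusing the encoder, target encoder, and self-predictive head from the previous example) assigns an intrinsic reward proportional to the latent prediction error, so poorly modelled transitions receive higher reward. The exact form of the ERF used by LSPE may differ.

```python
import torch

@torch.no_grad()
def exploration_reward(encoder, target_encoder, head, obs, action, next_obs, scale=1.0):
    """Per-sample intrinsic reward: squared prediction error in latent space."""
    z = encoder(obs)
    z_next_pred = head(z, action)
    z_next = target_encoder(next_obs)
    return scale * (z_next_pred - z_next).pow(2).mean(dim=-1)
```

In a sparse-reward setting, such an intrinsic term would typically be added to (or, in fully unsupervised exploration, used in place of) the external reward during the exploration phase.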