Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing

Max Yang, Yijiong Lin, Alex Church, John Lloyd, Dandan Zhang, David A.W. Barton*, Nathan F. Lepora*

Department of Engineering Mathematics and Bristol Robotics Laboratory, University of Bristol, Bristol BS8 1UB, U.K. 

email: {max.yang, david.barton, n.lepora}@bristol.ac.uk

Abstract

Object pushing presents a key non-prehensile manipulation problem that is illustrative of more complex robotic manipulation tasks. Characterized by partial observability and difficult-to-model physics, general object pushing remains an unsolved challenge. While deep reinforcement learning (RL) methods have demonstrated impressive learning capabilities using visual input, a lack of tactile sensing limits their capability for fine control during manipulation. Here we propose a deep RL approach to object pushing using tactile sensing without visual input, namely tactile pushing. We present a goal-conditioned formulation that allows both model-free and model-based RL to obtain accurate policies for pushing an object to a goal. To achieve real-world performance, we adopt a sim-to-real approach. Our results demonstrate that it is possible to train on a single object and a limited sample of goals to produce precise and reliable policies that generalize to a variety of unseen scenarios without domain randomization. We experiment with the trained agents in harsh pushing conditions and show that, with significantly more training samples, a model-free policy can outperform a model-based planner, generating shorter and more reliable pushing trajectories despite large disturbances. The simplicity of our training environment, combined with effective real-world performance, highlights the value of rich tactile information for fine manipulation.

Tactile Reinforcement Learning Pipelines

Tactile Observations: Tactile images provide a general way to encode detailed contact features, whereas a tactile-derived contact surface pose (tactile pose) offers a compact representation of pushing-relevant features. We compare the effectiveness of these two tactile observations for RL, with an illustrative sketch of each format given below.
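To make the distinction concrete, the minimal sketch below shows how the two observation types might be packaged as RL inputs. The shapes, preprocessing steps, and field names are illustrative assumptions, not the exact interface used in this work.

```python
# Sketch (not the paper's code) contrasting the two tactile observation
# formats used as RL inputs; shapes and names are assumptions.
import numpy as np

def tactile_image_obs(raw_image: np.ndarray) -> np.ndarray:
    """Tactile image observation: the full preprocessed sensor frame,
    e.g. cropped, downsampled and normalised before being fed to the policy."""
    assert raw_image.ndim == 2                      # single-channel tactile image
    return raw_image.astype(np.float32) / 255.0     # normalise to [0, 1]

def tactile_pose_obs(contact_depth: float, contact_angle: float) -> np.ndarray:
    """Tactile pose observation: a low-dimensional contact surface pose
    (here, penetration depth and in-plane contact angle) estimated from
    the tactile image by a separately trained pose model."""
    return np.array([contact_depth, contact_angle], dtype=np.float32)

# Example: a 128x128 simulated tactile frame vs. a 2D pose vector.
img_obs = tactile_image_obs(np.zeros((128, 128), dtype=np.uint8))
pose_obs = tactile_pose_obs(contact_depth=1.5e-3, contact_angle=0.12)
print(img_obs.shape, pose_obs.shape)                # (128, 128) (2,)
```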


Reinforcement Learning Methods: We also compare the online performance of a model-free policy learned offline (model-free RL) against an online planner that plans with a model learned offline (model-based RL) for this task, with particular interest in the generalizability of each RL agent to novel pushing scenarios not seen during training.
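The sketch below illustrates, under assumed environment and model interfaces (env, policy, and dynamics are stand-ins, not our implementations), how the two agents differ at deployment: the model-free policy acts with a single forward pass per step, while the model-based agent re-plans at every step by rolling candidate action sequences through its learned dynamics model.

```python
# Generic sketch of the two control loops compared in this work.
import numpy as np

def run_model_free(env, policy, goal, horizon=200):
    """Model-free RL at deployment: the trained goal-conditioned policy
    maps (observation, goal) directly to an action at every step."""
    obs = env.reset()
    for _ in range(horizon):
        action = policy(obs, goal)                  # single forward pass
        obs, done = env.step(action)
        if done:
            break

def run_model_based(env, dynamics, goal, horizon=200, n_samples=256, plan_len=10):
    """Model-based RL at deployment: sample candidate action sequences,
    roll them out through the learned dynamics model, and execute the
    first action of the best-scoring sequence (receding-horizon control)."""
    obs = env.reset()
    for _ in range(horizon):
        candidates = np.random.uniform(-1, 1, size=(n_samples, plan_len, env.action_dim))
        costs = []
        for seq in candidates:
            pred, cost = obs, 0.0
            for a in seq:
                pred = dynamics(pred, a)            # one-step prediction
                cost += np.linalg.norm(pred[:2] - goal[:2])  # distance-to-goal cost (assumed planar state)
            costs.append(cost)
        action = candidates[int(np.argmin(costs))][0]
        obs, done = env.step(action)
        if done:
            break
```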

Simulation Experiments

Training Environment

Generalized Objects for Learned Policy

We train the RL agents to push a cube towards a limited set of goals, and we test the final policies on a range of novel objects and goals not seen during training. The success rate for each agent is shown in the table.
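A hedged sketch of the kind of evaluation loop implied by this protocol is given below; the success threshold, episode horizon, object names, and environment interface (reset, step, object_position) are assumptions, not the exact values used to produce the reported table.

```python
# Sketch of an evaluation protocol: success rate over unseen
# object/goal combinations. All interface details are assumed.
import numpy as np

def evaluate(agent, make_env, objects, goals, pos_tol=0.025, horizon=200):
    """Return the fraction of episodes whose final object position
    lies within pos_tol metres of the goal."""
    successes, total = 0, 0
    for obj in objects:
        for goal in goals:
            env = make_env(obj)                     # fresh env with the test object
            obs = env.reset(goal=goal)
            for _ in range(horizon):
                obs, done = env.step(agent(obs, goal))
                if done:
                    break
            successes += np.linalg.norm(env.object_position - goal) < pos_tol
            total += 1
    return successes / total

# e.g. success_rate = evaluate(agent, make_env,
#                              objects=["cube", "cylinder", "triangle"],
#                              goals=unseen_goals)
```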

Sim-to-Real Workflow

We train a separate observation model for each type of tactile observation to bridge the sim-to-real gap.
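As one concrete (and assumed) example of such an observation model, the sketch below shows a small CNN that maps real tactile images to the contact pose available directly in simulation; the tactile-image pathway would instead use a real-to-sim image translation model so the policy receives simulation-like tactile images. The architecture and dimensions here are illustrative, not the authors' exact design.

```python
# Illustrative observation model for the tactile-pose pathway:
# a small CNN regressing contact pose from a real tactile image.
import torch
import torch.nn as nn

class PoseObservationModel(nn.Module):
    def __init__(self, pose_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, pose_dim)         # e.g. contact depth and angle

    def forward(self, tactile_image: torch.Tensor) -> torch.Tensor:
        # tactile_image: (batch, 1, H, W) real sensor frames
        return self.head(self.encoder(tactile_image))

# Trained supervised on labelled real tactile data, then used at deployment
# so the sim-trained policy receives the observation format it was trained on.
```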

Real-World Experiment Video