GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning
Yaru Niu1, Shiyu Jin* 2, Zeqing Zhang* 2,3, Jiacheng Zhu1, Ding Zhao1, Liangjun Zhang2
1Carnegie Mellon University, 2Baidu Research, 3The University of Hong Kong
* Equal contributions
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Abstract
Scooping is an instinctive skill for humans to acquire, but liquid (e.g., water) scooping has not been explored in the field of robotics. In this work, we first formulate the problem of goal-conditioned robotic water scooping with reinforcement learning. This task is challenging due to the complex dynamics of fluids and the multi-modal nature of goal reaching. The policy is required to achieve both position goals and water amount goals, which leads to a large, convoluted goal state space. To address these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create a curriculum throughout the learning process. As a result, our proposed method outperforms the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method efficiently generalizes to noisy real-robot water-scooping scenarios with different physical configurations and unseen settings, demonstrating superior efficacy and generalizability.
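The abstract mentions a goal-factorized reward without giving its terms. The sketch below is one hypothetical instance, assuming the reward is a weighted sum of an exponential position term and an exponential amount term. The weights, scales, and the function name `factorized_reward` are illustrative assumptions, not the formulation used in GOATS.

```python
import numpy as np

# Hypothetical weights and scales; the page does not give the actual reward terms.
W_POS = 1.0          # weight on the position-goal term (assumed)
W_AMOUNT = 1.0       # weight on the amount-goal term (assumed)
POS_SCALE = 0.05     # metres; position error at which the term decays to exp(-1) (assumed)
AMOUNT_SCALE = 0.05  # fraction of container capacity (assumed)


def factorized_reward(pos, pos_goal, amount, amount_goal):
    """Sum of independent terms, one per goal factor (a sketch, not the paper's formula).

    Amounts are fractions of container capacity, e.g. 0.6 for a 60% goal.
    """
    pos_err = np.linalg.norm(np.asarray(pos, dtype=float) - np.asarray(pos_goal, dtype=float))
    amount_err = abs(amount - amount_goal)
    r_pos = np.exp(-pos_err / POS_SCALE)           # 1.0 when the position goal is met
    r_amount = np.exp(-amount_err / AMOUNT_SCALE)  # 1.0 when the amount goal is met
    return W_POS * r_pos + W_AMOUNT * r_amount
```

Because each goal factor contributes its own term in this sketch, a rollout that reaches the position goal still earns the position reward even when the scooped amount is off, which is one possible reason to factorize the reward over goals.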
Our goal-conditioned water scooping tasks. Each task is randomly initialized with different water states (i.e., waterlines and fluctuations in the tank), different target water amounts, and different target positions (shown as a small white box). Our method scoops water to the target position with a small amount error using different containers in simulation, and it generalizes well to real-robot scooping under various configurations.
The processes of position goal sampling adaptation and amount goal sampling adaptation. The diamonds on the left are samples from the desired, interpolated, or initial distributions.
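The figure above describes sampling goals from distributions that move from an initial distribution toward the desired one. A minimal sketch of that idea, assuming uniform box-shaped goal distributions whose bounds are linearly interpolated by separate parameters `alpha_pos` and `alpha_amount`, each advanced by a success-rate threshold. The bounds, the threshold rule, and all function names are hypothetical; the paper's actual interpolation and adaptation schedule may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical goal bounds as (low, high) arrays. Initial (easy) distributions are
# narrow; desired distributions cover the full goal space. Values are illustrative.
INIT_POS = (np.array([0.45, -0.05, 0.30]), np.array([0.55, 0.05, 0.35]))
DESIRED_POS = (np.array([0.35, -0.25, 0.25]), np.array([0.65, 0.25, 0.45]))
INIT_AMOUNT = (np.array([0.55]), np.array([0.65]))
DESIRED_AMOUNT = (np.array([0.30]), np.array([0.90]))


def interpolate_bounds(init, desired, alpha):
    """Linearly blend two uniform boxes: alpha=0 gives initial, alpha=1 gives desired."""
    low = (1 - alpha) * init[0] + alpha * desired[0]
    high = (1 - alpha) * init[1] + alpha * desired[1]
    return low, high


def sample_goal(alpha_pos, alpha_amount):
    """Draw one (position goal, amount goal) pair from the interpolated distributions."""
    pos_low, pos_high = interpolate_bounds(INIT_POS, DESIRED_POS, alpha_pos)
    amt_low, amt_high = interpolate_bounds(INIT_AMOUNT, DESIRED_AMOUNT, alpha_amount)
    return rng.uniform(pos_low, pos_high), rng.uniform(amt_low, amt_high)


def update_alpha(alpha, success_rate, threshold=0.8, step=0.1):
    """Widen the distribution only once the current one is mastered (assumed rule)."""
    return min(1.0, alpha + step) if success_rate >= threshold else alpha
```

Keeping two independent parameters lets the position curriculum and the amount curriculum advance at different rates, mirroring the two separate adaptation processes shown in the figure.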
Simulation Results
Example 1: Only change the position goal, with a water amount goal of 60% in the container
Position 1: ✔
Achieved amount: 62.02%
Position 2: ✔
Achieved amount: 55.33%
Position 1: ✔
Achieved amount: 61.54%
Position 2: ✔
Achieved amount: 59.48%
Example 2: Only change the amount goal
Amount goal 60%
Achieved amount: 60.0%
Amount goal 80%
Achieved amount: 79.32%
Amount goal 60%
Achieved amount: 57.23%
Amount goal 80%
Achieved amount: 74.02%
Example 3: Only change the initial waterline, with a water amount goal of 70% in the container
Initial waterline low
Achieved amount: 67.72%
Initial waterline high
Achieved amount: 67.60%
Initial waterline low
Achieved amount: 67.92%
Initial waterline high
Achieved amount: 70.98%
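The sketch below turns the simulation examples above into per-example error summaries. It assumes that amount error means the absolute difference between achieved and goal amounts in percentage points, which the page does not define; the dictionary keys and helper names are illustrative. These are only the twelve runs shown here, not the 1000-variation evaluation reported in the abstract.

```python
# (goal %, achieved %) pairs copied from the simulation examples above.
SIM_RESULTS = {
    "Example 1 (position varies)": [(60, 62.02), (60, 55.33), (60, 61.54), (60, 59.48)],
    "Example 2 (amount varies)": [(60, 60.0), (80, 79.32), (60, 57.23), (80, 74.02)],
    "Example 3 (waterline varies)": [(70, 67.72), (70, 67.60), (70, 67.92), (70, 70.98)],
}


def amount_error(goal, achieved):
    """Absolute error in percentage points (assumed definition of 'amount error')."""
    return abs(achieved - goal)


def summarize(results):
    """Return {example name: (mean error, max error)} in percentage points."""
    summary = {}
    for name, runs in results.items():
        errors = [amount_error(goal, achieved) for goal, achieved in runs]
        summary[name] = (sum(errors) / len(errors), max(errors))
    return summary


for name, (mean_err, max_err) in summarize(SIM_RESULTS).items():
    print(f"{name}: mean {mean_err:.2f} pp, max {max_err:.2f} pp")
```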
Real-World Experiment Setup
Setup Overview
Setup Side View
Sim-to-Real Transfer
Amount goal: 70%, achieved amount: 66.99%
Training the policy in simulation with limited velocity and acceleration
Amount goal: 70%, achieved amount: 71.79%
Generating trajectories by the trained policy for real-robot scooping
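The captions above state that the policy was trained in simulation with limited velocity and acceleration and then used to generate trajectories for real-robot scooping. A minimal sketch of such limiting, assuming Cartesian velocity actions, per-axis bounds, and a fixed control period; `V_MAX`, `A_MAX`, `DT`, `limit_action`, and `rollout_waypoints` are hypothetical, since the page gives no limits or interface details.

```python
import numpy as np

# Hypothetical limits; the page says velocity and acceleration were limited
# in simulation but does not give the values.
V_MAX = 0.10  # m/s per axis (assumed)
A_MAX = 0.50  # m/s^2 per axis (assumed)
DT = 0.05     # control period in seconds (assumed)


def limit_action(v_cmd, v_prev, v_max=V_MAX, a_max=A_MAX, dt=DT):
    """Clip a commanded end-effector velocity to per-axis velocity and acceleration bounds."""
    v_cmd = np.clip(np.asarray(v_cmd, dtype=float), -v_max, v_max)
    max_dv = a_max * dt
    dv = np.clip(v_cmd - v_prev, -max_dv, max_dv)
    return v_prev + dv


def rollout_waypoints(start, velocities, dt=DT):
    """Integrate limited velocities into Cartesian waypoints for replay on a real robot."""
    waypoints = [np.asarray(start, dtype=float)]
    v_prev = np.zeros_like(waypoints[0])
    for v_cmd in velocities:
        v_prev = limit_action(v_cmd, v_prev, dt=dt)
        waypoints.append(waypoints[-1] + v_prev * dt)
    return np.stack(waypoints)
```

Clipping the acceleration after the velocity keeps each new velocity between the previous velocity and the already bounded command, so the output never exceeds `V_MAX` on any axis.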
Real-Robot Results
Example 1: Only change the amount goal (initial waterline: 8 cm, goal position: P2, initial height: 23 cm)
Amount goal 60%
Achieved amount: 62.76%
Amount goal 70%
Achieved amount: 66.69%
Example 2: Only change the initial waterline (amount goal: 60%, goal position: P2, initial height: 23 cm)
Initial waterline: 7.5 cm
Achieved amount: 63.07%
Initial waterline: 8.5 cm
Achieved amount: 65.92%
Example 3: Only change the goal position (amount goal: 65%, initial waterline: 8.5 cm, initial height: 30 cm, unseen in training)
Position goal: P1
Achieved amount: 69.85%
Position goal: P3
Achieved amount: 65.92%
Example 4: Only change the initial position (amount goal: 65%, initial waterline: 8.5cm, position goal: P2)
Initial height: 30 cm (unseen in training)
Achieved amount: 61.99%
Initial height: 40 cm (unseen in training)
Achieved amount: 60.29%
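To separate over-scooping from under-scooping in the real-robot runs, the sketch below computes the signed error (achieved minus goal, in percentage points) for each of the eight examples above, plus their mean absolute error. The labels and the helper name are illustrative, and percentage-point error is an assumed measure, as the page does not define it.

```python
# (label, goal %, achieved %) copied from the real-robot examples above.
REAL_RESULTS = [
    ("Ex1 goal 60%", 60, 62.76),
    ("Ex1 goal 70%", 70, 66.69),
    ("Ex2 waterline 7.5 cm", 60, 63.07),
    ("Ex2 waterline 8.5 cm", 60, 65.92),
    ("Ex3 position P1", 65, 69.85),
    ("Ex3 position P3", 65, 65.92),
    ("Ex4 height 30 cm", 65, 61.99),
    ("Ex4 height 40 cm", 65, 60.29),
]


def signed_errors(results):
    """Achieved minus goal in percentage points: positive means over-scooping."""
    return {label: achieved - goal for label, goal, achieved in results}


errs = signed_errors(REAL_RESULTS)
for label, e in errs.items():
    print(f"{label}: {e:+.2f} pp")

mean_abs = sum(abs(e) for e in errs.values()) / len(errs)
print(f"mean absolute error: {mean_abs:.2f} pp")
```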