GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning

Yaru Niu1, Shiyu Jin*2, Zeqing Zhang*2,3, Jiacheng Zhu1, Ding Zhao1, Liangjun Zhang2

1Carnegie Mellon University, 2Baidu Research, 3The University of Hong Kong

* Equal contributions

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023 

paper     2-min talk     5-min talk

Abstract

Scooping is an instinctive skill for humans to acquire, but liquid (e.g., water) scooping has not been explored in the field of robotics. In this work, we first formulate the problem of goal-conditioned robotic water scooping with reinforcement learning. This task is challenging due to the complex dynamics of fluids and multi-modal goal-reaching: the policy is required to achieve both position goals and water amount goals, which leads to a large, convoluted goal state space. To address these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that can learn an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate position goal distributions and amount goal distributions to create a curriculum throughout the learning process. As a result, our proposed method outperforms the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method generalizes efficiently to noisy real-robot water-scooping scenarios with different physical configurations and unseen settings, demonstrating superior efficacy and generalizability.
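As a rough illustration of the goal-factorized reward formulation mentioned in the abstract, the sketch below combines one reward term per goal modality (target scooping position and target water amount). The negative-distance form, the weights `w_pos`/`w_amt`, and the use of container fill ratios are assumptions for illustration, not the exact reward used in the paper.

```python
import numpy as np

def goal_factorized_reward(scoop_pos, goal_pos, scooped_ratio, goal_ratio,
                           w_pos=1.0, w_amt=1.0):
    """Illustrative goal-factorized reward with one term per goal modality.

    scoop_pos, goal_pos : current and targeted 3D positions of the scooping container
    scooped_ratio       : fraction of the container currently filled with water (0..1)
    goal_ratio          : targeted fraction of the container (0..1)
    The negative-distance form and the weights are illustrative assumptions.
    """
    pos_term = -np.linalg.norm(np.asarray(scoop_pos) - np.asarray(goal_pos))  # position goal
    amt_term = -abs(scooped_ratio - goal_ratio)                               # amount goal
    return w_pos * pos_term + w_amt * amt_term
```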

Our goal-conditioned water scooping tasks. The task is randomly initialized over different water states (i.e., waterlines and fluctuations in the tank), targeted water amounts, and targeted positions (shown as a small white box). Our method scoops water to the targeted position with a small amount error using different containers in simulation, and generalizes well to real-robot scooping under various configurations.

The process of position goal sampling adaptation and amount goal sampling adaptation. The diamonds on the left are samples from the desired, interpolated, or initial distributions.
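To make the goal sampling adaptation in the figure above more concrete, here is a minimal sketch that draws training goals from a distribution interpolated between an easy initial goal distribution and the desired (full) goal distribution as a progress signal grows. The uniform parameterization, the ranges, and the linear schedule are illustrative assumptions rather than the exact curriculum used by GOATS.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_goal(progress,
                init_amount=0.5, desired_amount_range=(0.2, 0.8),
                init_pos=(0.0, 0.3, 0.2), desired_pos_half_extent=0.15):
    """Sample a (position goal, amount goal) pair from an interpolated distribution.

    progress in [0, 1] moves sampling from a narrow "easy" initial distribution
    toward the full desired goal distribution.
    All ranges and the linear interpolation schedule are illustrative assumptions.
    """
    alpha = np.clip(progress, 0.0, 1.0)

    # Amount goal: widen the sampling interval around the initial amount.
    lo = init_amount + alpha * (desired_amount_range[0] - init_amount)
    hi = init_amount + alpha * (desired_amount_range[1] - init_amount)
    amount_goal = rng.uniform(lo, hi)

    # Position goal: widen a box around the initial target position.
    half_extent = alpha * desired_pos_half_extent
    position_goal = np.asarray(init_pos) + rng.uniform(-half_extent, half_extent, size=3)

    return position_goal, amount_goal
```

In a training loop, `progress` could be tied to the agent's success rate or goal-reaching error rather than raw step count, so the goal distribution only expands once the current goals are handled reliably.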

Simulation Results

Example 1: Only change the position goal, with a water amount goal of 60% in the container 

Position 1:

Achieved amount: 62.02%

Position 2:

Achieved amount: 55.33%

Position 1:

Achieved amount: 61.54%

Position 2:

Achieved amount: 59.48%

Example 2: Only change the amount goal

Amount goal 60%

Achieved amount: 60.0%

Amount goal 80%

Achieved amount: 79.32%

Amount goal 60%

Achieved amount: 57.23%

Amount goal 80%

Achieved amount: 74.02%

Example 3: Only change the initial waterline, with a water amount goal of 70% in the container 

Initial waterline low

Achieved amount: 67.72%

Initial waterline high

Achieved amount: 67.60%

Initial waterline low

Achieved amount: 67.92%

Initial waterline high

Achieved amount: 70.98%

Real World Experiment Setup

Setup Overview

Setup Side view

Sim-to-Real Transfer

Amount goal: 70%, achieved amount: 66.99%

Training the policy in simulation with limited velocity and acceleration

Amount goal: 70%, achieved amount: 71.79%

Generating trajectories with the trained policy for real-robot scooping
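The captions above mention training with limited velocity and acceleration and then generating trajectories for the real robot. Below is a minimal sketch of one way such limits could be enforced on a generated waypoint trajectory; the limit values, time step, and clipping scheme are assumptions for illustration and not necessarily how the trajectories shown here were produced.

```python
import numpy as np

def limit_trajectory(waypoints, dt=0.05, v_max=0.2, a_max=0.5):
    """Clip per-step velocity and acceleration of a waypoint trajectory.

    waypoints : (T, 3) array of end-effector positions
    dt        : time between waypoints [s] (assumed)
    v_max     : max speed [m/s] (assumed)
    a_max     : max acceleration [m/s^2] (assumed)
    Returns a new (T, 3) trajectory respecting the limits step by step.
    """
    waypoints = np.asarray(waypoints, dtype=float)
    out = [waypoints[0]]
    prev_vel = np.zeros(3)
    for target in waypoints[1:]:
        desired_vel = (target - out[-1]) / dt
        # Limit the per-axis velocity change (acceleration) w.r.t. the previous step.
        dv = np.clip(desired_vel - prev_vel, -a_max * dt, a_max * dt)
        vel = prev_vel + dv
        # Limit the overall speed.
        speed = np.linalg.norm(vel)
        if speed > v_max:
            vel = vel * (v_max / speed)
        out.append(out[-1] + vel * dt)
        prev_vel = vel
    return np.stack(out)
```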

Real-Robot Results

Example 1: Only change the amount goal (initial waterline: 8cm, goal position: P2, initial height: 23cm)

Amount goal 60%

Achieved amount: 62.76%

Amount goal 70%

Achieved amount: 66.69%

Example 2: Only change the initial waterline (amount goal: 60%, goal position: P2, initial height: 23cm)

Initial waterline: 7.5cm

Achieved amount: 63.07%

Initial waterline: 8.5cm

Achieved amount: 65.92%

Example 3: Only change the goal position (amount goal: 65%, initial waterline: 8.5cm, initial height: 30cm -> unseen in training)

Position goal: P1

Achieved amount: 69.85%

Position goal: P3

Achieved amount: 65.92%

Example 4: Only change the initial height (amount goal: 65%, initial waterline: 8.5cm, position goal: P2)

Initial height: 30cm (unseen in training)

Achieved amount: 61.99%

Initial height: 40cm (unseen in training)

Achieved amount: 60.29%