Efficiently Learning Single-Arm Fling Motions to Smooth Garments

Lawrence Yunliang Chen*, Huang Huang*, Ellen Novoseller, Daniel Seita, Jeffrey Ichnowski, Michael Laskey, Richard Cheng, Thomas Kollar, Ken Goldberg

Abstract:

Recent work has shown that 2-arm "fling" motions can be effective for garment smoothing. We consider single-arm fling motions. Unlike 2-arm fling motions, which require little robot trajectory parameter tuning, single-arm fling motions are sensitive to trajectory parameters. We consider a single 6-DOF robot arm that learns fling trajectories to achieve high garment coverage. Given a garment grasp point, the robot explores different parameterized fling trajectories in physical experiments. To improve learning efficiency, we propose a coarse-to-fine learning method that first uses a multi-armed bandit (MAB) framework to efficiently find a candidate fling action, which it then refines via a continuous optimization method. Further, we propose novel training and execution-time stopping criteria based on fling outcome uncertainty. Compared to baselines, we show that the proposed method significantly accelerates learning. Moreover, with prior experience on similar garments collected through self-supervision, the MAB learning time for a new garment is reduced by up to 87%. We evaluate on 6 garment types: towels, T-shirts, long-sleeve shirts, dresses, sweat pants, and jeans. Results suggest that using prior experience, a robot requires under 30 minutes to learn a fling action for a novel garment that achieves 60-94% coverage.
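
To make the coarse-to-fine idea concrete, the sketch below runs a UCB1 bandit over a discretized grid of fling speeds and then refines the winning speed with a local search over a finer grid. The speed grid, the simulated reward model in fling_and_measure (a stand-in for a physical rollout), and the specific UCB1/local-search choices are illustrative assumptions; the paper's actual action parameterization, bandit algorithm, and continuous optimizer may differ.

```python
import math
import random

random.seed(0)

SPEEDS = [0.6, 0.8, 1.0, 1.2, 1.4, 1.6]  # assumed discretized fling speeds (m/s)

def fling_and_measure(speed):
    """Stand-in for a physical rollout: noisy coverage peaking near 1.2 m/s."""
    mean = 0.9 - 2.0 * (speed - 1.2) ** 2
    return min(1.0, max(0.0, random.gauss(mean, 0.1)))

# --- Coarse stage: UCB1 bandit over the discretized speeds ---
counts = [0] * len(SPEEDS)
sums = [0.0] * len(SPEEDS)
for t in range(1, 61):
    if t <= len(SPEEDS):
        arm = t - 1  # pull every arm once first
    else:
        arm = max(range(len(SPEEDS)),
                  key=lambda i: sums[i] / counts[i]
                  + math.sqrt(2 * math.log(t) / counts[i]))
    reward = fling_and_measure(SPEEDS[arm])
    counts[arm] += 1
    sums[arm] += reward
coarse_best = SPEEDS[max(range(len(SPEEDS)), key=lambda i: sums[i] / counts[i])]

# --- Fine stage: local search on a finer grid around the coarse winner ---
def avg_coverage(speed, n=5):
    return sum(fling_and_measure(speed) for _ in range(n)) / n

fine_grid = [coarse_best + d for d in (-0.1, -0.05, 0.0, 0.05, 0.1)]
refined = max(fine_grid, key=avg_coverage)
print(f"coarse best: {coarse_best:.2f} m/s, refined: {refined:.2f} m/s")
```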

Summary Video

Illustration of the reset, shaking, and flinging actions.

ISRR_Fling_Summary_Video.mp4

6 Test Garments

Left to right: A blue towel, a black T-shirt, a gray long-sleeve shirt, a pair of blue jeans, a pair of white sweat pants, and a white dress.

Reset Procedure

Joints are numbered 0 through 5. The first part (left) lifts the arm to configuration q_up. The second part (right) lowers it to q_down to crumple the garment. See Section 4.1 in the paper for details.
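
For concreteness, the sketch below scripts the two-part reset against a hypothetical robot interface. Both move_to_joint_config and the joint values are illustrative placeholders; the actual q_up and q_down configurations are defined in Section 4.1 of the paper.

```python
import numpy as np

# Hypothetical joint configurations (radians) for the 6-DOF arm; the actual
# q_up and q_down values are specified in Section 4.1 of the paper.
Q_UP = np.array([0.0, -1.2, 1.0, -1.4, -1.57, 0.0])
Q_DOWN = np.array([0.0, -0.4, 0.3, -1.4, -1.57, 0.0])

def reset_garment(robot):
    """Two-part reset: lift the garment to q_up, then lower to q_down to crumple it."""
    robot.move_to_joint_config(Q_UP)    # part 1 (left): lift
    robot.move_to_joint_config(Q_DOWN)  # part 2 (right): lower and crumple
```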

Variation in the garment's shape when hanging

The figures on the right show examples of a T-shirt's differing stable hanging configurations at the same grasp point and rotation angle (i.e., the gripper never opens and stays at the same orientation). This variability in the T-shirt's possible configurations suggests that dynamic flinging is highly nondeterministic. The resulting aleatoric uncertainty produces large statistical noise in the observed coverage, making learning challenging.
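
To see how this noise slows learning, the back-of-the-envelope sketch below estimates how many repeated flings are needed to reliably distinguish two fling actions whose true mean coverages differ by a small gap. Both the noise level and the gap are assumed values for illustration, not measured quantities from the paper.

```python
import math

# Assumed (illustrative) values: per-fling coverage noise and the true
# coverage gap between two candidate fling actions.
sigma = 0.10   # std of coverage for a single fling
gap = 0.05     # true difference in mean coverage between two actions

# The std of the difference of two n-sample means is sigma * sqrt(2 / n).
# Requiring the gap to exceed ~2 such stds (roughly 95% confidence):
n = math.ceil((2 * sigma * math.sqrt(2) / gap) ** 2)
print(f"~{n} flings per action needed to separate a {gap:.2f} coverage gap")  # ~32
```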

Example Behavior of Non-Optimized Fling Actions

The following two videos show that neither flinging too fast (left) nor too slow (right) produces good coverage.

black_tshirt_front_fastbad.mp4
black_tshirt_front_slowbad.mp4

Repeated execution of the learned fling action for each of the garments

The following videos show the robot repeatedly executing the best learned fling action for the 6 test garments. The shaking actions are not shown, but they are executed and lead to different hanging states. The videos illustrate the aleatoric uncertainty associated with dynamic actions: the fling outcomes differ significantly even though the executed action is identical.

blue_towel_front_wo_rotation.mp4
black_tshirt_front_wo_rotation.mp4
gray_shirt_side_wo_rotation.mp4
light_jeans_front.mp4
gray_pants_front.mp4
white_dress_front.mp4

Comparison with Humans

Below are experimental results comparing human performance with the robot executing the best learned action on each of the 6 test garments. For the robot, the same action is repeated 20 times and the average coverage is reported. Each human subject performs 10 fling actions per garment.

Human Experiment Protocol

We asked each human subject to grasp the garment at the same position as the robot and to perform the shaking motion to perturb its state. The subject is then allowed to perform a single-arm fling action to smooth the garment, using any fling motion they prefer, as long as it is one coherent movement while grasping the garment. If the garment lands outside the workspace (the same one used for the robot), the trial is discarded and repeated. For each garment, the subject performs 10 flings, and the coverage after each fling is recorded. Subjects may rest at any time during the experiments.

The first 6 tables show the mean and standard deviation of garment coverage; in each table, the first row is the robot's performance and the remaining 5 rows correspond to subjects 1 through 5. The next 6 plots are the corresponding bar plots. The results show that the robot achieves coverage similar to or better than that of humans, and that human performance has a much larger variance.
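
As a reference for how such bar plots can be produced, the snippet below plots per-condition mean coverage with standard-deviation error bars using matplotlib. The numbers are placeholders for illustration, not the actual experimental results.

```python
import matplotlib.pyplot as plt

# Placeholder coverage statistics (fractions), NOT the actual results:
# the first entry is the robot, the rest are human subjects 1-5.
labels = ["Robot", "S1", "S2", "S3", "S4", "S5"]
means = [0.85, 0.80, 0.75, 0.82, 0.70, 0.78]
stds = [0.05, 0.10, 0.12, 0.09, 0.15, 0.11]

fig, ax = plt.subplots()
ax.bar(labels, means, yerr=stds, capsize=4)
ax.set_ylabel("Garment coverage")
ax.set_ylim(0, 1)
ax.set_title("Robot vs. human coverage (placeholder data)")
plt.show()
```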