One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments
Yufei Wang, Zhanyi Sun, Zackory Erickson*, David Held*
Robotics: Science and Systems (RSS), 2023
*Equal Advising
Abstract
Robot-assisted dressing could benefit the lives of many people such as older adults and individuals with disabilities. Despite such potential, robot-assisted dressing remains a challenging task for robotics as it involves complex manipulation of deformable cloth in 3D space. Many prior works aim to solve the robot-assisted dressing task, but they make certain assumptions such as a fixed garment and a fixed arm pose that limit their ability to generalize. In this work, we develop a robot-assisted dressing system that is able to dress different garments on people with diverse poses from partial point cloud observations, based on a learned policy. We show that with proper design of the policy architecture and Q function, reinforcement learning (RL) can be used to learn effective policies with partial point cloud observations that work well for dressing diverse garments. We further leverage policy distillation to combine multiple policies trained on different ranges of human arm poses into a single policy that works over a wide range of different arm poses. We conduct comprehensive real-world evaluations of our system with 510 dressing trials in a human study with 17 participants with different arm poses and dressed garments. Our system is able to dress 86% of the length of the participants' arms on average.
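The policy distillation mentioned above can be viewed as supervised imitation: a single student policy is trained to match the actions of several teacher policies, each trained with RL on its own range of arm poses. Below is a minimal sketch of this idea in PyTorch; the names `student`, `teachers`, and `datasets` are placeholders, not the paper's actual code.

```python
import torch
import torch.nn as nn

def distill(student: nn.Module, teachers: list, datasets: list,
            epochs: int = 10, lr: float = 1e-4):
    """Distill several pose-specific teacher policies into one student.

    Each teacher was trained (e.g., with RL) on its own range of arm
    poses; datasets[i] yields point-cloud observations from that range.
    The student learns to imitate each teacher on its own pose range.
    """
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for teacher, data in zip(teachers, datasets):
            for obs in data:  # obs: partial point cloud features
                with torch.no_grad():
                    target = teacher(obs)  # teacher action as label
                loss = nn.functional.mse_loss(student(obs), target)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return student
```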
RSS Presentation Video
Real-world Human Study
All videos are 4x real-time. All dressing trials are performed by a single policy using partial point cloud observations.
Generalization to different poses
Our policy is able to generalize to different arm poses of different participants.
Generalization to different garments
Our policy is able to generalize to dress 5 different garments on different participants.
Comparison to baseline
The baseline policy, which does not use policy distillation to generalize to diverse poses, has difficulty correctly turning at the elbow and dressing the upper arm in most trials.
Out of distribution evaluation -- arm moving
Although our system assumes that the person holds their arm static, we find it to be robust to small arm movements by the participants during the dressing process.
In the 4 videos below, the participant moves her arm continuously during the dressing process. Our system still successfully completes the dressing under the following 4 kinds of arm motion. Videos are 2x real speed.
The participant constantly moves her forearm horizontally during dressing.
The participant constantly moves her forearm up and down during dressing.
The participant constantly moves her forearm in a spherical motion during dressing.
The participant constantly moves her shoulder up and down during dressing.
In the following 4 videos, the participant makes a one-time arm movement after we record the initial point cloud. The initial arm pose is overlaid on the video with increased transparency. Our system succeeds in three of the trials and fails in the last, where the joint angle change is too large. Videos are 4x real speed.
The participant bends her elbow downwards after the initial arm point cloud is captured.
The participant bends her elbow inwards with a small angle after the initial arm point cloud is captured.
The participant bends her shoulder downwards after the initial arm point cloud is captured.
The participant bends her elbow inwards with a large angle after the initial arm point cloud is captured. The policy fails because the joint angle change is too large.
Visualizations of policy: input point clouds and output actions
For each video, the left half shows the images from the RealSense camera, which we use to record the depth images of the scene. The right half shows the policy input point clouds and the output translation action (the rotation part of the action is not shown).
All videos are composed using stored images and are not in real time.
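As a rough illustration of what the right half of each video depicts, the policy observation can be assembled as a segmented point cloud (garment, arm, and gripper points), and a dense-prediction policy outputs an action at every point, from which the gripper's action is read off. The sketch below is a hypothetical reconstruction; the feature layout and function names are assumptions, not the paper's implementation.

```python
import numpy as np
import torch

def build_observation(garment_pts, arm_pts, gripper_pt):
    """Stack segmented points with one-hot class features: (N, 3 + 3)."""
    pts = np.concatenate([garment_pts, arm_pts, gripper_pt[None]], axis=0)
    labels = np.zeros((len(pts), 3), dtype=np.float32)
    labels[:len(garment_pts), 0] = 1.0    # garment points
    labels[len(garment_pts):-1, 1] = 1.0  # arm points
    labels[-1, 2] = 1.0                   # gripper point
    return torch.from_numpy(np.hstack([pts.astype(np.float32), labels]))

def gripper_action(per_point_actions: torch.Tensor) -> torch.Tensor:
    """A dense policy predicts an action for every input point; the
    executed action is the one at the gripper point (last row here)."""
    return per_point_actions[-1]
```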
In this trial, the color thresholding of the garment misses some garment points due to lighting changes on the garment (when the garment is on the upper arm), so these points are absent from the policy observation. Interestingly, although our policy is not explicitly trained to be robust to such errors, it still performs the dressing task well.
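For context, garment points are obtained from the camera images by color thresholding. A minimal OpenCV sketch of this kind of segmentation is shown below; the HSV bounds are placeholders rather than the values used in our system, and, as the trial above shows, shadows and lighting changes can push garment pixels outside any fixed bounds.

```python
import cv2
import numpy as np

def segment_garment(rgb: np.ndarray, depth: np.ndarray,
                    lo=(100, 80, 60), hi=(130, 255, 255)):
    """Return a boolean mask of garment pixels via HSV thresholding.

    lo/hi are hypothetical HSV bounds for a blue garment; shadows and
    lighting changes can push garment pixels outside this range, so
    some garment points may be missed in the observation.
    """
    hsv = cv2.cvtColor(rgb, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo), np.array(hi)) > 0
    return mask & (depth > 0)  # keep only pixels with valid depth
```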
Failure case in the human study
This dressing trial fails because the participant moved their arm after we captured the arm point cloud. Since we use the captured static arm point cloud as the policy observation, the policy observes the arm at the wrong location. The left half of the video shows the garment on the arm, but in the policy observation on the right half, the garment is not on the arm; it is to the right of the arm. Such a case is far outside our training distribution, so the policy cannot output meaningful actions that make further progress on the task.
The policy turns too sharply at the elbow and the garment gets stuck there (the policy has already pulled the garment behind the elbow point when making the turn; however, the sleeve of the garment is too long and the policy needs to pull the garment further forward to fully turn the whole sleeve onto the upper arm). From the policy observation we notice that after the policy turns at the elbow, the garment points around the elbow seem to be missing. This is likely caused by an error in color thresholding, e.g., the robot arm casting a shadow on the part of the garment around the elbow, so the pre-defined thresholding values fail to capture those garment points. This could be why the policy turns too early and is not responsive to the garment getting caught at the elbow: it does not have the correct observation to inform it that such an issue is happening.
Simulation Video
In simulation, we leverage reinforcement learning and policy distillation to learn a single policy that works with diverse arm pose ranges, body sizes and shapes, and 5 different garments. The GIFs below show our policy's dressing performance in simulation.
Out of distribution evaluation -- arm moving
In simulation, we evaluate the case where the person makes a one-time arm movement after we record the initial arm point cloud. The movement is quantified as joint angle changes at the shoulder and elbow. We test three joint angle changes: lowering the shoulder, lowering the elbow, and bending the elbow inwards, which are three common unconscious movements we observed in our real-world human study.
The figures below show how the upper arm dressed ratio changes as the joint angle change increases. The first plot shows the upper arm dressed ratio with respect to the shoulder joint bending downwards, the second with respect to the elbow joint bending downwards, and the third with respect to the elbow joint bending inwards. Each colored line shows the performance of one garment.
We find our system to be robust to 8.6 degrees of change in joint angles (averaged across three types of joint angle changes and 5 garments) while maintaining 75% of the original upper arm dressed ratio.
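For reference, the upper arm dressed ratio can be thought of as the fraction of the shoulder-to-elbow segment that the garment has reached. The sketch below shows one plausible way to compute such a metric from point clouds; the projection-based definition and the 8 cm radius are our illustrative assumptions, not necessarily the exact evaluation code.

```python
import numpy as np

def upper_arm_dressed_ratio(garment_pts, shoulder, elbow, radius=0.08):
    """Fraction of the shoulder-elbow segment covered by garment points.

    Projects garment points onto the upper-arm axis, keeps those within
    `radius` meters of the limb, and measures how far up the arm the
    garment reaches. All names and the 8 cm radius are assumptions.
    """
    axis = shoulder - elbow
    length = np.linalg.norm(axis)
    axis = axis / length
    rel = garment_pts - elbow                       # vectors from elbow
    t = rel @ axis                                  # position along axis
    dist = np.linalg.norm(rel - np.outer(t, axis), axis=1)
    on_arm = t[(t >= 0) & (t <= length) & (dist < radius)]
    return float(on_arm.max() / length) if on_arm.size else 0.0
```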
Generalization to dual-arm dressing
With the same assumptions as single-arm dressing (static arm pose and the robot already grasping the garment), we show that our method generalizes to dual-arm dressing as well. Our Dense Transformation policy can control two robot arms simply by extracting the actions corresponding to both robot gripper points, as in the sketch below.
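Since the Dense Transformation policy predicts an action for every point in the input cloud, extending it from one arm to two only requires reading the predictions at both gripper points, as in the hypothetical snippet below (the indices are assumptions).

```python
import torch

def dual_arm_actions(per_point_actions: torch.Tensor,
                     left_idx: int, right_idx: int):
    """Read off the dense policy's predictions at both gripper points.

    per_point_actions is the (N, action_dim) output of the dense
    policy; left_idx/right_idx index the two gripper points in the
    input cloud (hypothetical indices, not the paper's exact code).
    """
    return per_point_actions[left_idx], per_point_actions[right_idx]
```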
In simulation, we successfully train two individual policies to perform dual-arm dressing of a hospital gown (left) and a cardigan (right) for a fixed pose.
Bibtex
@inproceedings{Wang2023One,
title={One Policy to Dress Them All: Learning to Dress People with Diverse Poses and Garments},
author={Wang, Yufei and Sun, Zhanyi and Erickson, Zackory and Held, David},
booktitle={Robotics: Science and Systems (RSS)},
year={2023}}