AutoBag: Learning to Open Plastic Bags and Insert Objects

Lawrence Yunliang Chen, Baiyu Shi, Daniel Seita,

Richard Cheng, Thomas Kollar, David Held, Ken Goldberg


IEEE International Conference on Robotics and Automation (ICRA), 2023


Abstract

Thin plastic bags are ubiquitous in retail stores, healthcare, food handling, recycling, homes, and school lunchrooms. They are challenging both for perception (due to specularities and occlusions) and for manipulation (due to the dynamics of their 3D deformable structure). We formulate the task of "bagging:" manipulating common plastic shopping bags with two handles from an unstructured initial state to an open state where at least one solid object can be inserted into the bag and lifted for transport. We propose a self-supervised learning framework where a dual-arm robot learns to recognize the handles and rim of plastic bags using UV-fluorescent markings; at execution time, the robot does not use UV markings or UV light. We propose the AutoBag algorithm, where the robot uses the learned perception model to open a plastic bag through iterative manipulation. We present novel metrics to evaluate the quality of a bag state and new motion primitives for reorienting and opening bags based on visual observations. In physical experiments, a YuMi robot using AutoBag is able to open bags and achieve a success rate of 16/30 for inserting at least one item across a variety of initial bag configurations. 

The "Bagging" Task

Task - Human_bagging_better.mp4

In this work, we study “bagging,” a common everyday task for humans: open a plastic bag from an unstructured configuration, insert objects, and lift it up for transport.

Challenges of Plastic Bag Manipulation

While the "Bagging" task is easy for humans, it is very challenging for robots. As a deformable object, it can deform in many ways, leading to self-occlusions.

Challenge - Rotate illustration.mp4

The bag is also difficult to manipulate because it is under-actuated: different parts of the bag move differently in response to the same grasp, which makes it hard to reorient.

Challenge - Grasp_height_illustration_combined.mp4

A one-millimeter difference in gripper height can also significantly affect grasp success.

Representation of Bag: Semantic Segmentation


We propose to represent the bag with semantic segmentation: a perception model classifies each pixel of the bag image, identifying the handles and the rim.

UV-Fluorescent Paint for Self-Supervised Data Collection


To learn this perception model, we paint the handles and rim region of our training bags with UV-fluorescent markings. The bag looks normal under regular lighting conditions. But when the UV lights are turned on, the UV paints glow their unique colors, which allows the robot to collect ground-truth segmentation labels without human annotations.

We use these UV-fluorescent markings in a self-supervised data collection procedure for training the segmentation model, avoiding time-consuming and expensive human annotation.

We place 6 programmable UV LED lights overhead and paint the 2 key parts of each bag (handles and rim) with transparent UV-fluorescent paints that glow in 2 different colors under UV light.

When the UV lights are turned off, the paints are invisible and the bag looks normal under regular lighting. When the regular lights are turned off and the UV lights are turned on, everything is dark except for the regions with UV paints, which glow their unique colors. 

Action Primitives for Self-Supervised Data Collection

With this setup, the robot uses its action primitives to manipulate the bag into different configurations, and by alternating the lighting conditions, the RGBD camera collects paired images of the bag in both standard and UV lighting.

By color-thresholding the image captured under the UV lights, the system extracts segmentation masks that serve as ground-truth labels for the corresponding image captured under regular lighting; these labeled pairs are then used to train the segmentation network.
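For concreteness, the snippet below sketches one way such labels could be extracted with OpenCV. The specific HSV ranges and label indices are illustrative assumptions rather than the exact values used in our system.

```python
import cv2
import numpy as np

# Illustrative HSV ranges for the two fluorescent paints under UV light;
# the actual colors and thresholds in our setup may differ.
HANDLE_HSV_RANGE = (np.array([40, 80, 80]), np.array([80, 255, 255]))
RIM_HSV_RANGE = (np.array([140, 80, 80]), np.array([170, 255, 255]))

def extract_labels(uv_bgr_image):
    """Convert a UV-lit image into a per-pixel label map.

    Labels: 0 = background / rest of bag, 1 = handles, 2 = rim.
    """
    hsv = cv2.cvtColor(uv_bgr_image, cv2.COLOR_BGR2HSV)
    handle_mask = cv2.inRange(hsv, *HANDLE_HSV_RANGE)
    rim_mask = cv2.inRange(hsv, *RIM_HSV_RANGE)

    # Remove small speckles from sensor noise with a morphological opening.
    kernel = np.ones((5, 5), np.uint8)
    handle_mask = cv2.morphologyEx(handle_mask, cv2.MORPH_OPEN, kernel)
    rim_mask = cv2.morphologyEx(rim_mask, cv2.MORPH_OPEN, kernel)

    labels = np.zeros(uv_bgr_image.shape[:2], dtype=np.uint8)
    labels[handle_mask > 0] = 1
    labels[rim_mask > 0] = 2
    return labels
```

Each label map is paired with the image of the same bag configuration taken under regular lighting to form one training example.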

Data Collection in Action

Data Collection - 10x.mp4

In this self-supervised data collection process, the robot systematically explores the bag state space by sampling a set of primitive actions to manipulate the bag into diverse configurations.

Perception Module Training

We use the collected data to train a semantic segmentation model that recognizes the bag rim and handles at execution time without UV lights.
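As a rough illustration of this step, the sketch below trains a standard semantic segmentation network on the (regular-light image, UV-derived label) pairs. The architecture (torchvision's DeepLabV3 with a ResNet-50 backbone), the 3-class label set, and the hyperparameters are assumptions for illustration, not necessarily what our system uses.

```python
import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_CLASSES = 3  # assumed label set: background / handles / rim

model = deeplabv3_resnet50(weights=None, num_classes=NUM_CLASSES)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_epoch(loader, device="cuda"):
    """One pass over (regular-light image, UV-derived label map) pairs."""
    model.to(device).train()
    for images, labels in loader:      # images: (B, 3, H, W), labels: (B, H, W)
        images = images.to(device)
        labels = labels.to(device).long()
        logits = model(images)["out"]  # (B, NUM_CLASSES, H, W)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```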

2 Novel Metrics for Quantifying the Bag Opening: Convex Hull Area and Elongation

The convex hull area is correlated with the size of the bag opening. However, we also observe that for inserting items, a sideways-facing bag whose opening is nearly closed is worse than an upward-facing bag with a small but rounded opening, yet the normalized convex hull area may assign a higher value to the former. Thus, we also propose convex hull elongation, which measures the aspect ratio of the bag opening.
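One plausible way to compute both metrics from the predicted rim mask is shown below. The normalization constant and the exact convention for elongation (here, the minor-to-major axis ratio of the minimum-area rectangle enclosing the hull, so values near 1 indicate a rounded opening) are assumptions for illustration.

```python
import cv2
import numpy as np

def opening_metrics(rim_mask, reference_area):
    """Estimate opening quality from a binary rim segmentation mask.

    rim_mask: (H, W) binary array of pixels predicted as bag rim.
    reference_area: normalization constant in pixels (assumed here to be
        the area the fully opened bag would cover).
    Returns (normalized convex hull area, elongation).
    """
    points = cv2.findNonZero(rim_mask.astype(np.uint8))
    if points is None or len(points) < 3:
        return 0.0, 0.0

    hull = cv2.convexHull(points)
    normalized_area = cv2.contourArea(hull) / reference_area

    # Aspect ratio of the minimum-area rectangle around the hull:
    # near 1 for a rounded opening, near 0 for a flat, closed one.
    (_, _), (w, h), _ = cv2.minAreaRect(hull)
    elongation = min(w, h) / max(w, h) if max(w, h) > 0 else 0.0
    return normalized_area, elongation
```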

AutoBag Algorithm

Physical Experiments

We evaluate AutoBag on 2 bags: one used in perception model training, and one unseen in training.

Goal: Insert 2 spray bottles

We consider 3 tiers of initial bag configurations:

Tier 1: The bag starts upward-facing with the rim recognizable but with a small opening. This requires enlarging the bag opening to allow placing objects inside.

Tier 2: The bag starts at an expanded, slightly wrinkled state lying sideways on the workspace. This requires reorienting the bag upwards and then opening the bag.

Tier 3: Any other, more complex initial configuration.

Successful Examples on the training bag

AutoBag - Train Bag Tier 1 Side View - 10x.mp4

AutoBag (Tier 1)

AutoBag - Train Bag Tier 2 Side View - 10x.mp4

AutoBag (Tier 2)

AutoBag - Train Bag Tier 3 Multi-View - 10x.mp4

AutoBag (Tier 3)

Successful Examples on the test bag

AutoBag - Test Bag Tier 1 - 10x.mp4

AutoBag (Tier 1)

AutoBag - Test Bag Tier 2 - 10x.mp4

AutoBag (Tier 2)

Baselines

Pin-Pull with Sideways Insertion

We compare with a baseline that attempts to insert objects from the side. A prerequisite of this method is to be able to grasp only the top layer of the bag. For this, we apply the Pin-Pull primitive to the opening of the bag. In particular, we choose the pin position as the center of the rim farther from the bag center, and the pull position as the midpoint between the center of the rim closer to the bag center and the bag center. 

The intention is to pin the bottom layer of the bag while pulling the top layer of the bag in order to separate the two layers of the bag and create a sideways-facing opening. The robot then attempts to insert the object from the side, regardless of whether the two layers have been truly separated or not, since the overhead camera cannot tell this. 
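The sketch below illustrates one way the pin and pull pixel positions described above could be computed from the rim and bag segmentation masks; treating the two visible rim regions as separate connected components and using 2D pixel centroids are assumptions of this illustration.

```python
import cv2
import numpy as np

def pin_pull_positions(rim_mask, bag_mask):
    """Select pin and pull pixel positions for the Pin-Pull baseline.

    Uses the two largest connected components of the rim mask: the centroid
    farther from the bag centroid becomes the pin point, and the pull point
    is the midpoint between the nearer rim centroid and the bag centroid.
    """
    bag_centroid = np.flip(np.argwhere(bag_mask > 0).mean(axis=0))  # (x, y)

    num, labels = cv2.connectedComponents(rim_mask.astype(np.uint8))
    components = sorted(range(1, num), key=lambda i: -(labels == i).sum())[:2]
    if len(components) < 2:
        return None  # cannot distinguish two rim regions

    centroids = [np.flip(np.argwhere(labels == i).mean(axis=0)) for i in components]
    dists = [np.linalg.norm(c - bag_centroid) for c in centroids]

    pin = centroids[int(np.argmax(dists))]                    # farther rim center
    pull = (centroids[int(np.argmin(dists))] + bag_centroid) / 2.0
    return pin, pull
```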

The videos below illustrate this method. It turns out that grasping only the top layer of the bag is very difficult, even with the Pin-Pull primitive. This method only works when there is a large spatial separation between the top layer and the bottom layer in order to leave room for the pin hand to pin the bottom layer without affecting the top layer.

Baseline 3 - pin pull success 2 - 5x.mp4

Success (Rare)

It requires two layers to have very large initial spatial separation.

Baseline 3 - pin pull failure - 5x.mp4

Failure (Almost Always)

The gripper either grasps two layers or misses the grasp.

Fling the Bag with Handles

We also compare with a method that performs insertion while one gripper holds a bag handle. The robot first identifies and grasps the two handles, one in each gripper. It then lifts the bag in midair and performs two sequences of dynamic actions. First, it shakes the bag horizontally (left-right) 2 times, which separates the two layers of the rim and prevents them from sticking to each other. The robot then flings the bag vertically 3 times, which lets air enter the opening and inflate the bag. Then, the robot’s right gripper releases the handle to grasp the object. Finally, the robot inserts the object at the center of the estimated bag opening while the other gripper still holds the bag.
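For clarity, the pseudocode below lays out this action sequence at a high level. The robot and camera interfaces (e.g., detect_handles, shake_horizontal, fling_vertical) are hypothetical names used only to illustrate the ordering of steps, not an actual controller API.

```python
def fling_with_handles_baseline(robot, camera, obj):
    """High-level sketch of the handle-fling baseline (hypothetical interfaces)."""
    left_handle, right_handle = camera.detect_handles()
    robot.left.grasp(left_handle)
    robot.right.grasp(right_handle)
    robot.lift_bag_to_midair()

    for _ in range(2):        # horizontal shakes: separate the two rim layers
        robot.shake_horizontal()
    for _ in range(3):        # vertical flings: let air inflate the bag
        robot.fling_vertical()

    robot.right.release()     # free one gripper to pick up the object
    robot.right.grasp(obj)
    opening_center = camera.estimate_opening_center()
    robot.right.insert_at(opening_center)  # left gripper still holds the handle
```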

There are two main challenges with this method: (1) it is difficult to open the bag this way, and (2) the bag tilts away as soon as one gripper releases the handle to grasp the objects, leaving little room for the robot to insert the object from the top.

Baseline 4 - Handle Fling Not Open Bag - 5x.mp4

Grasping the handles and flinging the bag also often fails to create an opening, for two reasons: (1) the velocity, acceleration, and jerk of the fling actions are limited by the robot, and (2) grasping two layers of the handle prevents the bag opening from inflating.

Baseline 4 - Handle Fling Better - 5x.mp4

Even when the gripper grasps one layer of the bag handle, the opening closes as soon as one gripper releases the handle, leaving little room for the robot to insert the object from the top (inserting the object along the angle of the tilted opening could help, but this may require more than an overhead camera to estimate the insertion direction precisely).

Failure Modes

Failure Mode - Bag stuck in gripper.mp4

Bag in irrecoverable position

Failure Mode - Object Fall Out During Lifting.mp4

Objects fall out during lifting

Failure Mode - Place outside Opening.mp4

Objects placed outside opening

Failure Mode - Slip Grasp During Lifting.mp4

Slipped grasp during lifting

Another common failure mode that is not shown in the above videos is timeout, which is caused by both manipulation and perception challenges during the long sequences of manipulation steps. In particular, during manipulation, the gripper may miss the grasp if the grasp region lies too flat on the surface to provide enough friction, or the bag may slip out of the gripper during dynamic actions. Additionally, the perception module is not always robust and may sometimes recognize only part of the rim. This causes the convex hull approximation of the opening to underestimate the true opening and leads the robot to perform more Stage 1 and Stage 2 actions than necessary. While this conservatism is not fatal on its own, the increased number of action steps raises the chance of failure due to imperfect manipulation, as the opening can easily close again during manipulation. We plan to address these issues in future work.

Acknowledgements

This research was performed at the AUTOLAB at UC Berkeley in affiliation with the Berkeley AI Research (BAIR) Lab, and the CITRIS "People and Robots" (CPAR) Initiative. The authors were supported in part by donations from Toyota Research Institute.  Lawrence Yunliang Chen is supported by the National Science Foundation (NSF) Graduate Research Fellowship Program under Grant No. 2146752. Daniel Seita and David Held are supported by NSF CAREER grant IIS-2046491. We thank our colleagues who gave us feedback and helped with experiments, in particular Justin Kerr and Roy Lin.