Efficient Stacking and Destacking of Objects on Shelves to Facilitate Mechanical Search

Huang Huang, Letian Fu, Michael Danielczuk, Chung Min Kim, Zach Tam, Jeffrey Ichnowski, Anelia Angelova, Brian Ichter, and Ken Goldberg

Bluction Tool

Occupancy Distribution:

The occupancy distribution is a continuous function represented by a heatmap in image space, where each pixel’s value represents the probability that it occludes the target object (i.e., the likelihood that the pixel would be part of the target object if there were no occluding objects in the scene). Following [1], we train a perception model that predicts the occupancy distribution from the depth image and the target aspect ratio. The model is trained on simulated images, with the ground-truth occupancy distributions for the corresponding target aspect ratios computed using the Minkowski sum [1]. The learned occupancy distribution therefore encodes both the geometries of the visible objects and the perspective effect of the camera: larger objects placed closer to the camera cause more occlusion than smaller objects placed closer to the walls of the shelf. The training pipeline is shown below.
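As a rough illustration of how a ground-truth heatmap of this kind could be built, the sketch below slides an axis-aligned box (standing in for a target of a given aspect ratio) over a binary mask of occluded pixels. It counts, for every pixel, the fraction of fully hidden box placements that cover it. The non-zero region of the result is the Minkowski sum of the set of feasible box anchors with the box footprint. This is a simplified assumption for illustration only, not the procedure of [1]: the function name `occupancy_heatmap`, the binary mask input, the uniform prior over placements, and the absence of rotations and perspective are all hypothetical choices.

```python
def occupancy_heatmap(occluded, box_h, box_w):
    """Return, per pixel, the fraction of fully hidden box placements covering it.

    occluded: list of rows of booleans, True where the view is blocked
              (hypothetical input format, assumed for this sketch).
    box_h, box_w: target footprint in pixels (stand-in for the aspect ratio).
    """
    rows, cols = len(occluded), len(occluded[0])
    counts = [[0] * cols for _ in range(rows)]
    placements = 0

    for r in range(rows - box_h + 1):
        for c in range(cols - box_w + 1):
            # A placement is feasible only if every pixel of the box is hidden.
            hidden = all(
                occluded[r + dr][c + dc]
                for dr in range(box_h)
                for dc in range(box_w)
            )
            if not hidden:
                continue
            placements += 1
            # Accumulate the box footprint at this anchor (Minkowski-sum support).
            for dr in range(box_h):
                for dc in range(box_w):
                    counts[r + dr][c + dc] += 1

    if placements == 0:
        # The target cannot be fully hidden anywhere in this mask.
        return [[0.0] * cols for _ in range(rows)]
    return [[n / placements for n in row] for row in counts]


# Toy scene: a 2x3 occluded block inside a 4x6 image, with a 2x2 target box.
mask = [[False] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2, 3):
        mask[r][c] = True

heat = occupancy_heatmap(mask, 2, 2)
print(heat[1])  # middle pixel of the block is covered by both feasible placements
```

In this toy scene only two placements are fully hidden, so the centre column of the block receives the value 1.0 and its two neighbours receive 0.5. A wider occluder yields a broader, flatter distribution.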

Sensitivity Experiments

Results of the sensitivity experiments with different target aspect ratios and different visibility thresholds are shown in Table 4 and Table 5, respectively. The experiments are run over 600 scenes with 6 to 16 objects. We report the success rate (SR) and the number of steps taken, given as median (first quartile, third quartile).
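For readers who want to reproduce the reporting format, the following is a minimal sketch of how a success rate and a "median (first quartile, third quartile)" summary could be computed from per-episode logs. The function name `summarize`, the `(success, steps)` tuple format, and the choice to include every episode in the step statistics are assumptions made for this example. The source does not state which episodes are counted or which quartile convention is used.

```python
from statistics import quantiles


def summarize(episodes):
    """Summarize episodes given as (success: bool, steps: int) tuples.

    Returns the success rate and a string "median (Q1, Q3)" for the steps.
    Assumption: all episodes, successful or not, contribute to the steps.
    """
    successes = sum(1 for ok, _ in episodes if ok)
    steps = [n for _, n in episodes]
    # With n=4, quantiles() returns the three cut points Q1, median, Q3.
    q1, median, q3 = quantiles(steps, n=4, method="inclusive")
    return successes / len(episodes), f"{median:g} ({q1:g}, {q3:g})"


# Made-up episode logs, for illustration only.
success_rate, steps_summary = summarize(
    [(True, 1), (True, 2), (False, 3), (True, 4), (True, 5)]
)
print(success_rate, steps_summary)
```

The `inclusive` method treats the data as the whole population of interest. Switching to the default `exclusive` method can shift the quartiles slightly on small samples.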


Action differences between (A) DARSS and (B) MCTSSS on the same simulated shelf environment are shown above. Although the 1D target occupancy distributions appear very similar at the first step, DARSS first explores the largest peak in the occupancy distribution, splitting it into two parts, while MCTSSS instead chooses a rearrangement action that removes one of the peaks in the occupancy distribution. Although DARSS reveals a partial view of the target at step 11, it is unable to solve this scene due to multi-blockage: the greedy policy is unable to displace the front-most objects blocking the action with the object of the highest support. The policy then takes actions based on the noise in the distribution. MCTSSS, on the other hand, benefits from lookahead and reveals the target object.

DARSS vs. DARSS (--stack)

Below is an example comparing DARSS and DARSS (--stack).

(A) Initial shelf environment. (B) Shelf configuration resulting from DARSS (--stack): no feasible actions can be found due to insufficient floor space, and the episode terminates. (C) Shelf configuration resulting from DARSS, which successfully reveals the target object. This example suggests that stacking actions are critical to clear shelf space for future actions.