STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots

Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox

University of Washington, NVIDIA



Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangements, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects beyond their training sets, thereby requiring the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods, during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as table rearrangement. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. Furthermore, we propose a novel paradigm for joint segmentation and tracking in discrete frames, alongside a transformer module that facilitates efficient inter-frame communication. Our approach significantly outperforms recent methods in our experiments.


[Video: Poster Spotlight Video]

Method overview


Real-Robot Experiment


The video above demonstrates the stowing and picking process in action. In our quantitative real-world robot evaluation, we task the robot with picking 82 objects from scenes that involve over 100 distinct objects, each serving as either a target item or a distractor.

With the baseline method UCN+SIFT, the robot successfully grasps the target object 40.2% of the time. With finetuned VITA, the success rate is 46.3%. With our STOW method, the grasping success rate increases to 74.4%.
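As a rough sanity check, the reported rates can be converted back into whole numbers of successful picks. The sketch below assumes that each rate was measured over the same 82 pick attempts; the page does not state this explicitly, so the resulting counts are implied values rather than reported ones. The names `TOTAL_PICKS`, `reported_rates`, and `implied_successes` are illustrative only.

```python
# Assumption: every reported rate is measured over the same 82 pick attempts.
TOTAL_PICKS = 82

# Success rates as reported in the text above.
reported_rates = {
    "UCN+SIFT": 0.402,
    "finetuned VITA": 0.463,
    "STOW": 0.744,
}


def implied_successes(rate, total=TOTAL_PICKS):
    """Return the whole number of successes closest to rate * total."""
    return round(rate * total)


implied = {name: implied_successes(rate) for name, rate in reported_rates.items()}

for name, count in implied.items():
    # Recompute the percentage from the implied count to confirm it
    # rounds back to the reported figure.
    recomputed = 100 * count / TOTAL_PICKS
    print(f"{name}: {count}/{TOTAL_PICKS} picks = {recomputed:.1f}%")
```

Under this assumption the three rates correspond to 33, 38, and 61 successful picks out of 82, and each count rounds back to the reported percentage.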

The image on the left depicts a typical test environment used in our evaluation.


      title={STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots}, 

      author={Yi Li and Muru Zhang and Markus Grotz and Kaichun Mo and Dieter Fox},