The FB-SSEM dataset is a synthetic dataset created using the Unity game engine. It consists of surround-view fisheye camera images and bird's-eye-view (BEV) height and segmentation maps, generated from the sequential motion of an ego car, along with the corresponding ego-motion information.


The UW IS Occluded dataset is curated using commodity hardware to reflect real-world scenarios with different environmental conditions and degrees of object occlusion. The dataset consists of RGB-D images, ground-truth instance segmentation masks, and 6D object pose information.


The UW IS dataset comprises scene images from two different environments: a living room and a mock warehouse. The scenes are captured from varying camera poses under different illumination conditions, and each includes up to five objects drawn from a set of fourteen. The dataset consists of RGB images and ground-truth semantic segmentation masks.

The UW IOM dataset comprises RGB videos of twenty individuals picking up and placing objects of varying weights to and from cabinet and table locations at various heights. Every frame is annotated with an action label following a four-tier hierarchy. The first tier indicates the type of object being manipulated, the second tier denotes the type of human motion, the third tier captures the type of object manipulation (if applicable), and the fourth tier represents the relative height of the surface where manipulation is taking place.
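
As a rough illustration of the four-tier hierarchy described above, a per-frame label could be held in a small record type like the one below. This is only a sketch: the class name `ActionLabel`, the helper `parse_label`, the `/`-separated string format, the `none` placeholder for an inapplicable third tier, and the example values are all assumptions made for illustration, not the dataset's actual annotation format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionLabel:
    """Hypothetical container for one frame's four-tier action label."""
    object_type: str             # tier 1: type of object being manipulated
    human_motion: str            # tier 2: type of human motion
    manipulation: Optional[str]  # tier 3: type of object manipulation, None if not applicable
    surface_height: str          # tier 4: relative height of the manipulation surface


def parse_label(raw: str, sep: str = "/", na: str = "none") -> ActionLabel:
    """Split an assumed 'tier1/tier2/tier3/tier4' string into an ActionLabel.

    The separator and the 'none' placeholder are illustrative assumptions.
    """
    parts = [p.strip() for p in raw.split(sep)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 tiers, got {len(parts)}: {raw!r}")
    obj, motion, manip, height = parts
    return ActionLabel(obj, motion, None if manip == na else manip, height)


# Example with made-up tier values:
label = parse_label("box/walk/none/low")
```

Keeping the third tier optional mirrors the "if applicable" qualifier in the description, so frames without any object manipulation do not need a dummy category.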