ManiSkill-HAB Supplementary

Subtask Execution Videos

We provide the full videos for the 5-frame renders seen in Fig. 2.

pick.mp4

Pick

place.mp4

Place

open.mp4

Open (Drawer)

close.mp4

Close (Fridge)

Failure Mode Examples

These videos show examples of some Pick and Place failure modes. They do not cover all failure or success modes for either subtask.

Pick

pick_cant_grasp_failure.mp4

Can't grasp failure

pick_drop_failure.mp4

Drop failure

pick_mobility_failure.mp4

Mobility failure

pick_too_slow_failure.mp4

Too slow failure

Place

place_didnt_reach_goal_failure.mp4

Didn't reach goal failure

place_place_in_goal_failure.mp4

Place in goal failure

place_drop_to_goal_failure.mp4

Drop to goal failure

place_wont_let_go_failure.mp4

Won't let go failure

Emergent Behavior Example: Pick -> Drop -> Pick from floor

potted_meat_can_winding_success.mov

Comparison with M3

0_r0_c8_e0.mp4

Ours: Policy must pull Cracker Box out of sink without colliding with sink edge or other objects (otherwise box might fall).

m3_cluttered_grasp_cracker_box.mp4

M3: Policy hovers above cluttered receptacle and relies on magical grasp to lift (teleport) the target object. Video taken from https://sites.google.com/view/hab-m3

Comparing Rendering Performance: ManiSkill-HAB vs. Behavior-1k

To compare performance, we run a modified version of Behavior-1k's rendering benchmark. We use a single NVIDIA RTX 4090, render a single 128x128 RGB-D image, and simulate dynamics with a simulation frequency of 120 Hz and a control frequency of 30 Hz. Each evaluation run consists of 300 steps of random actions clipped to [-0.3, 0.3]. We report means and 95% CIs over 10 evaluation runs.
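The protocol above can be sketched roughly as follows. This is a minimal illustration, not the benchmark script we ran: `step_fn` is a hypothetical callable standing in for one simulator step plus render, `action_dim` is a placeholder, the Gaussian action distribution is an assumption, and the interval uses a Student-t critical value for 10 runs, which is one common choice rather than a detail stated above.

```python
import random
import statistics
import time
from typing import Callable, List, Sequence, Tuple

# Two-sided 95% Student-t critical value for 9 degrees of freedom (10 runs).
# Assumption: the page does not state which interval estimator was used.
T_CRIT_95_DOF9 = 2.262


def random_clipped_action(dim: int, low: float = -0.3, high: float = 0.3) -> List[float]:
    """Sample a random action and clip each component to [low, high]."""
    return [min(max(random.gauss(0.0, 1.0), low), high) for _ in range(dim)]


def run_once(
    step_fn: Callable[[Sequence[float]], object],  # hypothetical: steps and renders the env
    action_dim: int,
    num_steps: int = 300,
) -> float:
    """Time num_steps calls to step_fn and return samples per second (SPS)."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn(random_clipped_action(action_dim))
    elapsed = time.perf_counter() - start
    return num_steps / elapsed


def mean_and_ci95(values: Sequence[float], t_crit: float = T_CRIT_95_DOF9) -> Tuple[float, float]:
    """Return (mean, half-width of the 95% confidence interval)."""
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / len(values) ** 0.5
    return mean, half_width
```

A run over 10 evaluations would then look like `mean_and_ci95([run_once(step_fn, action_dim) for _ in range(10)])`. Note that this sketch includes action sampling inside the timed region, which is negligible next to simulation and rendering.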

With live ray-traced rendering, ManiSkill-HAB achieves 69.90 ± 0.25 samples per second (SPS) using 6.26 ± 0.00 GB of GPU memory, whereas Behavior-1k reaches 19.92 ± 0.04 SPS using 7.62 ± 0.04 GB of GPU memory.

Hence, ManiSkill-HAB is 3.51x faster than Behavior-1k and uses 17.85% less GPU memory, while retaining similar ray-tracing render quality.
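For reference, the speedup and memory figures follow directly from the reported means. The short check below uses only the numbers quoted above; the variable names are ours.

```python
# Reported means from the benchmark above.
msh_sps, b1k_sps = 69.90, 19.92          # samples per second
msh_mem_gb, b1k_mem_gb = 6.26, 7.62      # GPU memory in GB

# Speedup is the ratio of throughputs.
speedup = msh_sps / b1k_sps
# Memory saving is the relative reduction with respect to Behavior-1k.
mem_reduction_pct = 100.0 * (b1k_mem_gb - msh_mem_gb) / b1k_mem_gb

print(f"{speedup:.2f}x faster, {mem_reduction_pct:.2f}% less GPU memory")
# prints "3.51x faster, 17.85% less GPU memory"
```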

Below, we provide a comparison of live-rendered ray-traced images between ManiSkill-HAB (left) and Behavior-1k (right).

Live render from ManiSkill-HAB. Users need only change one line in the code.

Live render obtained from the Behavior-1k official Colab demo notebook.

Citation

@inproceedings{shukla2025maniskillhab,
  author    = {Arth Shukla and Stone Tao and Hao Su},
  title     = {ManiSkill-HAB: {A} Benchmark for Low-Level Manipulation in Home Rearrangement Tasks},
  booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025},
  publisher = {OpenReview.net},
  year      = {2025},
  url       = {https://openreview.net/forum?id=6bKEWevgSd},
  timestamp = {Thu, 15 May 2025 17:19:05 +0200},
  biburl    = {https://dblp.org/rec/conf/iclr/ShuklaTS25.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
