Multi-skill Mobile Manipulation for Object Rearrangement

Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

UC San Diego, Meta AI Research, UC Berkeley

GitHub | arXiv

Abstract

We study a modular approach to long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, each learned individually on its subtask. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, e.g., navigating to a bad location from which a stationary manipulation skill cannot reach its target. To address this, we propose that manipulation skills should include mobility, giving them the flexibility to interact with the target object from multiple locations, and that the navigation skill should allow multiple end points that lead to successful manipulation. We operationalize these ideas by implementing mobile manipulation skills rather than stationary ones and by training the navigation skill with a region goal instead of a point goal. We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance compared to the baselines.
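To make the point-goal versus region-goal distinction concrete, below is a minimal sketch contrasting the two navigation rewards. It is illustrative only, not the paper's implementation: the Euclidean `geodesic_distance` stand-in and the exact reward shapes are assumptions (simulators such as Habitat compute geodesic distances on a navigation mesh).

```python
import numpy as np

def geodesic_distance(a, b):
    # Stand-in only: a real simulator would query a navmesh here.
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def point_goal_reward(prev_pos, cur_pos, goal):
    # Dense progress reward towards a single, pre-computed end point.
    return geodesic_distance(prev_pos, goal) - geodesic_distance(cur_pos, goal)

def region_goal_reward(prev_pos, cur_pos, region):
    # Dense progress reward towards the closest of many acceptable end
    # points (positions from which manipulation is likely to succeed),
    # so the policy is free to terminate at any of them.
    d_prev = min(geodesic_distance(prev_pos, g) for g in region)
    d_cur = min(geodesic_distance(cur_pos, g) for g in region)
    return d_prev - d_cur
```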

Video

The video summarizes our multi-skill mobile manipulation approach (M3) for object rearrangement.

video_v2.mp4

Qualitative Results

We analyze different cases to show the advantages of our mobile manipulation skills and region-goal navigation reward; a sketch of how these skills are chained appears after the list below.

  • S+P (baseline 1): stationary manipulation skills + point-goal navigation reward

  • M+P (baseline 2): mobile manipulation skills + point-goal navigation reward

  • M3 (our method): mobile manipulation skills + region-goal navigation reward
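For context, here is a minimal, runnable sketch of how such skills are chained over one rearrangement episode. All names (`navigate`, `pick`, `place`, `region_around`) are hypothetical stubs for illustration, not the HAB or M3 API; the point is only the control flow: navigation hands off a base position to a manipulation skill, which in the mobile variants can still adjust the base.

```python
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[float, float]

@dataclass
class Rearrangement:
    start: Position  # where the target object begins
    goal: Position   # where it should end up

def region_around(p: Position) -> List[Position]:
    # Stub: acceptable base positions near p (in practice, sampled
    # from the navigation mesh around the manipulation target).
    return [(p[0] - 0.5, p[1]), (p[0] + 0.5, p[1])]

def navigate(region: List[Position]) -> Position:
    # Stub: a region-goal policy may stop at any acceptable end point;
    # a point-goal policy is rewarded for reaching exactly one.
    return region[0]

def pick(base: Position, obj: Position) -> None:
    # Stub: a mobile pick controls base and arm jointly, so it can
    # recover from an imperfect `base`; a stationary pick cannot.
    print(f"pick object at {obj} from base {base}")

def place(base: Position, goal: Position) -> None:
    print(f"place object at {goal} from base {base}")

def run_episode(task: List[Rearrangement]) -> None:
    # Chain skills per object: navigate -> pick -> navigate -> place,
    # e.g., repeated five times for TidyHouse.
    for r in task:
        base = navigate(region_around(r.start))
        pick(base, r.start)
        base = navigate(region_around(r.goal))
        place(base, r.goal)

run_episode([Rearrangement(start=(1.0, 2.0), goal=(4.0, 0.5))])
```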

For each video:

  • The largest image is captured by a third-view camera mounted on the robot; it is for visualization only. The target object is highlighted by a white bounding box, and its goal position by a box that appears only in the visualization. The closest navigable positions to the target object's starting and goal positions are visualized as green arrows pointing towards those positions.

  • The green border indicates that the correct object is picked. The red border indicates that a wrong object is picked. The yellow border indicates that the gripper has released whatever it was holding.

  • The RGB and depth images from the head and arm cameras are shown on the right.

TidyHouse

Move 5 objects from starting positions to goal positions.

th_sp_ep6_pick_wo.mp4

S+P: The robot picks a wrong object (the 1st) because manipulation starts too close to the counter.

th_mp_ep6_pick_co.mp4

M+P: The robot picks the correct object (the 1st) by adjusting its base position.

th_sp_ep97_obstacle.mp4

S+P: The arm is blocked by the TV when the robot tries to pick the last object.

th_mp_ep97_obstacle.mp4

M+P: The arm is blocked at first as well, but the robot manages to get free thanks to its mobility.

th_mr_ep_97_obstacle.mp4

M3 (ours): The navigation skill learns to terminate at a position more suitable for manipulation.

PrepareGroceries

Move 2 objects from the fridge to the counters and move an object from the counter to the fridge.

pg_sp_ep53_disturb.mp4

S+P: The robot accidentally closes the fridge.

pg_mr_ep53_disturb.mp4

M3 (ours): The robot is able to avoid disturbing the environment.

SetTable

Move a bowl from a drawer to a table and move an apple from the fridge to the table.

st_sp_ep26_fridge.mp4

S+P: The navigation skill terminates at a position from which the robot cannot reach the target object in the fridge without moving its base.

st_mr_ep26_fridge.mp4

M3 (ours): The robot can move closer to the object before picking it, compensating for where the navigation skill stopped.