WHOLE-MoMa: Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers
WHOLE-MoMa policy on a real TIAGo++ mobile manipulator simultaneously opening a cupboard and placing an object inside it.
Abstract:
Mobile Manipulation (MoMa) of articulated objects, such as opening doors, drawers, and cupboards, demands simultaneous whole-body coordination between a robot's base and arms. Classical whole-body controllers (WBCs) can solve such problems via hierarchical optimization, but require extensive hand-tuning and remain brittle. Learning-based methods, on the other hand, show strong generalization capabilities but typically rely on expensive whole-body teleoperation data or heavy reward engineering. We observe that even a sub-optimal WBC is a powerful structural prior: it can be used to collect data in a constrained, task-relevant region of the state-action space, and its behavior can still be improved upon using offline reinforcement learning. Building on this, we propose WHOLE-MoMa, a two-stage pipeline that first generates diverse demonstrations by randomizing a lightweight WBC, and then applies offline RL to identify and stitch together improved behaviors via a reward signal. To support the expressive, action-chunked diffusion policies needed for complex coordination tasks, we extend offline implicit Q-learning with Q-chunking for chunk-level critic evaluation and advantage-weighted policy extraction. On three tasks of increasing difficulty using a TIAGo++ mobile manipulator in simulation, WHOLE-MoMa significantly outperforms WBC, behavior cloning, and several offline RL baselines. Policies transfer directly to the real robot without finetuning, achieving 80% success in bimanual drawer manipulation and 68% in simultaneous cupboard opening and object placement, all without any teleoperated or real-world training data.
Motivation
Whole-Body Controllers (WBCs) can solve complex multi-objective problems such as whole-body mobile manipulation tasks that require simultaneous movement of the base and multiple arms. However, without careful tuning, WBC-based policies remain sub-optimal and generalize poorly.
In this work, we use a WBC as a prior to generate data for mobile manipulation tasks, and then use offline RL to improve upon these demonstrations by stitching together good behaviors.
WHOLE-MoMa
We first generate diverse whole-body demonstrations using a WBC with randomized sampling parameters.
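To make this stage concrete, here is a minimal sketch of what the collection loop could look like. It assumes a hypothetical `WholeBodyController` interface, a gym-style `env`, and illustrative parameter names and ranges; none of these are taken from the released code.

```python
# Stage 1 (illustrative): collect demonstrations from a randomized WBC.
# `WholeBodyController` and `env` are hypothetical placeholders, and the
# parameter names/ranges below are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(seed=0)
dataset = []

for episode in range(1000):
    # Randomize the controller's sampling parameters so each demonstration
    # explores a slightly different (but still task-relevant) behavior.
    wbc = WholeBodyController(
        base_gain=rng.uniform(0.5, 2.0),
        arm_gain=rng.uniform(0.5, 2.0),
        grasp_offset=rng.normal(0.0, 0.02, size=3),  # metres
    )

    obs, done, trajectory = env.reset(), False, []
    while not done:
        action = wbc.solve(obs)                 # sub-optimal whole-body command
        next_obs, _, done, _ = env.step(action)
        trajectory.append((obs, action, next_obs, done))  # rewards labeled later
        obs = next_obs
    dataset.append(trajectory)
```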
We then label the data with a reward function and train a Q-chunked IQL critic, followed by policy extraction via advantage-weighted regression (AWR) with a diffusion policy.
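The PyTorch sketch below illustrates the chunk-level critic update and the AWR weighting. It assumes critic networks `q_net(s, a_chunk)` and `v_net(s)` that return one value per batch element; the chunk length, expectile, and temperature values are illustrative assumptions, not the paper's hyperparameters. The key idea of the chunked variant is that the critic scores an entire H-step action chunk at once, so the TD target bootstraps only after H steps, matching the action-chunked output of the diffusion policy.

```python
# Hedged sketch of chunk-level IQL with AWR weights (PyTorch). `q_net` and
# `v_net` are assumed critic networks returning shape-(B,) values; batch
# tensor layouts and hyperparameters below are illustrative assumptions.
import torch

H, gamma, expectile, beta = 8, 0.99, 0.7, 3.0

def expectile_loss(diff, tau):
    # Asymmetric L2: with tau > 0.5, positive diffs (Q above V) weigh more,
    # pushing V towards an upper expectile of Q as in standard IQL.
    weight = torch.where(diff > 0, tau, 1.0 - tau)
    return (weight * diff.pow(2)).mean()

def critic_losses(batch):
    s = batch["obs"]                   # (B, obs_dim)    state at time t
    a_chunk = batch["action_chunk"]    # (B, H, act_dim) actions a_{t:t+H}
    rewards = batch["rewards"]         # (B, H)          per-step rewards
    s_next = batch["obs_after_chunk"]  # (B, obs_dim)    state at time t+H

    # Chunk-level TD target: n-step return over the chunk plus bootstrapped V.
    discounts = gamma ** torch.arange(H, dtype=torch.float32)
    target = (discounts * rewards).sum(-1) + gamma**H * v_net(s_next).detach()
    q_loss = (q_net(s, a_chunk) - target).pow(2).mean()

    # Expectile regression of V(s) towards Q(s, a_chunk).
    v_loss = expectile_loss(q_net(s, a_chunk).detach() - v_net(s), expectile)
    return q_loss, v_loss

def awr_weights(s, a_chunk):
    # Advantage-weighted weights for policy extraction: each chunk's
    # diffusion-policy training loss is scaled by exp(beta * advantage).
    adv = q_net(s, a_chunk) - v_net(s)
    return torch.clamp(torch.exp(beta * adv), max=100.0).detach()
```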
The resulting policy is transferred directly to the real robot using a marker-based or 6D pose-tracking state-estimation pipeline.
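For completeness, deployment could look roughly like the loop below; `tracker`, `robot`, `policy`, and `build_observation` are hypothetical placeholders for the pose-tracking pipeline, the TIAGo++ interface, and the trained diffusion policy, not the actual real-robot stack.

```python
# Illustrative deployment loop; all interfaces here are placeholders
# (assumptions), not the released real-robot code.
def run_episode(tracker, robot, policy, max_steps=500):
    for _ in range(max_steps):
        # State estimation: marker-based or 6D pose tracking of the object.
        object_pose = tracker.get_pose("cupboard_door")
        obs = build_observation(robot.get_proprioception(), object_pose)

        # The diffusion policy outputs a chunk of future whole-body actions
        # (base + dual-arm commands), executed in sequence before replanning.
        action_chunk = policy.sample(obs)
        for action in action_chunk:
            robot.send_whole_body_command(action)
```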
Video demonstration:
BibTeX:
@misc{jauhri2026wholebodymobilemanipulationusing,
      title={Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers},
      author={Snehal Jauhri and Vignesh Prasad and Georgia Chalvatzaki},
      year={2026},
      eprint={2604.12509},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.12509},
}