Ce Hao*, Xuanran Zhai*, Yaohua Liu, Harold Soh
School of Computing, National University of Singapore; Smart Systems Institute, NUS
Guangdong Institute of Intelligence Science and Technology; Beijing Zhongguancun Academy
Abstract
Diffusion-based policies have recently shown strong results in robot manipulation, but their extension to multi-task scenarios is hindered by the high cost of scaling model size and demonstrations. We introduce Skill Mixture-of-Experts Policy (SMP), a diffusion-based mixture-of-experts policy that learns a compact orthogonal skill basis and uses sticky routing to compose actions from a small, task-relevant subset of experts at each step. A variational training objective supports this design, and adaptive expert activation at inference yields fast sampling without oversized backbones. We validate SMP in simulation and on a real dual-arm platform with multi-task learning and transfer learning tasks, where SMP achieves higher success rates and markedly lower inference cost than large diffusion baselines. These results indicate a practical path toward scalable, transferable multi-task manipulation: learn reusable skills once, activate only what is needed, and adapt quickly when tasks change.
Overview of SMP. Top: Bimanual rollout of “put card in drawer” with key steps (1)–(5). Middle: Skill decomposition by arm and phase: the state-adaptive orthonormal skill basis and sticky routing yield spatial specialization (left/right) and organize behavior into pick, adjust, reach, and release phases with corresponding end-effector (EE) actions. Bottom: Gate values over time show sparse, phase-consistent activation—only a few experts are active per step, enabling efficient sampling.
Method
Skill Mixture-of-Experts Policy (SMP) Training Framework. Left (a): During training, raw observations are encoded into state features, which generate an unconstrained matrix W(s). A QR retraction produces a state-adaptive orthogonal basis B(s). Actions are reconstructed via B(s)(g⊙ z), where g are sticky-gated weights and z are diffusion-based coefficients. The model is trained with reconstruction, diffusion, gate regularization, and alignment losses. Right (b): Illustration of the state-adaptive basis across timesteps: as the robot moves, the basis vectors adjust with the state, while sticky gates preserve consistent expert roles (e.g., translation and rotation).
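The core operations above—mapping an unconstrained matrix W(s) to an orthonormal basis B(s) via QR retraction, then reconstructing an action as B(s)(g⊙z)—can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the dimensions, variable names, and the sign-fixing convention are assumptions.

```python
import numpy as np

def orthogonal_basis(W):
    # QR retraction: project the unconstrained matrix W(s) onto the
    # set of matrices with orthonormal columns.
    Q, R = np.linalg.qr(W)
    # Fix column signs (diag(R) > 0) so the retraction is unique.
    Q = Q * np.sign(np.diag(R))
    return Q

def reconstruct_action(W, g, z):
    # a = B(s)(g ⊙ z): gated combination of per-skill coefficients,
    # expressed in the state-adaptive orthonormal basis.
    B = orthogonal_basis(W)
    return B @ (g * z)

rng = np.random.default_rng(0)
d_action, n_skills = 7, 4                       # hypothetical sizes
W = rng.standard_normal((d_action, n_skills))   # unconstrained W(s)
g = np.array([1.0, 0.0, 1.0, 0.0])              # sticky gates: 2 of 4 experts on
z = rng.standard_normal(n_skills)               # diffusion-sampled coefficients
a = reconstruct_action(W, g, z)

B = orthogonal_basis(W)
# Columns are orthonormal: B^T B = I
assert np.allclose(B.T @ B, np.eye(n_skills), atol=1e-8)
```

Because B(s) has orthonormal columns, each active expert contributes along a direction orthogonal to the others, which is what allows experts to specialize (e.g., translation vs. rotation) without interfering.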
Experiments in Simulation
Multi-task learning in RoboTwin-2 and RLBench-2. SMP partitions bimanual control into an orthonormal skill basis and routes with sticky gates. Across tasks, the same experts are reused for left- and right-arm primitives and for pick–move–place phases, with few switches and long activation segments. Gate traces reveal sparse, phase-consistent activation and cross-task skill reuse, indicating that actions are composed from a small, task-relevant subset of experts.
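The sticky, sparse activation pattern described above—few switches, long segments—can be illustrated with a simple gating rule: add a persistence bonus to experts that were active at the previous step before taking the top-k. This is one plausible reading of sticky routing, not the paper's exact mechanism; `k`, `bonus`, and all names here are hypothetical.

```python
import numpy as np

def sticky_topk(scores, prev_active, k=2, bonus=0.5):
    # Sticky top-k gating (illustrative): experts active at the previous
    # step get a persistence bonus, discouraging rapid expert switching.
    adjusted = scores + bonus * prev_active
    idx = np.argsort(adjusted)[-k:]    # indices of the k highest scores
    gates = np.zeros_like(scores)
    gates[idx] = 1.0                   # hard, sparse gate vector
    return gates

rng = np.random.default_rng(1)
n_experts = 6
prev = np.zeros(n_experts)
trace = []
for t in range(5):
    scores = rng.standard_normal(n_experts)  # stand-in for learned gate logits
    prev = sticky_topk(scores, prev)
    trace.append(prev.copy())
# Exactly k experts fire at each step, so sampling cost stays low.
assert all(gate.sum() == 2 for gate in trace)
```

With the bonus set to zero this reduces to plain top-k routing; increasing it trades responsiveness for longer, more phase-consistent expert segments like those seen in the gate traces.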
Real Robot Experiments
Real-robot experiments with four manipulation policies. Left: SMP executes four bimanual manipulation tasks. Right: Progress score of each task, averaged over 10 trials.