SADiff: Skill-Aware Diffusion Framework for Generalizable Robotic Manipulation
Anonymous Submission
SADiff Framework
Overview of the proposed Skill-Aware Diffusion (SADiff). The pipeline is structured into three distinct phases: (1) The encoding phase, where the skill-aware encoding module utilizes learnable skill tokens to interact with multimodal inputs and extract skill-specific information; (2) The generation phase, in which a skill-constrained diffusion model generates object-centric motion flow conditioned on the skill-aware token sequences, optimized by both denoising and two skill-specific auxiliary losses; and (3) The execution phase, which employs a skill-retrieval transformation strategy to translate the generated 2D motion flow into executable 3D trajectories by leveraging skill-specific priors.
Section A: Skill-Aware Encoding Module
To integrate skill-specific information with multimodal inputs, we design a skill-aware encoding module. Through attention-based interactions, the module fuses image, language, and bounding-box inputs for the relevant objects with learnable skill tokens, producing skill-aware token sequences.
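The attention-based interaction described above can be illustrated with a minimal sketch. It assumes single-head cross-attention in which the skill tokens act as queries and the multimodal tokens act as keys and values, with identity projections and a residual connection. The function name `skill_cross_attention` and the toy dimensions are hypothetical and are not taken from the SADiff implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def skill_cross_attention(skill_tokens, input_tokens):
    """Let each skill token (query) attend over the multimodal tokens.

    Assumption: keys and values are the raw input tokens (no learned
    projections), and the attended feature is added back to the query.
    """
    dim = len(skill_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for query in skill_tokens:
        weights = softmax([dot(query, key) * scale for key in input_tokens])
        attended = [
            sum(w * value[d] for w, value in zip(weights, input_tokens))
            for d in range(dim)
        ]
        updated.append([q + a for q, a in zip(query, attended)])
    return updated


# Toy usage: 3 hypothetical skill tokens, 5 multimodal tokens, dimension 4.
rng = random.Random(0)
skills = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
inputs = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
skill_aware = skill_cross_attention(skills, inputs)
```

In a full model the queries, keys, and values would pass through learned linear projections and multiple heads; the sketch only shows how skill tokens can pull skill-relevant information out of the fused inputs.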
Section B: Skill-Constrained Flow Generation
To generate a precise 2D object motion flow aligned with a specific skill, we propose a skill-constrained diffusion model. The model is trained by jointly optimizing a skill classification loss, a skill contrastive loss, and a denoising loss, which together ensure accurate skill selection, semantic alignment, and precise flow reconstruction.
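As a rough illustration of the joint objective, the sketch below combines three terms into one scalar. The concrete forms are assumptions, since the text does not specify them: mean squared error on the predicted noise for the denoising term, cross-entropy for skill classification, and an InfoNCE-style loss with cosine similarity for the contrastive term. The weights `w_cls` and `w_con` and the temperature are hypothetical placeholders.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of one sample, computed via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return lse - logits[label]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """Contrastive loss: pull the anchor toward its matching skill embedding."""
    logits = [cosine(anchor, c) / temperature for c in candidates]
    return cross_entropy(logits, positive_index)


def total_loss(noise_pred, noise_true, skill_logits, skill_label,
               flow_embed, skill_embeds, w_cls=0.1, w_con=0.1):
    """Denoising loss plus two weighted skill-specific auxiliary losses."""
    l_denoise = mse(noise_pred, noise_true)
    l_cls = cross_entropy(skill_logits, skill_label)
    l_con = info_nce(flow_embed, skill_embeds, skill_label)
    return l_denoise + w_cls * l_cls + w_con * l_con


# Toy usage with two hypothetical skills; skill 0 is the ground truth.
loss = total_loss(
    noise_pred=[0.1, -0.2, 0.3], noise_true=[0.0, -0.1, 0.4],
    skill_logits=[2.0, 0.5], skill_label=0,
    flow_embed=[1.0, 0.1], skill_embeds=[[1.0, 0.0], [0.0, 1.0]],
)
```

Keeping the auxiliary terms as weighted additions lets the denoising loss remain the main training signal while the skill terms shape the conditioning.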
Section C: Retrieval-Enhanced Transformation
To accurately transform the generated 2D flow into executable 3D actions, we introduce skill-specific trajectory priors into the optimization framework. These priors serve as high-level constraints that guide the optimization toward skill-consistent motion patterns, improving both accuracy and physical consistency.
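One way to picture a prior-guided 2D-to-3D optimization is the simplified sketch below. It is not the authors' formulation. It assumes a pinhole camera in normalized image coordinates (focal length 1), treats the trajectory as a sequence of independent 3D waypoints, and assumes the prior trajectory for the selected skill has already been retrieved. Each waypoint minimizes the reprojection error to the generated 2D flow plus a quadratic penalty, weighted by a hypothetical `lam`, that keeps it close to the prior.

```python
def project(p):
    """Pinhole projection in normalized image coordinates (assumed camera)."""
    x, y, z = p
    return (x / z, y / z)


def objective(p, uv, prior, lam):
    """Reprojection error plus a penalty for deviating from the skill prior."""
    u, v = project(p)
    reproj = (u - uv[0]) ** 2 + (v - uv[1]) ** 2
    reg = sum((a - b) ** 2 for a, b in zip(p, prior))
    return reproj + lam * reg


def gradient(p, uv, prior, lam):
    """Analytic gradient of `objective` with respect to the 3D point."""
    x, y, z = p
    ru = x / z - uv[0]
    rv = y / z - uv[1]
    gx = 2 * ru / z + 2 * lam * (x - prior[0])
    gy = 2 * rv / z + 2 * lam * (y - prior[1])
    gz = -2 * (ru * x + rv * y) / z ** 2 + 2 * lam * (z - prior[2])
    return (gx, gy, gz)


def lift_flow_to_3d(flow_2d, prior_traj, lam=0.1, lr=0.05, steps=300):
    """Gradient descent per waypoint, initialized at the retrieved prior."""
    traj = []
    for uv, prior in zip(flow_2d, prior_traj):
        p = tuple(prior)
        for _ in range(steps):
            g = gradient(p, uv, prior, lam)
            p = tuple(a - lr * b for a, b in zip(p, g))
        traj.append(p)
    return traj


# Toy data: generated 2D flow and a hypothetical retrieved prior trajectory.
flow_2d = [(0.10, 0.00), (0.15, 0.05), (0.20, 0.10)]
prior_traj = [(0.05, 0.00, 1.0), (0.10, 0.02, 1.0), (0.15, 0.05, 1.0)]
traj_3d = lift_flow_to_3d(flow_2d, prior_traj)
```

The prior term matters because a single 2D observation leaves depth unconstrained; penalizing deviation from a skill-consistent trajectory resolves that ambiguity in favor of plausible motions.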
Experiment Demonstration
Simulation Experiments
1. Within-Distribution Experiment
2. Robustness and Generalization Experiment
3. Instruction-Guided Skill Adaptation Experiment
Real-World Experiments
1. Experiment Results
2. Qualitative Experiment
Scalability and Composability Experiments
1. Scalability Experiment
2. Composability Experiment