To support daily human tasks, robots need to tackle complex, long-horizon tasks and continuously acquire new skills to handle new problems. Deep Reinforcement Learning (DRL) offers the potential to learn fine-grained skills but relies heavily on human-defined rewards and struggles with long-horizon goals. Task and Motion Planning (TAMP) is adept at handling long-horizon tasks but often requires tailored, domain-specific skills, resulting in practical limitations and inefficiencies. To address these challenges, we propose LG-SAIL (Language Models Guided Sequential, Adaptive, and Incremental Skill Learning), a framework that leverages Large Language Models (LLMs) to synergistically integrate TAMP and DRL for continuous skill learning in long-horizon tasks. Our framework achieves automatic task decomposition, operator creation, and dense reward generation for efficiently acquiring the desired skills. To facilitate new skill learning, our framework maintains a symbolic skill library and uses existing models from semantically related skills to warm start training. LG-SAIL outperforms baselines in six challenging simulated task domains spanning two benchmarks. Furthermore, we show that learned skills can be reused to expedite learning in new task domains, and we deploy the system on a physical robot platform.
Our framework automates task decomposition, skill creation, and dense reward generation by forming a virtuous cycle between planning and skill learning. Execution failures reveal which skills need improvement, while a symbolic skill library enables warm-starting new skills from semantically related models.
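The sketch below illustrates this planning-and-learning cycle in simplified form; the class and function names (e.g. `SkillLibrary`, `plan_and_learn`, `llm.decompose`) are illustrative placeholders, not the actual LG-SAIL API.

```python
# Hedged sketch of the plan/learn cycle described above; all names are
# hypothetical placeholders rather than the released implementation.
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Symbolic skill library keyed by skill name."""
    skills: dict = field(default_factory=dict)

    def add(self, name, policy):
        self.skills[name] = policy

    def most_similar(self, name, similarity):
        # Pick the stored skill whose name is most semantically related to
        # `name`, where `similarity` is e.g. an LLM- or embedding-based score.
        if not self.skills:
            return None
        best = max(self.skills, key=lambda k: similarity(k, name))
        return self.skills[best]


def plan_and_learn(task_goal, llm, planner, env, library, similarity):
    """One pass of the planning/learning cycle."""
    # LLM-driven task decomposition and dense reward generation.
    operators = llm.decompose(task_goal)
    rewards = {op: llm.generate_dense_reward(op) for op in operators}

    while True:
        # TAMP plans over the skills currently in the library.
        plan = planner.plan(task_goal, library.skills)
        result = env.execute(plan)
        if result.success:
            return plan
        # The execution failure identifies the skill that needs improvement;
        # warm-start it from the most semantically related existing skill.
        failed = result.failed_skill
        warm_start = library.most_similar(failed, similarity)
        policy = env.train_skill(failed, rewards[failed], init=warm_start)
        library.add(failed, policy)
```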
LEAGUE Benchmark: Learning Long-Horizon Manipulation Tasks
We compare our framework against baselines across three task domains. The plot shows the average task progress during evaluation over the course of training, measured as the sum of rewards over the successfully executed skills in the task plan, normalized to 1. The shaded area represents the standard deviation across 5 random seeds.
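As a concrete illustration, the snippet below shows one way such a normalized task-progress metric could be computed; it assumes each skill's reward lies in [0, 1], and the function name and normalization are illustrative assumptions rather than the benchmark's exact implementation.

```python
def task_progress(skill_rewards, num_plan_skills):
    """Sum of per-skill rewards for the successfully executed skills,
    normalized so that completing the full task plan yields 1.

    Assumes each skill's reward lies in [0, 1] (an assumption made here
    purely for illustration).
    """
    return sum(skill_rewards) / num_plan_skills


# Example: 2 of 4 planned skills fully succeeded, the third reached 0.5.
print(task_progress([1.0, 1.0, 0.5], num_plan_skills=4))  # 0.625
```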
Rollout in the StackAtTarget domain.
Rollout in the PegInHole domain.
Rollout in the StowHammer domain.
Adapting to New Domain
For MakeCoffee, we compare learning from scratch against adapting the skills learned in the StowHammer domain.
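A minimal sketch of how such adaptation could be warm-started by partially loading a related skill's policy weights; the checkpoint path and the use of `strict=False` partial loading are assumptions for illustration, not the project's actual procedure.

```python
import torch


def warm_start_policy(policy, checkpoint_path):
    """Initialize a new skill's policy from a semantically related skill."""
    state = torch.load(checkpoint_path, map_location="cpu")
    # Load whatever parameters match; layers unique to the new skill keep
    # their fresh initialization.
    policy.load_state_dict(state, strict=False)
    return policy


# e.g. reuse StowHammer skill weights when training a MakeCoffee skill
# (the path below is a hypothetical placeholder).
# policy = warm_start_policy(new_policy, "checkpoints/stow_hammer_pick.pt")
```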
Rollout in the MakeCoffee domain.
LIBERO Benchmark: Adapting to New Objects and Scenes
We evaluate generalization on LIBERO-Spatial, where our method quickly adapts learned skills to new scene configurations.
We evaluate on LIBERO-Object to test generalization across object shapes. While initial tasks require more training, subsequent ones are learned faster, demonstrating strong potential for continual learning.
Real-World Experiment