To support daily human tasks, robots need to tackle complex, long-horizon tasks and continuously acquire new skills to handle new problems. Deep Reinforcement Learning (DRL) offers the potential to learn fine-grained skills but relies heavily on human-defined rewards and struggles with long-horizon goals. Task and Motion Planning (TAMP) is adept at handling long-horizon tasks but often requires tailored, domain-specific skills, resulting in practical limitations and inefficiencies. To address these challenges, we propose LG-SAIL (Language Models Guided Sequential, Adaptive, and Incremental Skill Learning), a framework that leverages Large Language Models (LLMs) to synergistically integrate TAMP and DRL for continuous skill learning in long-horizon tasks. Our framework achieves automatic task decomposition, operator creation, and dense reward generation for efficiently acquiring the desired skills. To facilitate new skill learning, our framework maintains a symbolic skill library and uses existing models from semantically related skills to warm start training. LG-SAIL outperforms baselines in six challenging simulated task domains spanning two benchmarks. Furthermore, we show that learned skills can be reused to expedite learning in new task domains, and we deploy the system on a physical robot platform.
Our framework automates task decomposition, skill creation, and dense reward generation by forming a virtuous cycle between planning and skill learning. Execution failures reveal which skills need improvement, while a symbolic skill library enables warm-starting new skills from semantically related models.
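The sketch below illustrates this planning-and-learning cycle in simplified form; the class and function names (e.g. `SkillLibrary`, `plan_and_learn`, `llm.decompose`) are illustrative placeholders, not the actual LG-SAIL API.

```python
# Hedged sketch of the plan/learn cycle described above; all names are
# hypothetical placeholders rather than the released implementation.
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Symbolic skill library keyed by skill name."""
    skills: dict = field(default_factory=dict)

    def add(self, name, policy):
        self.skills[name] = policy

    def most_similar(self, name, similarity):
        # Pick the stored skill whose name is most semantically related to
        # `name`, where `similarity` is e.g. an LLM- or embedding-based score.
        if not self.skills:
            return None
        best = max(self.skills, key=lambda k: similarity(k, name))
        return self.skills[best]


def plan_and_learn(task_goal, llm, planner, env, library, similarity):
    """One pass of the planning/learning cycle."""
    # LLM-driven task decomposition and dense reward generation.
    operators = llm.decompose(task_goal)
    rewards = {op: llm.generate_dense_reward(op) for op in operators}

    while True:
        # TAMP plans over the skills currently in the library.
        plan = planner.plan(task_goal, library.skills)
        result = env.execute(plan)
        if result.success:
            return plan
        # The execution failure identifies the skill that needs improvement;
        # warm-start it from the most semantically related existing skill.
        failed = result.failed_skill
        warm_start = library.most_similar(failed, similarity)
        policy = env.train_skill(failed, rewards[failed], init=warm_start)
        library.add(failed, policy)
```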
LEAGUE Benchmark: Learning Long-Horizon Manipulation Tasks
We compare our framework against baselines across three task domains. The plot shows the average task progress during evaluation over the course of training, measured as the sum of rewards over the successfully executed skills in the task plan, normalized to 1. The shaded area represents the standard deviation across 5 random seeds.
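As a concrete illustration, the snippet below shows one way such a normalized task-progress metric could be computed; it assumes each skill's reward lies in [0, 1], and the function name and normalization are illustrative assumptions rather than the benchmark's exact implementation.

```python
def task_progress(skill_rewards, num_plan_skills):
    """Sum of per-skill rewards for the successfully executed skills,
    normalized so that completing the full task plan yields 1.

    Assumes each skill's reward lies in [0, 1] (an assumption made here
    purely for illustration).
    """
    return sum(skill_rewards) / num_plan_skills


# Example: 2 of 4 planned skills fully succeeded, the third reached 0.5.
print(task_progress([1.0, 1.0, 0.5], num_plan_skills=4))  # 0.625
```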
Rollout in the StackAtTarget domain.
Rollout in the PegInHole domain.
Rollout in the StowHammer domain.
Adapting to New Domain
For MakeCoffee, we compare learning from scratch against adapting the skills learned in the StowHammer domain.
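A minimal sketch of how such adaptation could be warm-started by partially loading a related skill's policy weights; the checkpoint path and the use of `strict=False` partial loading are assumptions for illustration, not the project's actual procedure.

```python
import torch


def warm_start_policy(policy, checkpoint_path):
    """Initialize a new skill's policy from a semantically related skill."""
    state = torch.load(checkpoint_path, map_location="cpu")
    # Load whatever parameters match; layers unique to the new skill keep
    # their fresh initialization.
    policy.load_state_dict(state, strict=False)
    return policy


# e.g. reuse StowHammer skill weights when training a MakeCoffee skill
# (the path below is a hypothetical placeholder).
# policy = warm_start_policy(new_policy, "checkpoints/stow_hammer_pick.pt")
```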
Rollout in the MakeCoffee domain.
LIBERO Benchmark: Adapting to New Objects and Scenes
We evaluate generalization on LIBERO-Spatial, where our method quickly adapts learned skills to new scene configurations.
We evaluate on LIBERO-Object to test generalization across object shapes. While initial tasks require more training, subsequent ones are learned faster, demonstrating strong potential for continual learning.
Real-World Experiment