LEAGUE++: EMPOWERING CONTINUAL ROBOT LEARNING THROUGH GUIDED SKILL ACQUISITION WITH LARGE LANGUAGE MODELS

Anonymous authors 

Abstract

To support daily human tasks, robots need to tackle intricate, long-horizon tasks and continuously acquire new skills to handle new problems. Deep reinforcement learning (DRL) offers potential for learning fine-grained skills but relies heavily on human-defined rewards and struggles with long-horizon tasks. Task and Motion Planning (TAMP) is adept at handling long-horizon tasks but often requires tailored domain-specific skills, resulting in practical limitations and inefficiencies. To address these challenges, we developed LEAGUE++, a framework that leverages Large Language Models (LLMs) to harmoniously integrate TAMP and DRL for continuous skill learning in long-horizon tasks. Our framework achieves automatic task decomposition, operator creation, and dense reward generation for efficiently acquiring the desired skills. To facilitate new skill learning, LEAGUE++ maintains a symbolic skill library and uses the existing model from a semantically related skill to warm-start training. Our method, LEAGUE++, demonstrates superior performance compared to baselines across four challenging simulated task domains. Furthermore, we demonstrate the ability to reuse learned skills to expedite learning in new task domains.
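To illustrate the warm-start idea described above, here is a minimal, hypothetical sketch of a symbolic skill library that initializes a new skill's policy from the most closely related stored skill. All names (`SkillLibrary`, `most_related`, `warm_start`) are illustrative, not the paper's actual API, and string similarity stands in for the LLM-based semantic matching used in LEAGUE++.

```python
# Hypothetical sketch: a symbolic skill library keyed by skill name.
# Training a new skill warm-starts from the parameters of the most
# semantically related existing skill (string similarity used as a
# stand-in for LLM-based semantic relatedness).
from difflib import SequenceMatcher


class SkillLibrary:
    def __init__(self):
        self._skills = {}  # skill name -> policy parameters (placeholder dict)

    def add(self, name, params):
        self._skills[name] = params

    def most_related(self, name):
        """Return the stored skill whose name is most similar to `name`."""
        if not self._skills:
            return None
        return max(
            self._skills,
            key=lambda s: SequenceMatcher(None, s, name).ratio(),
        )

    def warm_start(self, new_name):
        """Initialize a new skill from its closest relative, if any."""
        relative = self.most_related(new_name)
        if relative is None:
            return {}  # no relative found: train from scratch
        return dict(self._skills[relative])  # copy parameters as the init


library = SkillLibrary()
library.add("PickHammer", {"layers": "pick-weights"})
library.add("OpenDrawer", {"layers": "drawer-weights"})
init = library.warm_start("PickPeg")  # closest stored name is "PickHammer"
```

In a full system the stored parameters would be neural policy weights and the relatedness query would go through the LLM; the library structure and warm-start flow remain the same.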

Quantitative Results

We visualize the key stages of the three evaluation tasks: StackAtTarget, PegInHole, and StowHammer. We compare relevant methods on these three task domains. The plot shows task completion progress (0 for the initial state, 1 for task completion) throughout training.

Demos

Untitled video - made with Clipchamp (12).mp4

Demo for the task in StackAtTarget domain.

Untitled video - made with Clipchamp (10).mp4

Demo for the task in PegInHole domain.

Untitled video - made with Clipchamp (9).mp4

Demo for the task in StowHammer domain.

Reusing Skills from the Symbolic Skill Library in a New Domain

We visualize the key stages of the ServeCoffee domain. This domain demonstrates that reusing learned skills improves the training efficiency of new skills.

Untitled video - made with Clipchamp (11).mp4

Demo for the task in ServeCoffee domain.

Ablation Study

This table reports the success rates of correct executions for the generated rewards in the StowHammer task and its corresponding skills. It compares our proposed LEAGUE++ with the ablation LEAGUE++ without the Metrics Inspector (w/o MI).

This figure reports the error rates and success rates of reward generation for the StowHammer task. It compares our proposed LEAGUE++ with the ablation LEAGUE++ without the Metrics Inspector (w/o MI).