LEAGUE++: EMPOWERING CONTINUAL ROBOT LEARNING THROUGH GUIDED SKILL ACQUISITION WITH LARGE LANGUAGE MODELS
Anonymous authors
Abstract
To support daily human tasks, robots need to tackle intricate, long-horizon tasks and continuously acquire new skills to handle new problems. Deep reinforcement learning (DRL) offers potential for learning fine-grained skills but relies heavily on human-defined rewards and struggles with long-horizon tasks. Task and Motion Planning (TAMP) is adept at handling long-horizon tasks but often requires tailored, domain-specific skills, resulting in practical limitations and inefficiencies. To address these challenges, we developed LEAGUE++, a framework that leverages Large Language Models (LLMs) to harmoniously integrate TAMP and DRL for continual skill learning in long-horizon tasks. Our framework achieves automatic task decomposition, operator creation, and dense reward generation for efficiently acquiring the desired skills. To facilitate new skill learning, LEAGUE++ maintains a symbolic skill library and uses the existing model of a semantically related skill to warm-start training. LEAGUE++ demonstrates superior performance compared to baselines across four challenging simulated task domains. Furthermore, we demonstrate that learned skills can be reused to expedite learning in new task domains.
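The warm-start mechanism described above can be sketched as a lookup in a symbolic skill library: when a new skill is requested, the library returns the parameters of the most closely related stored skill as the initialization. The sketch below is a minimal illustration under our own assumptions; the class name `SkillLibrary`, the token-overlap similarity heuristic, and the parameter format are hypothetical stand-ins, not the paper's actual implementation (which relies on LLM-judged semantic relatedness).

```python
# Hedged sketch of a symbolic skill library with warm-starting.
# All names and the similarity heuristic are illustrative assumptions.

class SkillLibrary:
    def __init__(self):
        # Map a symbolic operator name to its learned policy parameters.
        self._skills = {}

    def add(self, operator, params):
        self._skills[operator] = params

    def most_similar(self, operator):
        """Return the stored skill whose name shares the most tokens with
        `operator` (a crude stand-in for semantic similarity)."""
        tokens = set(operator.lower().split())
        best, best_score = None, 0
        for name in self._skills:
            score = len(tokens & set(name.lower().split()))
            if score > best_score:
                best, best_score = name, score
        return best

    def warm_start(self, operator, fresh_params):
        """Initialize a new skill from its closest existing skill,
        falling back to fresh parameters if nothing matches."""
        match = self.most_similar(operator)
        if match is not None:
            return dict(self._skills[match])  # copy learned weights
        return fresh_params


lib = SkillLibrary()
lib.add("pick hammer", {"w": [0.2, 0.5]})
lib.add("place block", {"w": [0.1, 0.9]})
# "pick peg" shares the token "pick" with "pick hammer", so training
# for the new skill starts from that skill's parameters.
init = lib.warm_start("pick peg", {"w": [0.0, 0.0]})
```

In practice the similarity test would be replaced by the LLM's judgment of semantic relatedness between operator descriptions, but the library structure is the same: new skills never start from scratch when a related skill exists.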
Quantitative Results
We visualize the key stages of the three evaluation tasks: StackAtTarget, PegInHole, and StowHammer. Tasks for each domain are shown below. We compare relevant methods across the three task domains; the plot shows task completion progress (0 for the initial state, 1 for task completion) throughout training.
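The task-completion-progress metric above can be made concrete as the fraction of sequential symbolic subgoals achieved. The sketch below is only an illustration: the `task_progress` helper and the StowHammer-style predicates are assumptions we introduce here, not the evaluation code used in the experiments.

```python
# Hedged sketch: task progress as the fraction of sequential subgoals met
# (0 = initial state, 1 = task complete). Predicates are illustrative.

def task_progress(state, subgoals):
    """Return completion in [0, 1] given ordered subgoal predicates."""
    done = 0
    for check in subgoals:
        if check(state):
            done += 1
        else:
            break  # subgoals are sequential; stop at the first unmet one
    return done / len(subgoals)


# Hypothetical StowHammer-style subgoals.
subgoals = [
    lambda s: s["drawer_open"],
    lambda s: s["hammer_grasped"],
    lambda s: s["hammer_in_drawer"],
]
state = {"drawer_open": True, "hammer_grasped": True, "hammer_in_drawer": False}
progress = task_progress(state, subgoals)  # two of three subgoals met
```

Plotting this quantity over training steps yields learning curves that are comparable across methods even when their reward scales differ.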
Demos
Demo for the task in StackAtTarget domain.
Demo for the task in PegInHole domain.
Demo for the task in StowHammer domain.
Reusing Skills from the Symbolic Skill Library in a New Domain
We visualize the key stages of the ServeCoffee domain. It emphasizes that reusing learned skills can enhance the training efficiency of new skills.
Demo for the task in ServeCoffee domain.
Ablation Study
This table reports the success rates of correct executions using generated rewards for the StowHammer task and its constituent skills. It compares LEAGUE++ with the ablation LEAGUE++ without Metrics Inspector (w/o MI).
This figure reports the error and success rates of reward generation on the StowHammer task, comparing LEAGUE++ with the ablation LEAGUE++ without Metrics Inspector (w/o MI).