ConceptBot
Enhancing the Autonomy of Robotic Systems through Task Decomposition with Large Language Models and Knowledge Graphs
Robotic systems have advanced significantly, yet effective planning in unstructured environments remains a substantial challenge. This paper introduces ConceptBot, a novel planning system that integrates Large Language Models (LLMs) with Knowledge Graphs (KGs), particularly ConceptNet, to enhance task decomposition and context understanding. Unlike traditional methods reliant on pre-programmed models or extensive training on specialized datasets, ConceptBot features a modular architecture with three components: Object Properties Extraction (OPE), User Request Processing (URP), and Planner. OPE identifies and extracts object properties relevant to the task, leveraging KGs to provide contextual relationships for improved scene understanding. URP interprets natural language instructions, using KGs to better understand the user's needs and determine how to satisfy them effectively. The Planner synthesizes this information to generate executable pick-and-place policies that are contextually appropriate and feasible.
To evaluate its effectiveness, ConceptBot was compared with Google's SayCan, demonstrating superior performance in handling ambiguous instructions and generating feasible policies. Experiments were conducted in a simulated environment using PyBullet, ViLD, and CLIPort, as well as in the IDSIA laboratory with a Franka Emika Panda robotic arm, an Intel RealSense camera, and YOLO-based object detection. The results highlight ConceptBot's adaptability, accurate interpretation of user requests, and robust handling of ambiguous objects, achieving high performance in pick-and-place operations without additional specialized training.
The OPE module extracts task-relevant attributes of the objects in the environment, using ViLD or YOLO for detection and ConceptNet for contextual relationships. It identifies properties such as fragility, stability, and dangerousness to inform planning.
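As a minimal sketch, assuming ConceptNet's public REST API is queried directly, this kind of property extraction could look as follows; the HasProperty relation, the keyword sets, and the has_property heuristic are illustrative assumptions, not ConceptBot's exact rules.

import requests

CONCEPTNET_URL = "https://api.conceptnet.io/query"

def related_terms(obj_label, relation="HasProperty", limit=20):
    """Return (label, weight) pairs linked to `obj_label` in ConceptNet via `relation`."""
    params = {
        "start": f"/c/en/{obj_label.lower().replace(' ', '_')}",
        "rel": f"/r/{relation}",
        "limit": limit,
    }
    edges = requests.get(CONCEPTNET_URL, params=params, timeout=10).json()["edges"]
    return [(edge["end"]["label"], edge["weight"]) for edge in edges]

def has_property(obj_label, keywords):
    """Heuristic: does any ConceptNet property of the object mention one of the keywords?"""
    return any(
        any(keyword in label.lower() for keyword in keywords)
        for label, _ in related_terms(obj_label)
    )

# Example: a detected "glass" is typically tagged fragile, a "knife" dangerous.
print(has_property("glass", {"fragile", "breakable"}))
print(has_property("knife", {"sharp", "dangerous"}))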
The URP module interprets natural language instructions, extracting keywords and querying ConceptNet for semantic relationships. This process eliminates ambiguities and generates structured, actionable commands aligned with the robot’s capabilities.
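A minimal sketch of this grounding step, assuming the request keywords have already been extracted and using ConceptNet's relatedness endpoint; the 0.2 threshold and the max-over-keywords scoring are assumptions made here for illustration.

import requests

RELATEDNESS_URL = "https://api.conceptnet.io/relatedness"

def relatedness(term_a, term_b):
    """Semantic similarity in [-1, 1] between two English terms, from ConceptNet."""
    params = {
        "node1": f"/c/en/{term_a.lower().replace(' ', '_')}",
        "node2": f"/c/en/{term_b.lower().replace(' ', '_')}",
    }
    return requests.get(RELATEDNESS_URL, params=params, timeout=10).json()["value"]

def ground_request(request_keywords, detected_objects, threshold=0.2):
    """Rank detected objects by their best relatedness to any request keyword."""
    scores = {
        obj: max(relatedness(keyword, obj) for keyword in request_keywords)
        for obj in detected_objects
    }
    return sorted(
        (obj for obj, score in scores.items() if score >= threshold),
        key=lambda obj: -scores[obj],
    )

# Example: for "I want to make hollandaise sauce", egg, butter and lemon
# should outrank banana or sponge among the detected objects.
print(ground_request(["hollandaise", "sauce"],
                     ["egg", "butter", "lemon", "banana", "sponge"]))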
The Planner integrates information from OPE and URP to produce pick-and-place policies. By combining LLM scoring with affordance evaluations, it ensures safe, feasible, and contextually relevant actions.
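A minimal sketch of this decision rule, in which candidate Pick_and_Place actions are ranked by the product of an LLM usefulness score and an affordance score derived from OPE properties; both scoring functions below are illustrative placeholders rather than ConceptBot's actual prompts or models.

from itertools import product

def affordance_score(pick, place, props):
    """Illustrative feasibility rule based on OPE-style properties."""
    if "fragile" in props.get(place, set()):
        return 0.0   # never place anything onto a fragile object
    if "unstable" in props.get(place, set()):
        return 0.1   # discourage placing onto e.g. a sphere
    return 1.0

def next_action(instruction, objects, props, llm_score):
    """Return the feasible Pick_and_Place action with the highest combined score."""
    best_action, best_score = None, float("-inf")
    for pick, place in product(objects, repeat=2):
        if pick == place:
            continue
        action = f"Pick_and_Place({pick}, {place})"
        score = llm_score(instruction, action) * affordance_score(pick, place, props)
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Toy LLM stand-in: slightly prefer moving the sphere when asked to stack objects.
toy_llm = lambda instruction, action: 1.0 if action.startswith("Pick_and_Place(Sphere") else 0.5
props = {"Sphere": {"unstable"}, "Block": set()}
print(next_action("stack the objects", ["Sphere", "Block"], props, toy_llm))
# -> Pick_and_Place(Sphere, Block): the sphere ends up on top, not underneath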
Pick_and_Place(Med_Block, Big_Block)
Pick_and_Place(Small_Block, Med_Block)
Pick_and_Place(Sphere, Block)
Pick_and_Place(Block, Sphere)
'Please, I want to make hollandaise sauce.'
Pick_and_Place(Egg, Bowl)
Pick_and_Place(Lemon, Bowl)
Pick_and_Place(Butter, Bowl)
'I got the plate dirty!'
Pick_and_Place(Sponge, User)
Pick_and_Place(Banana, Bowl)
Pick_and_Place(Carrot, Bowl)
Pick_and_Place(Chilli_Bottle, Front_Bowl)
....
Pick_and_Place(Chili_Bottle, Bowl)