ConceptBot
Enhancing the Autonomy of Robotic Systems through Task Decomposition with Large Language Models and Knowledge Graphs
Robotic planning breaks down when commonsense reasoning is required to resolve linguistic ambiguity and to interpret objects correctly. To address this, we present ConceptBot, a modular planning framework that integrates large language models with knowledge graphs to produce feasible, risk-aware plans while jointly disambiguating instructions and grounding object semantics.
ConceptBot comprises three components: (i) an Object Properties Extraction (OPE) module that augments scene understanding with semantic concepts from ConceptNet; (ii) a User Request Processing (URP) module that resolves ambiguities and structures free-form instructions; and (iii) a Planner that synthesizes context-aware, feasible pick-and-place policies. Evaluations in simulation and on real-world setups show consistent gains over prior LLM-based planners: +56 percentage points on implicit tasks (87% vs. 31% for SayCan), +61 points on risk-aware tasks (76% vs. 15%), and an overall score of 80% on SafeAgentBench. These improvements translate to more reliable performance in unstructured environments without domain-specific training.
The OPE module extracts semantic attributes of the objects detected in the environment, using ViLD or YOLO for detection and ConceptNet for contextual relationships. It identifies properties such as fragility, stability, and dangerousness to inform planning.
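As an illustration, the snippet below shows one way such property lookups could be performed against the public ConceptNet web API. The endpoint usage is real, but the property vocabulary, the function name, and the filtering heuristic are illustrative assumptions, not ConceptBot's actual implementation.

import requests

CONCEPTNET_QUERY = "http://api.conceptnet.io/query"
# Illustrative property vocabulary; ConceptBot's actual label set may differ.
PROPERTY_TERMS = {"fragile", "breakable", "sharp", "dangerous", "unstable", "heavy"}

def object_properties(label, limit=50):
    """Return {property: weight} for a detected object label, using
    ConceptNet /r/HasProperty edges as a proxy for semantic attributes."""
    params = {"start": f"/c/en/{label.lower()}", "rel": "/r/HasProperty", "limit": limit}
    edges = requests.get(CONCEPTNET_QUERY, params=params, timeout=10).json().get("edges", [])
    props = {}
    for edge in edges:
        term = edge["end"]["label"].lower()
        if term in PROPERTY_TERMS:
            props[term] = max(props.get(term, 0.0), edge.get("weight", 0.0))
    return props

# Example: a detector (ViLD / YOLO) returns the label "knife"
print(object_properties("knife"))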
The URP module interprets natural-language instructions, extracting keywords and querying ConceptNet for semantic relationships. This step resolves ambiguities and yields structured, actionable commands aligned with the robot's capabilities.
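A minimal sketch of the grounding idea, assuming the keyword has already been extracted from the request (in ConceptBot this extraction involves the LLM). The ConceptNet relatedness endpoint is real, while the function names and the top-k ranking heuristic are illustrative.

import requests

def relatedness(term_a, term_b):
    """Semantic relatedness between two English terms via ConceptNet."""
    params = {"node1": "/c/en/" + term_a.lower().replace(" ", "_"),
              "node2": "/c/en/" + term_b.lower().replace(" ", "_")}
    return requests.get("http://api.conceptnet.io/relatedness",
                        params=params, timeout=10).json().get("value", 0.0)

def ground_request(keyword, scene_objects, k=3):
    """Rank the objects visible in the scene by relatedness to a request keyword."""
    ranked = sorted(scene_objects, key=lambda obj: relatedness(keyword, obj), reverse=True)
    return ranked[:k]

# Keyword taken from "Please, I want to make hollandaise sauce."
scene = ["egg", "lemon", "butter", "banana", "sponge"]
print(ground_request("hollandaise sauce", scene))  # prints the three most related scene objects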
The Planner integrates information from OPE and URP to produce pick-and-place policies. By combining LLM scoring with affordance evaluations, it selects actions that are safe, feasible, and contextually relevant.
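To make the scoring combination concrete, here is a SayCan-style selection sketch. The Candidate structure and the numeric scores are hypothetical placeholders; ConceptBot's actual scoring details may differ.

from dataclasses import dataclass

@dataclass
class Candidate:
    action: str        # e.g. "Pick_and_Place(Egg, Bowl)"
    llm_score: float   # task relevance assigned by the language model
    affordance: float  # feasibility/safety score derived from OPE properties and the scene

def select_action(candidates):
    """Combine usefulness (LLM score) with feasibility (affordance) and pick the best action."""
    return max(candidates, key=lambda c: c.llm_score * c.affordance).action

# Hypothetical scores for the hollandaise-sauce request.
options = [
    Candidate("Pick_and_Place(Egg, Bowl)", llm_score=0.62, affordance=0.90),
    Candidate("Pick_and_Place(Knife, Bowl)", llm_score=0.25, affordance=0.20),
    Candidate("Pick_and_Place(Banana, Bowl)", llm_score=0.13, affordance=0.90),
]
print(select_action(options))  # -> Pick_and_Place(Egg, Bowl)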
Pick_and_Place(Med_Block, Big_Block)
Pick_and_Place(Small_Block, Med_Block)
Pick_and_Place(Sphere, Block)
Pick_and_Place(Block, Sphere)
'Please, I want to make hollandaise sauce.'
Pick_and_Place(Egg, Bowl)
Pick_and_Place(Lemon, Bowl)
Pick_and_Place(Butter, Bowl)
'I got the plate dirty!'
Pick_and_Place(Sponge, User)
Pick_and_Place(Banana, Bowl)
Pick_and_Place(Carrot, Bowl)
Pick_and_Place(Chili_Bottle, Front_Bowl)
...
Pick_and_Place(Chili_Bottle, Bowl)