This project follows a structured perception → navigation → manipulation pipeline. Each stage feeds the next: perception locates the target object, navigation positions the robot's base, and manipulation executes the grasp.
The process begins with the speech node, which interprets the user's command and converts it into a JSON file. The vision node then reads this JSON file, searches for the target object in the camera feed, and publishes its coordinates for downstream use.
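A minimal sketch of that hand-off, assuming a ROS 1 / rospy setup. The topic name `/speech/command` and the JSON fields `action` and `object` are illustrative assumptions, not the project's actual interface.

```python
import json

import rospy
from std_msgs.msg import String


def publish_command(action: str, target_object: str) -> None:
    """Serialize a parsed voice command as JSON and publish it."""
    # Topic name and payload schema are assumptions for illustration.
    pub = rospy.Publisher("/speech/command", String, queue_size=1)
    rospy.sleep(1.0)  # give subscribers time to connect
    payload = json.dumps({"action": action, "object": target_object})
    pub.publish(String(data=payload))


if __name__ == "__main__":
    rospy.init_node("speech_node")
    publish_command("pick", "red cup")
```

The vision node would subscribe to this topic, parse the JSON to get the object label, and publish the detected coordinates (for example as a `geometry_msgs/PointStamped`) once the object is found in the camera feed.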
The navigation node receives the target object's position from the perception system and calculates an approach trajectory. It drives the robot's base to a position slightly offset from the target, leaving the arm enough clearance to reach and manipulate the object.
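A hedged sketch of that offset computation, assuming the base is driven through the standard `move_base` action interface. The 0.4 m stand-off distance, the `map` frame, and the straight-line approach are illustrative assumptions.

```python
import math

import actionlib
import rospy
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
from tf.transformations import quaternion_from_euler

APPROACH_OFFSET = 0.40  # metres to stop short of the object (assumed value)


def offset_goal(robot_xy, target_xy, frame="map"):
    """Build a base goal short of the target, facing it."""
    dx, dy = target_xy[0] - robot_xy[0], target_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy)
    if dist < APPROACH_OFFSET:
        raise ValueError("base is already within the approach offset")
    ux, uy = dx / dist, dy / dist  # unit vector toward the target
    yaw = math.atan2(dy, dx)       # orient the base toward the object

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = target_xy[0] - APPROACH_OFFSET * ux
    goal.target_pose.pose.position.y = target_xy[1] - APPROACH_OFFSET * uy
    qx, qy, qz, qw = quaternion_from_euler(0.0, 0.0, yaw)
    goal.target_pose.pose.orientation.x = qx
    goal.target_pose.pose.orientation.y = qy
    goal.target_pose.pose.orientation.z = qz
    goal.target_pose.pose.orientation.w = qw
    return goal


if __name__ == "__main__":
    rospy.init_node("navigation_sketch")
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()
    client.send_goal(offset_goal((0.0, 0.0), (2.0, 1.0)))
    client.wait_for_result()
```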
The manipulation node refines the object's position using additional perception data to ensure an accurate grasp. It then moves the robotic arm directly above the object, carefully lowers the end effector, and grasps the object securely.
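One way this hover-descend-grasp sequence could look, assuming MoveIt's Python interface (`moveit_commander`). The group names `arm` and `gripper`, the hover height, and the `close` named target are illustrative assumptions rather than this project's actual configuration.

```python
import sys

import moveit_commander
import rospy
from geometry_msgs.msg import Pose

HOVER_HEIGHT = 0.15  # metres above the object before descending (assumed)


def pick(arm, gripper, obj_pose: Pose) -> None:
    """Hover over the object, descend to it, then close the gripper."""
    # 1. Move the end effector to a point directly above the object.
    hover = Pose()
    hover.position.x = obj_pose.position.x
    hover.position.y = obj_pose.position.y
    hover.position.z = obj_pose.position.z + HOVER_HEIGHT
    hover.orientation = obj_pose.orientation
    arm.set_pose_target(hover)
    arm.go(wait=True)

    # 2. Lower the end effector to the refined object pose.
    arm.set_pose_target(obj_pose)
    arm.go(wait=True)
    arm.stop()
    arm.clear_pose_targets()

    # 3. Close the gripper ("close" is an assumed named target).
    gripper.set_named_target("close")
    gripper.go(wait=True)


if __name__ == "__main__":
    moveit_commander.roscpp_initialize(sys.argv)
    rospy.init_node("manipulation_sketch")
    arm = moveit_commander.MoveGroupCommander("arm")
    gripper = moveit_commander.MoveGroupCommander("gripper")

    # In the real pipeline this pose would come from the refined
    # perception estimate; here it is a placeholder.
    target = Pose()
    target.position.x, target.position.y, target.position.z = 0.5, 0.0, 0.8
    target.orientation.w = 1.0
    pick(arm, gripper, target)
```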