To see our team's full source code, please see the GitHub repository linked below.
Please visit the main branch and navigate to /ros2_ws/src/tidybot_bringup/scripts to see our Manipulation, Navigation, vision-manipulation, and vision-segmentation folders.
In addition, please see our individual .py files, such as brain_node_task1.py, brain_node_task2.py, navigation_node.py, speech_node.py, audio_item_detector.py, and trajectory_tracking.py, to review our ROS 2 node setups and code.
The TidyBot2 software stack runs on ROS 2 Humble (Ubuntu 22.04) with MuJoCo as the simulation backend. The architecture centers on a Brain Node that orchestrates specialized subsystem nodes through published and subscribed topics. No node calls another node's functions directly; all coordination happens through ROS 2 pub/sub.
Brain Node (brain_node_task1.py, brain_node_task2.py)
is the central state machine orchestrator. It publishes high-level goals to the speech, navigation, and manipulation nodes, and waits for their status responses before advancing states. It does not directly control any hardware. On startup it publishes /arm_status and /task_status to configure downstream nodes, and monitors /joint_states to confirm the simulation is running before proceeding.
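A topic-driven orchestrator of this kind can be reduced to a table of (state, status-message) transitions: the brain only advances when a subsystem reports success on its status topic. The sketch below is illustrative, not code from brain_node_task1.py; the state names and status strings are assumptions.

```python
# Minimal sketch of a topic-driven state machine. In the real node, on_status
# would be called from ROS 2 subscription callbacks on the subsystems' status
# topics; here it is a plain method so the logic is easy to see and test.

TRANSITIONS = {
    ("AWAIT_SPEECH", "speech_done"): "NAVIGATE",
    ("NAVIGATE", "nav_done"): "MANIPULATE",
    ("MANIPULATE", "manip_done"): "RETURN_HOME",
    ("RETURN_HOME", "nav_done"): "IDLE",
}

class BrainStateMachine:
    def __init__(self):
        self.state = "AWAIT_SPEECH"

    def on_status(self, msg: str) -> str:
        """Advance only on a recognized (state, status) pair; else hold state."""
        self.state = TRANSITIONS.get((self.state, msg), self.state)
        return self.state
```

Keeping the transitions in a dict makes each task variant (task 1 vs. task 2) a data change rather than new control flow.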
Navigator Node (navigator.py)
handles all base movement and runs an internal state machine (SCANNING → APPROACHING → FINE_CENTERING → CAMERA_RESET → FINAL_APPROACH → RETURNING). On receiving an item name on /brain/navigation_goal, it forwards the name to the Vision Node via /vision/target, then uses proportional visual servoing on pixel error from /object_detection to keep the object centered while driving forward. Return-to-start is handled by publishing a Pose2D to /base/target_pose and waiting for /base/goal_reached.
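The proportional visual servoing step can be sketched as a pure function: normalize the horizontal pixel error of the detection, scale it by a gain for angular velocity, and drive forward at a constant speed. The gains, image width, and clamping below are assumptions for illustration, not values from navigator.py.

```python
# Hedged sketch of proportional visual servoing on pixel error: the object's
# centroid (from /object_detection) steers the base while it drives forward.

def servo_cmd(cx_px: float, img_width: int = 640,
              k_ang: float = 2.0, v_fwd: float = 0.15,
              max_w: float = 0.5) -> tuple[float, float]:
    """Return (linear, angular) velocity from the detection's pixel centroid."""
    # Normalized error in [-1, 1]: positive when the object is left of center.
    err = (img_width / 2.0 - cx_px) / (img_width / 2.0)
    w = max(-max_w, min(max_w, k_ang * err))  # clamp the turn rate
    return v_fwd, w
```

A centered object (cx at img_width/2) yields zero angular velocity, so the base drives straight at it; the FINE_CENTERING state would simply use a tighter error threshold and zero forward speed.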
Vision Node (vision_yolo_gemini.py)
processes the RGB camera stream from /camera/color/image_raw. It runs a YOLOv26 detector filtered to the target object specified on /vision/target, publishing the detected object's pixel centroid to /object_detection (used by the Navigator for visual servoing) and the 2D bounding box to /vision/target_bbox (used by the Grasp Planner).
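Selecting one detection for the requested target and deriving the centroid published on /object_detection might look like the following. The (label, confidence, bbox) tuple format is an assumed simplification of the detector's output, not the node's actual data structure.

```python
# Illustrative sketch: filter detections to the /vision/target label, keep the
# most confident one, and compute the pixel centroid of its bounding box.

def best_centroid(detections, target: str):
    """detections: iterable of (label, conf, (x1, y1, x2, y2)) tuples.
    Returns the (cx, cy) pixel centroid of the best match, or None."""
    hits = [d for d in detections if d[0] == target]
    if not hits:
        return None  # nothing to publish this frame
    _, _, (x1, y1, x2, y2) = max(hits, key=lambda d: d[1])
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)
```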
Point Cloud Node (point_cloud_node.py)
converts the depth image from /camera/depth/image_raw into a full-scene PointCloud2 on /camera/points using the camera intrinsics from /camera/depth/camera_info. It activates on "final_approach" from /brain/navigation_status and deactivates on "done" from /brain/manipulation_status.
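The depth-to-point-cloud step is standard pinhole back-projection using the fx, fy, cx, cy intrinsics carried in CameraInfo. A vectorized sketch (frame conventions and metric depth units are assumptions):

```python
# Sketch of pinhole back-projection: each pixel (u, v) with depth z maps to
# X = (u - cx) * z / fx,  Y = (v - cy) * z / fy,  Z = z.
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """depth: (H, W) array in meters. Returns (N, 3) XYZ for valid pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0  # drop missing-depth pixels
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x[valid], y[valid], depth[valid]], axis=1)
```

Packing the resulting array into a PointCloud2 message is then a serialization step (e.g. via sensor_msgs_py's point cloud helpers).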
Grasp Planner Node (simple_grasp_planner_node.py)
computes grasp poses when triggered by a bounding box on /vision/target_bbox. It crops the full point cloud from /camera/points using the bbox pixel coordinates, segments foreground points via a Z-height filter, computes the object centroid and principal axis via PCA, and publishes a PoseStamped grasp pose to /detected_grasps/pose.
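The centroid-and-principal-axis step reduces to PCA on the segmented foreground points. The sketch below projects onto the XY plane for a top-down grasp and converts the principal axis to a yaw angle; that conversion, and the eigen-decomposition route to PCA, are assumptions about how the pose is built rather than code from the node.

```python
# Hedged sketch of the grasp-pose geometry: centroid of the cropped points
# plus the dominant eigenvector of their XY covariance (the object's long axis).
import numpy as np

def grasp_from_points(points: np.ndarray) -> tuple[np.ndarray, float]:
    """points: (N, 3) foreground points. Returns (centroid, yaw_radians)."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered[:, :2].T)            # 2x2 XY covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    axis = eigvecs[:, np.argmax(eigvals)]      # principal (long) axis
    yaw = float(np.arctan2(axis[1], axis[0]))
    return centroid, yaw
```

Note the eigenvector's sign is arbitrary, so yaw is only defined modulo 180°, which is fine for a parallel-jaw gripper.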
Manipulation Node (manipulation_node.py)
executes the physical grasp sequence upon receiving a command on /brain/manipulation_goal. It runs a 50 Hz control loop through states: MOVE_PREGRASP → MOVE_GRASP → PAUSE_AT_GRASP → CLOSE_GRIPPER → MOVE_LIFT, using joint state feedback for arrival detection and gripper stall detection to confirm a successful grasp. For Task 3 it also executes a TWIST state (180° wrist rotation for cap removal). It publishes completion status to /brain/manipulation_status.
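The two feedback checks the state machine relies on, joint-space arrival detection and gripper stall detection, can be sketched as below. The thresholds and the sliding-window formulation of stall detection are illustrative assumptions, not values from manipulation_node.py.

```python
# Illustrative arrival and stall checks evaluated inside the 50 Hz loop.

def at_target(q: list[float], q_goal: list[float], tol: float = 0.02) -> bool:
    """Arrived when every joint is within tol radians of its commanded target."""
    return all(abs(a - b) < tol for a, b in zip(q, q_goal))

def gripper_stalled(history: list[float], window: int = 10,
                    eps: float = 1e-3) -> bool:
    """While commanding the gripper closed, a stall (i.e. an object blocking
    the fingers) shows up as the position barely changing over recent ticks."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < eps
```

Stalling before the fully-closed position is what distinguishes a successful grasp from closing on air.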
MuJoCo Bridge (mujoco_bridge_node.py)
wraps the MuJoCo physics simulation and exposes it as a standard ROS 2 interface. It publishes /joint_states, /odom, /camera/color/image_raw, /camera/depth/image_raw, and TF transforms, while subscribing to /cmd_vel, /right_arm/joint_cmd, /left_arm/joint_cmd, /right_gripper/cmd, /left_gripper/cmd, /camera/pan_tilt_cmd, and /base/target_pose. It implements a position controller internally for go-to-goal navigation and publishes /base/goal_reached on arrival.
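The bridge's internal position controller for go-to-goal can be sketched as a holonomic P-controller: rotate the world-frame position error into the body frame, command proportional velocities, and report goal_reached inside a tolerance. Gains and tolerances below are illustrative.

```python
# Sketch of a holonomic go-to-goal controller of the kind the bridge runs when
# a Pose2D arrives on /base/target_pose.
import math

def goto_cmd(x, y, yaw, gx, gy, gyaw, k_lin=1.0, k_ang=1.0,
             pos_tol=0.05, yaw_tol=0.05):
    """Return (vx, vy, wz, reached) for a holonomic base at pose (x, y, yaw)."""
    ex, ey = gx - x, gy - y
    # Wrap the heading error into (-pi, pi].
    eyaw = math.atan2(math.sin(gyaw - yaw), math.cos(gyaw - yaw))
    if math.hypot(ex, ey) < pos_tol and abs(eyaw) < yaw_tol:
        return 0.0, 0.0, 0.0, True  # triggers /base/goal_reached
    # Rotate the world-frame error into the robot's body frame.
    vx = k_lin * (math.cos(-yaw) * ex - math.sin(-yaw) * ey)
    vy = k_lin * (math.sin(-yaw) * ex + math.cos(-yaw) * ey)
    return vx, vy, k_ang * eyaw, False
```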
Hardware Bridge (tidybot_control package)
replaces the MuJoCo Bridge on the physical robot with a set of driver nodes that together expose the same ROS 2 topic interface, allowing the higher-level nodes to run unchanged on both sim and real hardware. phoenix6_base_node.py subscribes to /cmd_vel and /base/target_pose to drive the holonomic base, publishing /odom and /base/goal_reached just as the sim does. arm_wrapper_node.py and gripper_wrapper_node.py translate the /right_arm/joint_cmd, /left_arm/joint_cmd, /right_gripper/cmd, and /left_gripper/cmd topics into the low-level Interbotix SDK commands required by the real Dynamixel motors. pan_tilt_node.py handles camera positioning via /camera/pan_tilt_cmd, and microphone_node.py exposes the robot's microphone as the /microphone/record service used by the Speech Node.
Speech Node (speech_node.py)
manages the full voice command pipeline. On receiving a goal on /brain/speech_goal, it calls the /microphone/record service to capture audio, then passes the recording through a two-stage pipeline: Google Cloud Speech-to-Text for transcription, followed by a Gemini LLM call to extract the target item name (or a JSON payload/destination pair for Task 2). The result is published to /brain/speech_result.
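The last stage of the pipeline, turning the LLM's free-form reply into the payload published on /brain/speech_result, might look like the following. The JSON keys, the code-fence stripping, and the bare-item fallback are assumptions about the prompt contract, not code from speech_node.py.

```python
# Hedged sketch of extracting the target item (or an item/destination pair for
# Task 2) from an LLM reply before publishing it to /brain/speech_result.
import json

def parse_llm_reply(text: str) -> dict:
    """Return {'item': ...} or {'item': ..., 'destination': ...}."""
    cleaned = text.strip()
    if cleaned.startswith("```"):  # models often wrap JSON in a code fence
        cleaned = cleaned.strip("`")
        cleaned = cleaned.removeprefix("json").strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        # Task 1 fallback: treat the whole reply as a bare item name.
        return {"item": cleaned}
```

Guarding against the code-fence wrapper and falling back to a bare string keeps the node robust to the two reply shapes the prompt can produce.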