Group 7 - Winter 2026
For this project, our seven-person team was tasked with programming a mobile, collaborative robot to carry out assistive tasks. To complete the project's three tasks, we integrated vision models, speech recognition, mapping/navigation, and grasp control algorithms.
Tidybot2 mobile robot base
2 WX250s 6-DoF arms (bimanual setup)
2 Robotiq 2F-85 adaptive parallel grippers
Intel RealSense D435 depth camera
Code written in Python & ROS
Simulation environment in RViz & MuJoCo
Vision using YOLOv8, AprilTags, and facial recognition
Speech recognition using Google Cloud STT & Gemini
Grasp positioning using GraspNet
Given a verbal command, the robot must scan for, navigate to, and pick up the target object.
"Get the banana."
Given a series of verbal commands, the robot must perform a set of actions in sequence.
"Get the banana and place it in the bin."
After retrieving an object, the robot must find the specified person via facial recognition and bring it back to them. After bringing it to the person, the robot accepts a sorting task.
"Get the banana and give it to Michaël."
"Now, drop it in the banana bin."
Alexander Chen
Mechanical Engineering (MS)
Frances Raphael
Electrical Engineering (BS/MS)
Ananya Sridhar
Mechanical Engineering (BS/MS)
Michaël Dooley
Mechanical Engineering (MS)
Baihan Zhang
Computer Science (MS)
Sophia Huang
Electrical Engineering (MS)
Elisabeth Holm
Computer Science (BS/MS)
Alexander Chen
2D depth-based occupancy map generation
Navigation states/planning + basic outline for frontier exploration.
Assisted with non-occupancy map navigation.
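The depth-based occupancy mapping above can be sketched roughly as follows. This is a minimal, hypothetical illustration (the function name, grid size, and height band are assumptions, not our exact ROS code): depth points are projected into a fixed-resolution 2D grid, keeping only points at obstacle height.

```python
import numpy as np

def depth_to_occupancy(points_xyz, resolution=0.05, grid_size=200,
                       z_min=0.05, z_max=1.5):
    """Project 3D depth points into a 2D occupancy grid (illustrative sketch).

    points_xyz: (N, 3) array of points in the robot frame, in meters.
    Cells hit by points within the height band [z_min, z_max] are marked
    occupied (1); everything else stays free/unknown (0).
    """
    grid = np.zeros((grid_size, grid_size), dtype=np.int8)
    # Keep only points at obstacle height (ignore floor and ceiling)
    band = points_xyz[(points_xyz[:, 2] > z_min) & (points_xyz[:, 2] < z_max)]
    # Convert metric x/y to grid indices, with the robot at the grid center
    idx = (band[:, :2] / resolution).astype(int) + grid_size // 2
    # Discard points that fall outside the grid bounds
    valid = ((idx >= 0) & (idx < grid_size)).all(axis=1)
    grid[idx[valid, 1], idx[valid, 0]] = 1  # row = y, col = x
    return grid
```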
Ananya Sridhar
Created basic navigation system (no occupancy map, simple obstacle avoidance)
Also created a version with no obstacle avoidance for initial task 1 trials
Testing of Sophia's integration pipeline (YOLO → exploration)
Initial code for AprilTag detection
Baihan Zhang
Implemented confirmation pipeline for post-grasp and post-bin-drop checks with YOLO integration
Created baseline navigation pipeline for the task 1 fallback, with no obstacle avoidance and simple navigation logic (direct navigation without A* or frontier exploration)
Integrated the full task 1 pipeline with voice commands
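The "direct navigation" fallback idea can be illustrated with a simple point-and-shoot controller sketch (function name and gains are hypothetical, not the team's exact code): turn toward the goal, drive forward proportionally to distance, and stop inside a tolerance.

```python
import math

def direct_nav_cmd(robot_xy, robot_theta, goal_xy,
                   k_lin=0.5, k_ang=1.5, stop_dist=0.15):
    """Return (linear_vel, angular_vel) for direct goal-seeking (a sketch).

    No obstacle avoidance or planning: steer toward the goal heading,
    drive proportionally to distance, and stop within stop_dist meters.
    """
    dx = goal_xy[0] - robot_xy[0]
    dy = goal_xy[1] - robot_xy[1]
    dist = math.hypot(dx, dy)
    if dist < stop_dist:
        return 0.0, 0.0
    heading_err = math.atan2(dy, dx) - robot_theta
    # Wrap the heading error to [-pi, pi]
    heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))
    # Turn in place when far off heading; otherwise drive and steer together
    lin = k_lin * dist if abs(heading_err) < math.pi / 4 else 0.0
    ang = k_ang * heading_err
    return lin, ang
```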
Elisabeth Holm
Led navigation and its integration with voice and YOLO
Organized team assignments (e.g., presentation, website) and timeline tracking
Frontier exploration logic
Map refinement (mapping and correcting the map as the robot drives)
Robust navigation of the robot through obstacle-filled space (including obstacle avoidance through inflation)
Integrated the pipeline: voice command → explore until the object is found via YOLO → grasp. This was the basis of our final version, which replaced the map-based navigation with vision-based navigation
Helped with GraspNet and attempted GPD (a grasp pose detection model)
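The frontier-exploration and obstacle-inflation ideas above can be sketched in a few lines. This is a hypothetical illustration using SciPy's `binary_dilation`, not our exact implementation: frontiers are free cells bordering unknown space, and inflation grows occupied cells by the robot's radius so the planner keeps clearance.

```python
import numpy as np
from scipy.ndimage import binary_dilation

FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # assumed cell encoding for this sketch

def inflate_obstacles(grid, robot_radius_cells=3):
    """Grow occupied cells by the robot radius so planning treats
    near-obstacle cells as blocked."""
    occupied = grid == OCCUPIED
    inflated = binary_dilation(occupied, iterations=robot_radius_cells)
    out = grid.copy()
    out[inflated] = OCCUPIED
    return out

def find_frontiers(grid):
    """Return (row, col) indices of free cells that border unknown space;
    these are the candidate frontier cells the robot drives toward."""
    near_unknown = binary_dilation(grid == UNKNOWN)
    frontier = (grid == FREE) & near_unknown
    return np.argwhere(frontier)
```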
Frances Raphael
Implemented and helped integrate the voice command module, allowing users to control the robot using spoken instructions.
Developed and tested the drop-bin behavior, allowing the robot to place a grasped object into a bin.
Helped integrate the voice command pipeline with the main control system for sequential task execution.
Sophia Huang
YOLOv8 setup and testing
Integration of pipeline YOLO → exploration
Bin detection alternative using color masking (backup plan, not used in final)
Exploration alternative method (backup plan, not used in final)
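The color-masking backup can be illustrated with a minimal NumPy sketch. The thresholds and function name here are hypothetical, and a real version would more likely convert to HSV with OpenCV before masking; this version just thresholds RGB channels and returns the centroid of the masked region.

```python
import numpy as np

def detect_bin_by_color(rgb_image, lower=(0, 0, 150), upper=(80, 80, 255)):
    """Toy color-mask detector (a sketch, not the team's exact code).

    Keep pixels whose R, G, B values all fall within [lower, upper]
    (defaults pick out strongly blue pixels), then return the centroid
    (row, col) of the mask, or None if nothing matches.
    """
    lower, upper = np.array(lower), np.array(upper)
    mask = np.all((rgb_image >= lower) & (rgb_image <= upper), axis=-1)
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return (int(rows.mean()), int(cols.mean()))
```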
Michaël Dooley
Full system integrator (brought all of the parts together)
Revising parts for functionality
Sim tester (getting sim working, providing laptop for sim work to teammates)
System/lab tester (in charge of deploying + testing code on real robot)
Was at all but one lab section for testing + implementation on the real robot
Successful GraspNet codebase implementation + GPD integration (not used on real robot)
Built final grasping approach (min-width sweep + coordinate conversion)
Facial recognition stack/implementation
Codebase restructuring + refactoring
Properly scoped the tasks, sub-tasks, and tech stack, which allowed us to be the first group to finish without a late-night/early-morning session