Collaborative Robotics Final Group Project:
Collaborative Robotics | Stanford Graduate Mechanical Engineering
Our team successfully completed Task 1 by having the robot interpret the command to "retrieve a banana", navigate to the location of the banana object, pick it up, and return to its original position with the banana in hand.
Object detection was a particularly challenging aspect of this project, and many design iterations were required before the final modules worked. The biggest issue was that the Gemini VLM we trialed first was not robust: it would sometimes identify a banana as a person, or fail to identify anything at all. Our team switched to the YOLO v11 model, using the large pre-trained model with a query for "banana". The success rate was much higher and detection was much faster with the new model. It also required no training, worked out of the box, and could detect objects up to 1.5 m away.
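The step of querying the detector output for "banana" can be sketched in plain Python. This is only an illustration, not the team's module: the detection format (a list of dicts with `label`, `conf`, and `box` keys), the function name `pick_target`, and the `min_conf` threshold are all hypothetical, and the call into the YOLO model itself is not shown.

```python
def pick_target(detections, query="banana", min_conf=0.5):
    """Return the most confident detection whose label matches `query`.

    `detections` is assumed to be a list of dicts such as
    {"label": "banana", "conf": 0.91, "box": (x1, y1, x2, y2)}.
    Returns None when nothing matches, so the caller can keep searching.
    """
    matches = [
        d for d in detections
        if d["label"] == query and d["conf"] >= min_conf
    ]
    if not matches:
        return None
    # Several bananas may be visible; keep the one the model is surest about.
    return max(matches, key=lambda d: d["conf"])


# Example frame with a person, a weak banana hit, and a strong banana hit.
frame = [
    {"label": "person", "conf": 0.88, "box": (10, 10, 200, 400)},
    {"label": "banana", "conf": 0.42, "box": (300, 220, 340, 250)},
    {"label": "banana", "conf": 0.93, "box": (120, 300, 210, 350)},
]
target = pick_target(frame)
```

Returning `None` rather than raising keeps the "nothing detected" case explicit, which matters for a node that runs on every camera frame.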
Base navigation presented further challenges. A PI controller was tested first, but it led to large overshoot and oscillation in the robot's motion. After considerable experimentation, a P controller was deemed sufficient for driving the base towards an object. A further challenge was that, because of the non-holonomic kinematics, the position controller alone did not bring the robot to the correct final orientation. This was resolved by adding a separate orientation P controller that controls the angular velocity after the robot has reached the desired position.
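A minimal sketch of this two-stage scheme is given here, assuming a differential-drive base commanded with a linear velocity `v` and an angular velocity `w`. The gains, tolerances, and function names are illustrative assumptions, not the values used on the robot, and steering towards the goal during the first stage is one plausible way to handle the non-holonomic constraint rather than a description of the team's code.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def base_command(pose, goal, kp_lin=0.5, kp_ang=1.5,
                 pos_tol=0.05, yaw_tol=0.05):
    """Return (v, w) for one control step.

    pose and goal are (x, y, yaw) tuples in the same frame.
    Stage 1: P control on distance while steering towards the goal point.
    Stage 2: once within pos_tol, a separate P controller on yaw only.
    """
    x, y, yaw = pose
    gx, gy, gyaw = goal
    dx, dy = gx - x, gy - y
    dist = math.hypot(dx, dy)

    if dist > pos_tol:
        heading_err = wrap_angle(math.atan2(dy, dx) - yaw)
        # Scale forward speed by cos(heading_err) so the base does not
        # drive off sideways while it is still turning towards the goal.
        v = kp_lin * dist * max(0.0, math.cos(heading_err))
        w = kp_ang * heading_err
        return v, w

    yaw_err = wrap_angle(gyaw - yaw)
    if abs(yaw_err) > yaw_tol:
        # Position reached: rotate in place to the desired orientation.
        return 0.0, kp_ang * yaw_err

    return 0.0, 0.0  # Goal pose reached within tolerance.


# Far from the goal and already facing it: drive straight ahead.
v1, w1 = base_command((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
# At the goal position but 90 degrees off: rotate in place.
v2, w2 = base_command((1.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2))
```

With no integral term there is no accumulated error to unwind, which is consistent with the overshoot seen with the PI controller disappearing; the cost is a small steady-state error, which the tolerances absorb.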
There were further challenges with arm control, especially when executing the arm control sequence. First, sleep timers had to be built in to slow down the execution of each step of the sequence, ensuring that no step was skipped. Second, the arm would often block the camera's view of the object. To handle this, our team set up a flag to ignore further messages once a target pose was received.
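Both fixes can be sketched in a few lines of plain Python. The class and function names here are hypothetical, the step list stands in for the real arm commands, and the one-second delay is an assumed placeholder rather than the timing used on the robot.

```python
import time


class TargetLatch:
    """Keep the first target pose and ignore every later message."""

    def __init__(self):
        self.target = None

    def on_pose(self, pose):
        """Pose callback; returns True only if the pose was accepted."""
        if self.target is not None:
            # A target is already locked in. Later messages may come from
            # frames where the arm blocks the camera, so drop them.
            return False
        self.target = pose
        return True


def run_sequence(steps, delay_s=1.0, sleep=time.sleep):
    """Run (name, action) steps in order, pausing after each one.

    The pause gives each motion time to finish before the next command
    is sent. `sleep` is injectable so the sequence can be dry-run quickly.
    """
    completed = []
    for name, action in steps:
        action()
        sleep(delay_s)
        completed.append(name)
    return completed


# Dry run with stand-in actions and no real waiting.
latch = TargetLatch()
first = latch.on_pose((0.40, 0.05, 0.02))
second = latch.on_pose((0.10, 0.30, 0.25))  # e.g. a pose seen past the arm

log = []
steps = [
    ("open_gripper", lambda: log.append("open")),
    ("move_to_target", lambda: log.append("move")),
    ("close_gripper", lambda: log.append("close")),
]
done = run_sequence(steps, delay_s=0.0, sleep=lambda s: None)
```

Fixed delays are simple but open-loop; they assume each motion always finishes within the chosen time.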
Test object detection for other objects (Task 2).
Add different masks/filters (e.g. color filter) for far-away objects.
Adapt gripper to grip deformable objects for Task 3.