Identifying the correct transformation published in the ROS framework for converting from the camera frame to the base frame proved difficult. For troubleshooting, we constructed an approximate transformation matrix known to produce roughly the desired rotation and translation between these frames, and compared it against the transformation matrix obtained from ROS. This allowed us to debug an error in converting the quaternion published by ROS for the relevant transformation into the correct transformation matrix.
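The quaternion-to-matrix conversion and the comparison against a reference matrix can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and the only convention taken from ROS is that `geometry_msgs/Quaternion` stores its components as x, y, z, w. Mixing up that component order is one common way such a conversion goes wrong; the source does not say which error was found here.

```python
import math

def quaternion_to_matrix(qx, qy, qz, qw, tx, ty, tz):
    """Build a 4x4 homogeneous transform from a quaternion (x, y, z, w order)
    and a translation. Hypothetical helper, written for illustration."""
    norm = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    # Normalize so small numerical drift does not distort the rotation.
    x, y, z, w = qx / norm, qy / norm, qz / norm, qw / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), tx],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), ty],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply_transform(T, point):
    """Map a 3D point through the homogeneous transform T."""
    x, y, z = point
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3] for i in range(3))

def max_abs_diff(A, B):
    """Largest element-wise difference between two 4x4 matrices, useful for
    checking a computed transform against a hand-built approximate one."""
    return max(abs(A[i][j] - B[i][j]) for i in range(4) for j in range(4))
```

A large `max_abs_diff` between the matrix derived from ROS and the hand-built approximation points to a conversion error rather than a calibration error.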
Effective grasping, pouring, and handoffs required thorough tuning of the arm's target positions relative to the center of the desired object. Notably, the arm's motion planner frequently changed the orientation of the gripper during movements; this was also factored into the position tuning.
The Google Vision API struggled to detect medicine bottles, whether because of their small size or their narrow shape. Orientation proved key to successful detection.
This task was completed through the following steps:
1. Receive audio input and determine the target and task.
   - Use Gemini to parse the audio transcript.
2. Move the arm to its rest position (to clear the camera view).
3. Rotate until the target object is found.
4. Drive toward the object until the depth to it is under DISTANCE_THRESHOLD (600.0).
5. Stop, then grasp.
   - Move near the object with a fixed orientation, then smoothly approach it while maintaining that orientation.
   - Grasp the object and lift it up.
6. When the grasp procedure succeeds, repeat from Step 3 (rotate).
   - The target object is now “person” (point of retrieval).
7. This time, when the person is within DISTANCE_THRESHOLD, hand them the object.
   - Approach above the hand, release the gripper, and move the arm back to its rest position.
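The sequence above can be read as a small phase machine. The sketch below is one possible way to express it, under stated assumptions: the phase names, the `next_phase` function, and the retry and re-search behavior are all hypothetical. Only the `DISTANCE_THRESHOLD` value and the switch of the target to "person" after a successful grasp come from the text.

```python
DISTANCE_THRESHOLD = 600.0  # value from the text; its units are not stated there

def next_phase(phase, target, visible, depth, grasp_ok=False):
    """Return the (phase, target) pair that follows the current one.

    phase    -- one of "ROTATE", "DRIVE", "GRASP", "HANDOFF", "DONE" (hypothetical names)
    target   -- label being searched for, e.g. an object name or "person"
    visible  -- whether the target is currently detected
    depth    -- measured depth to the target (ignored while not visible)
    grasp_ok -- whether the grasp procedure reported success
    """
    if phase == "ROTATE":
        # Keep rotating in place until the target shows up in the camera.
        return ("DRIVE" if visible else "ROTATE"), target
    if phase == "DRIVE":
        if not visible:
            # Assumption: fall back to searching if the target is lost.
            return "ROTATE", target
        if depth < DISTANCE_THRESHOLD:
            # Close enough: grasp an object, or hand over to a person.
            return ("HANDOFF" if target == "person" else "GRASP"), target
        return "DRIVE", target
    if phase == "GRASP":
        if grasp_ok:
            # Repeat the search, now looking for the person.
            return "ROTATE", "person"
        # Assumption: retry the grasp until it succeeds.
        return "GRASP", target
    if phase == "HANDOFF":
        return "DONE", target
    return phase, target
```

Keeping the transition logic in a pure function like this makes it possible to check the ordering of phases without a robot or a running ROS graph.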
This task was completed using the following strategy:
- The arm initially moves out of the camera view.
- On start → TURN: command only a nonzero z rotation (rotate in place).
- Upon object detection → GO: determine the angle between the base x-axis and the desired object.
  - If below threshold, command only a nonzero x velocity (drive forward).
  - If above threshold, command only a nonzero z rotation (rotate toward the object; the sign of the angle determines the direction).
- Within the distance threshold → STOP: command zero velocity.
The z rotation and x velocity values were also tuned.
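The TURN / GO / STOP logic can be sketched as a function that maps the latest detection to a velocity command. This is an illustrative sketch only: `base_command`, `ANGLE_THRESHOLD`, `LINEAR_SPEED`, and `ANGULAR_SPEED` are hypothetical names with placeholder values, standing in for the tuned quantities mentioned above. It assumes the object position is expressed in the base frame with x forward and y to the left, so that a positive angle calls for a positive (counterclockwise) z rotation.

```python
import math

DISTANCE_THRESHOLD = 600.0  # value from the text; its units are not stated there
ANGLE_THRESHOLD = 0.1       # rad, hypothetical placeholder
LINEAR_SPEED = 0.2          # hypothetical placeholder for the tuned x velocity
ANGULAR_SPEED = 0.5         # hypothetical placeholder for the tuned z rotation

def base_command(obj_xy, depth):
    """Return (state, linear_x, angular_z) for the mobile base.

    obj_xy -- (x, y) of the detected object in the base frame, or None if
              nothing has been detected yet
    depth  -- measured depth to the object (ignored when obj_xy is None)
    """
    if obj_xy is None:
        # TURN: rotate in place until the object is detected.
        return "TURN", 0.0, ANGULAR_SPEED
    if depth < DISTANCE_THRESHOLD:
        # STOP: close enough to the object.
        return "STOP", 0.0, 0.0
    # GO: angle between the base x-axis and the direction to the object.
    angle = math.atan2(obj_xy[1], obj_xy[0])
    if abs(angle) < ANGLE_THRESHOLD:
        # Roughly facing the object: drive straight ahead.
        return "GO", LINEAR_SPEED, 0.0
    # Otherwise rotate toward the object; the sign of the angle picks the direction.
    return "GO", 0.0, math.copysign(ANGULAR_SPEED, angle)
```

Commanding either a linear or an angular velocity, never both at once, matches the strategy described above and keeps the two tuned values independent of each other.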
This task was completed through the following state responses:
- WAIT: move the arm to its home position by publishing to /go_home
- GRASP: perform the grasp by…
  - simple object: orient gripper down and move above → lower → grip → raise
  - medicine bottle: orient gripper down and move above & behind → orient gripper state → move forward to object → grip → raise
- DETECT HAND:
  - simple object: place in hand
  - if pouring: rotate 90º in x
The positional offsets were also tuned.