Post #9
Perception Solutions:
Aside from our ideal perception solution outlined in post #8, we also need to consider a target solution and a minimum viable fallback solution.
The simplest feasible perception system that does not require any environmental modification would use computer vision to detect both the walker and the patient. For indoor navigation, we can use LiDAR, RGB-D cameras, and relevant libraries to help the robot find the best path from the walker to the patient.
Detection of the walker: We want to use the Stretch deep perception library, specifically its object detection, which uses a YOLO model. Since YOLO is not trained to detect walkers, we will need to apply transfer learning, fine-tuning a pre-trained YOLO model on our dataset of labelled walker images.
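To fine-tune YOLO on our walker images, each labelled image needs a label file in YOLO's normalized text format. As a small sketch (the `class_id` of 0 for "walker" and the 640x480 frame size are assumptions, not values from our dataset), converting a pixel-space bounding box to a YOLO label line might look like:

```python
def to_yolo_label(box, img_w, img_h, class_id=0):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a YOLO
    label line: 'class x_center y_center width height', all normalized
    to [0, 1] by the image dimensions."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# example: a walker occupying the left half of a 640x480 frame
label = to_yolo_label((0, 0, 320, 480), 640, 480)
# → "0 0.250000 0.500000 0.500000 1.000000"
```

One label file per image, one line per walker instance, is the convention YOLO training pipelines expect.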
Grabbing of the walker: We can use the Dex-Net model developed by Berkeley AUTOLAB. It takes a 3D point cloud of the object and assesses the robustness of a list of candidate grasp points. We can first try applying their pre-trained model directly to the walker. If that does not work, we will need to train the model on a dataset of 3D walker models in HDF5 format.
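For intuition about what grasp ranking is doing: a good parallel-jaw grasp has two contact points whose surface normals roughly oppose each other along the grasp axis. The toy score below is only an illustration of that idea, not the Dex-Net API (Dex-Net's actual robustness metric is learned from simulation):

```python
import numpy as np

def antipodal_score(p1, n1, p2, n2):
    """Toy grasp-quality proxy: returns 1.0 when the two unit contact
    normals point directly at each other along the grasp axis, and less
    as they deviate. Not Dex-Net's metric; a hand-rolled illustration."""
    axis = (p2 - p1) / np.linalg.norm(p2 - p1)
    return float(min(np.dot(n1, axis), np.dot(n2, -axis)))

# opposing contacts on either side of a 3 cm walker handle tube
s = antipodal_score(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                    np.array([0.03, 0.0, 0.0]), np.array([-1.0, 0.0, 0.0]))
# → 1.0 (perfectly opposed contacts)
```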
Detection of the patient: For simplicity, we want to reuse the above YOLO model for patient detection, since YOLO is already capable of detecting a person. In scenarios where multiple people may be present in the same room, face recognition models such as FaceNet can be incorporated so that Stretch can decide which person it should deliver the walker to.
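Assuming FaceNet-style embeddings (one vector per detected face, plus one enrolled vector for the patient), choosing the right person reduces to a nearest-embedding search. A minimal sketch, where the 0.7 distance threshold is an assumed tunable, not a value from FaceNet itself:

```python
import numpy as np

def pick_patient(face_embeddings, patient_embedding, threshold=0.7):
    """Return the index of the detected face closest (Euclidean distance
    between L2-normalized embeddings) to the enrolled patient embedding,
    or None if no face falls within the match threshold."""
    patient = patient_embedding / np.linalg.norm(patient_embedding)
    best_i, best_d = None, threshold
    for i, emb in enumerate(face_embeddings):
        d = np.linalg.norm(emb / np.linalg.norm(emb) - patient)
        if d < best_d:
            best_i, best_d = i, d
    return best_i
```

If `pick_patient` returns None, the robot could fall back to asking for human input rather than delivering to the wrong person.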
Path navigation in the room: We are going to use the stretch_rplidar_job.py and stretch_rplidar_mapping.py files in Hello Robot's stretch_body repository to generate a grid map of the room as the robot moves. We can then apply a path-finding algorithm such as A* to find the shortest path from the robot to the walker, and then to the patient.
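On the occupancy grid produced by the mapping scripts, A* is straightforward. A self-contained sketch, assuming the map is a 2D grid of 0 (free) and 1 (occupied) cells with 4-connected movement:

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 2D occupancy grid (0 = free, 1 = occupied),
    4-connected moves, Manhattan-distance heuristic. Returns the path as
    a list of (row, col) cells, or None if the goal is unreachable."""
    rows, cols = len(grid), len(grid[0])
    h0 = abs(start[0] - goal[0]) + abs(start[1] - goal[1])
    open_set = [(h0, 0, start, [start])]   # (f = g + h, g, cell, path)
    best_g = {start: 0}
    while open_set:
        _, g, cell, path = heapq.heappop(open_set)
        if cell == goal:
            return path
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    h = abs(nr - goal[0]) + abs(nc - goal[1])
                    heapq.heappush(open_set, (ng + h, ng, (nr, nc), path + [(nr, nc)]))
    return None
```

Running it twice (robot to walker, then walker cell to patient) gives the two legs of the delivery route.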
Our minimum viable fallback solution is a combination of human-in-the-loop control and ArUco markers to aid object detection.
Detection and grabbing of walker:
Option 1 (ArUco markers): We would paste ArUco markers on all the surfaces the walker can be grabbed by. This makes the walker and its graspable points easier to detect. The robot will use several ArUco tags to position itself in the same fixed pose before starting to grab the walker. This pose must be optimal in the sense that the robot can pull the walker without its base colliding with the walker’s wheels. We will test and record the position of the robot relative to the walker such that pulling from that position never causes a collision; the robot therefore needs to perceive the distance between its base and the walker’s wheels when it is in this optimal pose. The grabbing action itself can then be hard-coded: we can use programming by demonstration to store the series of poses that make up the grab. This is feasible because the robot always performs the same grabbing action from the same fixed position, so it should reliably succeed.
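Once the marker's pose in the camera frame is estimated (e.g. with OpenCV's ArUco module), the recorded optimal pre-grasp position becomes a fixed offset expressed in the marker frame. A sketch of turning that offset into a goal in the camera frame with a homogeneous transform; the 0.4 m offset below is a placeholder, not our measured value:

```python
import numpy as np

def pregrasp_goal(T_cam_marker, offset_in_marker):
    """Given the walker marker's 4x4 pose in the camera frame and the
    recorded pre-grasp offset expressed in the marker frame, return the
    3D goal point in the camera frame."""
    p = np.append(offset_in_marker, 1.0)   # homogeneous point
    return (T_cam_marker @ p)[:3]

# marker 1 m straight ahead of the camera, no rotation;
# placeholder recorded offset: 0.4 m back along the marker's z-axis
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 1.0]
goal = pregrasp_goal(T, np.array([0.0, 0.0, -0.4]))
# → array([0. , 0. , 0.6])
```

The real `T_cam_marker` would come from ArUco pose estimation and still needs to be mapped through the camera-to-base calibration before commanding the base.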
Option 2 (color recognition): Since our purchased walker is red, we can exploit this distinctive color to detect it. The robot would flag anything red in each processed frame, assuming nothing else in the environment is red. With this approach, the robot should be able to detect the walker easily.
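A minimal sketch of this idea on a raw RGB frame; the channel thresholds are assumptions to be tuned, and a real implementation would likely convert to HSV (e.g. with OpenCV) for better robustness to lighting:

```python
import numpy as np

def find_red_centroid(rgb, r_min=150, gb_max=80):
    """Return the (row, col) centroid of 'red' pixels in an HxWx3 RGB
    image, or None if no red pixels are found. Uses plain per-channel
    thresholds (assumed values); HSV thresholding is more robust."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mask = (r > r_min) & (g < gb_max) & (b < gb_max)
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    return float(ys.mean()), float(xs.mean())

# synthetic frame: a red patch at rows 10-19, cols 20-29
img = np.zeros((40, 40, 3), dtype=np.uint8)
img[10:20, 20:30] = (200, 30, 30)
center = find_red_centroid(img)
# → (14.5, 24.5)
```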
Detection of the patient: We can use the Segment Anything Model (SAM) by Meta to locate the patient in the image, then use LiDAR and the depth camera to calculate the patient's position. The human input required is for the patient to locate themselves in the robot's camera feed and click on the image to create a bounding box around themselves.
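Once a pixel on the patient is known (from the click or the segmentation mask), the depth image lets us back-project it into a 3D point in the camera frame via the pinhole model. The intrinsics below are placeholders; the real values come from the RGB-D camera's calibration:

```python
def pixel_to_point(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in meters into a 3D point in
    the camera frame using the pinhole camera model. (fx, fy) are focal
    lengths in pixels, (cx, cy) the principal point."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# a pixel at the principal point maps straight ahead of the camera
pt = pixel_to_point(320, 240, 2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
# → (0.0, 0.0, 2.0)
```

This 3D point, transformed into the map frame, becomes the navigation goal for delivering the walker.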
Path navigation in the room: We can use the navigation tool demonstrated in class, or the Xbox controller, to move the robot to the walker and then to the patient. The human input we need is the patient controlling the web interface or the controller.
Human-in-the-loop interaction requires the patient to be close to either the interaction interface or the controller for this solution to be feasible. We can port the interface to devices such as a phone to make it more accessible to the patient.
One more simplification could be to ask the patient to rotate the walker once it is delivered to them, since rotating the walker is something we struggled with when tele-operating Stretch.