In order to have the bots apply the transforms computed by the main thread, we created multiple shared channels using ROS's multimaster_fkie package. To do this, we had to execute a series of shell commands:
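A rough sketch of the kind of commands this involved, assuming the standard master_discovery and master_sync nodes from multimaster_fkie (the multicast group shown is the package default, not necessarily the value we used):

```bash
# Run on the workstation and on each bot, against that machine's own ROS master.
# Advertise this master on the network:
rosrun master_discovery_fkie master_discovery _mcast_group:=224.0.0.1
# Mirror topics from the other discovered masters into this one:
rosrun master_sync_fkie master_sync
```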
We set up multimaster starting with localhost and moving on to each bot in the game. We then set up the RealSense camera using the same launch files as Labs 4 and 6, namely rs_camera.launch from the realsense2_camera package and ar_track.launch from the lab4_cam package. (We imported both of these packages into our project code.)
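In rough terms, the camera setup amounted to launching those two files (launch arguments are omitted here):

```bash
# RealSense driver: publishes the color and depth streams
roslaunch realsense2_camera rs_camera.launch
# AR tag tracking on the camera images (from the lab4_cam package)
roslaunch lab4_cam ar_track.launch
```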
Leveraging Joel and Jason's classifier work, we created a script that would reliably detect which bot the stick was pointing at. We ran this script as part of the main program that we ran for every bot in the game. It took in the turtlebot_ID or the color of the bot it was running for, and it output that bot's role assignment ("pursuer" or "evader").
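A minimal sketch of that interface (the stick_classifier module and detect_pointed_bot helper are assumed names standing in for the classifier code, and we assume here that the pointed-at bot becomes the pursuer):

```python
import sys

# Assumed module/function names standing in for the stick classifier.
from stick_classifier import detect_pointed_bot


def get_role(my_bot_id):
    """Return this bot's role, assuming the bot the stick points at is "it"."""
    pointed_bot = detect_pointed_bot()
    return "pursuer" if pointed_bot == my_bot_id else "evader"


if __name__ == "__main__":
    bot_id = sys.argv[1]  # turtlebot_ID or color passed on the command line
    print(get_role(bot_id))
```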
To run the actual game of tag, we needed to take the role assignment of each bot (generated above) and use it to determine that bot's next move. We split the logic between "pursuer" bots and "evader" bots.
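A sketch of how the two cases can be dispatched (the gains and speeds are placeholders, not our actual control law):

```python
import math

from geometry_msgs.msg import Twist


def compute_next_move(role, angle_to_opponent):
    """angle_to_opponent: bearing (radians) to the nearest opponent in this
    bot's own frame. Placeholder logic for illustration only."""
    if role == "pursuer":
        desired = angle_to_opponent            # drive toward the opponent
    else:
        desired = angle_to_opponent + math.pi  # drive directly away from it
    # Wrap to [-pi, pi] and use a simple proportional turn rate.
    desired = math.atan2(math.sin(desired), math.cos(desired))
    cmd = Twist()
    cmd.linear.x = 0.3
    cmd.angular.z = 1.5 * desired
    return cmd
```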
Once we had a transform assigned for each bot, we needed the bots to physically move. It wasn't enough to have the Twist objects published to the shared multimaster topic: each bot had to republish them to its corresponding "<color>/mobile_base/commands/velocity" topic in order to actually move. We therefore wrote a separate script, pubsub.py, to handle this specific task of relaying commands from one topic to another.
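A minimal sketch of pubsub.py's job (the shared topic name "/shared/<color>/cmd_vel" is an assumption for illustration; the color-specific velocity topic is the one named above):

```python
#!/usr/bin/env python
import sys

import rospy
from geometry_msgs.msg import Twist


def main():
    color = sys.argv[1]  # e.g. "red"; selects this bot's velocity topic
    rospy.init_node("pubsub_" + color)
    # Topic that actually drives this bot's base.
    pub = rospy.Publisher("/%s/mobile_base/commands/velocity" % color,
                          Twist, queue_size=10)
    # Shared multimaster topic carrying the command computed by the main thread
    # (topic name assumed for illustration).
    rospy.Subscriber("/shared/%s/cmd_vel" % color, Twist, pub.publish)
    rospy.spin()


if __name__ == "__main__":
    main()
```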
We first convert the image from RGB to HSV and segment in the HSV space. We initially thought that color-space segmentation alone would be enough to recognize the stick. However, we learned that the color segmentation is quite noisy (see the red circles below).
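The segmentation step looks roughly like this (the HSV bounds are placeholders, not our tuned values; images from cv_bridge are typically BGR, hence the conversion code used):

```python
import cv2
import numpy as np


def segment_stick_color(bgr_image):
    """Return a binary mask of pixels whose color matches the stick.
    The bounds below are placeholders; the real ones were tuned by hand."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower = np.array([40, 60, 60])    # example hue/saturation/value lower bound
    upper = np.array([80, 255, 255])  # example upper bound
    return cv2.inRange(hsv, lower, upper)
```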
After realizing that color segmentation alone was not enough, we were faced with two options. Ultimately, we decided to maximize our learning experience by building a more robust system. We did not have classical computer vision experience; however, we were excited to learn, and the following is what we came up with.
We then convert the color-segmented image to grayscale for further processing. A Gaussian blur is also applied to get rid of noise and produce a smoother image segment.
A threshold is applied to further reduce the remaining noise. We thresholded based on the intensity of each pixel because, after the Gaussian blur, noise generally shows up with lower intensity. We then dilate the blobs to fill small gaps and smooth the image, which helps with contour detection.
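Those steps in code, roughly (kernel sizes and the threshold value are illustrative, not our tuned settings):

```python
import cv2
import numpy as np


def clean_segmentation(segmented_bgr):
    """Grayscale, blur, threshold, and dilate the color-segmented image."""
    gray = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    # After blurring, isolated noise pixels end up with low intensity,
    # so a simple intensity threshold removes most of them.
    _, binary = cv2.threshold(blurred, 50, 255, cv2.THRESH_BINARY)
    # Dilation fills small gaps and smooths the blobs for contour detection.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.dilate(binary, kernel, iterations=1)
```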
After removing as much noise as we could, we needed a way to identify which "blob" was the stick. To do this, we used OpenCV's contour detection to fit rectangles to the blobs. Next, we built a classifier to determine which box contained the stick.
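Fitting boxes to the blobs, roughly:

```python
import cv2


def blob_boxes(binary):
    """Return a bounding rectangle (x, y, w, h) for each detected blob."""
    # cv2.findContours returns (contours, hierarchy) in OpenCV 4.x;
    # OpenCV 3.x returns (image, contours, hierarchy) instead.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]
```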
Our classifier is a function of a box's aspect ratio and area. We tuned it using a heuristic guess-and-check process. We initially believed we should enforce an aspect-ratio cutoff, since sticks are rectangular rather than square. However, when the stick's maximum variance is along the camera's Z direction, its 2D color image looks more square. After a lot of testing, we found the best approach was to weight the aspect ratio zero and choose the box with the largest area, provided that area exceeds some threshold. This led to a simple algorithm that selects which box is the stick, or indicates that the stick is not present.
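A sketch of that selection rule (the area threshold is a placeholder; ours was tuned empirically):

```python
def pick_stick_box(boxes, min_area=2000):
    """Return the largest box whose area exceeds min_area, or None if no box
    is large enough (i.e. the stick is not present)."""
    best = None
    best_area = min_area
    for (x, y, w, h) in boxes:
        area = w * h
        if area > best_area:
            best, best_area = (x, y, w, h), area
    return best
```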
Once we had the correct rectangle, we were able to use it as a mask to select the portion of the original image that contains the stick.
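For example, with an (x, y, w, h) box this can be as simple as:

```python
import numpy as np


def mask_to_stick(image, box):
    """Zero out everything outside the stick's bounding box."""
    x, y, w, h = box
    masked = np.zeros_like(image)
    masked[y:y + h, x:x + w] = image[y:y + h, x:x + w]
    return masked
```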
After experimenting with our segmented image from steps 1 through 5, we realized that the RealSense camera is not perfect. We found that towards the edges of the stick, the depth sensor would read the wall behind the stick rather than the stick's edges. This caused downstream errors when we tried to predict the direction the stick was pointing. We also realized that all we needed was the depth cloud from the center of the stick, so we applied morphological erosion with a 9x9 kernel to erode away much of the stick. Adding the erosion step greatly increased the accuracy of our CV system.
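The erosion step itself (the 9x9 kernel size is from the text above; the rest is a sketch):

```python
import cv2
import numpy as np


def erode_stick_mask(stick_mask):
    """Erode the stick mask so only its center survives, avoiding depth
    readings that bleed onto the wall behind the stick's edges."""
    kernel = np.ones((9, 9), np.uint8)
    return cv2.erode(stick_mask, kernel, iterations=1)
```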
Once we had the segmented and eroded image, we used its depth points to define a frame for the stick. We then transformed the turtles (represented by ar_marker_4 and ar_marker_3) from the camera frame to the stick frame.
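A sketch of that lookup using tf2 (the stick frame name "stick" is an assumption, and this is not our exact transform code):

```python
import rospy
import tf2_ros


def turtle_positions_in_stick_frame(tf_buffer, stick_frame="stick"):
    """Look up ar_marker_3 and ar_marker_4 relative to the stick frame.
    Assumes a node is running and tf_buffer is a populated tf2_ros.Buffer."""
    positions = {}
    for marker in ("ar_marker_3", "ar_marker_4"):
        t = tf_buffer.lookup_transform(stick_frame, marker,
                                       rospy.Time(0), rospy.Duration(1.0))
        p = t.transform.translation
        positions[marker] = (p.x, p.y, p.z)
    return positions
```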
Once we had the turtles represented in the stick frame, we needed to determine which one the stick was pointing at. Since we wanted to build a robust and scalable system, we needed to account for the case where many turtlebots were close together. We felt the best way to handle this was to make the classifier a likelihood estimator, so that we could tell when it was unsure and then narrow down the selection.
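As a rough illustration of the likelihood idea (the normalization here is our own sketch, and we assume the stick points along the +x axis of the stick frame; the actual heuristic is described next):

```python
import math


def pointing_likelihoods(turtle_positions):
    """turtle_positions: dict mapping each bot to its (x, y, z) position in the
    stick frame, where the stick is assumed to point along +x. Returns a
    normalized likelihood per bot based on how well it lines up with that axis."""
    scores = {}
    for bot, (x, y, z) in turtle_positions.items():
        r = math.sqrt(x * x + y * y + z * z)
        cos_theta = x / r if r > 0 else 0.0
        scores[bot] = max(cos_theta, 0.0)  # ignore bots behind the stick
    total = sum(scores.values())
    if total == 0:
        return {bot: 0.0 for bot in scores}  # no bot is in front of the stick
    return {bot: s / total for bot, s in scores.items()}
```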
Due to the lack of training examples, and with an emphasis on reducing complexity, we decided to develop a heuristic approach for defining the probability density function p(i). Our method scores each bot based on cos(theta), where theta is the angle between the stick's pointing direction and the direction from the stick to that bot; here is the process: