Milestone 1: Implement Joint Detection
Our first goal for this project was to implement joint detection. We decided to use a pre-built open-source pose estimation model rather than building our own. The first model we tried, OpenPose, initially seemed like it would work but turned out to lack support for newer Ubuntu releases such as 22.04. We pivoted to Ultralytics's YOLOv8 model, which worked with our computational setup. YOLO's pose models are pretrained on the COCO dataset, while its classification models are pretrained on ImageNet.
YOLO stands for 'You Only Look Once.' This popular algorithm works by dividing the image into a grid, where each cell proposes multiple bounding boxes. The model then predicts how likely each box is to contain an object and what that object is. Finally, it applies a cleanup step (non-maximum suppression) that keeps only the highest-confidence boxes, so it doesn't double-count objects or report detections with low confidence scores.
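The cleanup step mentioned above is conventionally called non-maximum suppression. The following is an illustrative sketch of greedy NMS over `[x1, y1, x2, y2]` boxes, not YOLO's actual implementation; all function names and thresholds here are our own:

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.5):
    """Greedily keep the highest-confidence boxes, dropping low-confidence
    boxes and any box that overlaps an already-kept box too much."""
    order = np.argsort(scores)[::-1]
    order = order[scores[order] >= conf_thresh]  # drop low-confidence boxes
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
result = nms(boxes, scores)  # the two overlapping boxes collapse into one
```

Here the first two boxes overlap heavily, so only the higher-scoring one survives, along with the separate third box.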
Once we acquired the model, finding comprehensive documentation proved to be a bit challenging. Initially, it was relatively easy to get the model to recognize keypoints, and we didn't have to make any modifications to the model itself. However, as we progressed, it became more difficult to find sources that explained the data structures we would be receiving, how to work with them, or what the information they contained actually represented.
After gaining comfort with extracting the desired data, we decided to convert the output from nested tensors to NumPy arrays. NumPy arrays were easier to work with and made it easy to select keypoints based on our number of Neatos. Currently, the selection of meaningful keypoints for a particular video has to be done manually. However, we anticipate that if we were to continue, it would be reasonable to develop a simple algorithm that selects the most important keypoints or averages the positions between them.
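The conversion and manual selection can be sketched as follows. We use a stand-in array in place of the model's tensor output (in Ultralytics YOLOv8, `results[0].keypoints.xy` holds the same shape of data); the particular indices chosen are just an example of a manual selection, following the COCO keypoint ordering:

```python
import numpy as np

# Indices in the COCO keypoint ordering used by YOLOv8 pose models.
LEFT_WRIST, RIGHT_WRIST = 9, 10
LEFT_ANKLE, RIGHT_ANKLE = 15, 16

def select_keypoints(keypoints_xy, indices):
    """Pick the (x, y) keypoints used to drive the Neatos.

    keypoints_xy: (num_people, 17, 2) array, e.g. the NumPy conversion
    of a YOLOv8 pose result. Only the first detected person is used.
    """
    person = keypoints_xy[0]
    return person[list(indices)].astype(int)

# Stand-in for model output: one person, 17 (x, y) keypoints.
fake_output = np.arange(34, dtype=float).reshape(1, 17, 2)
chosen = select_keypoints(fake_output, [LEFT_WRIST, RIGHT_WRIST])
```

Using only the first person's keypoints is exactly the single-person limitation discussed below.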
Current limitations include the inability to handle multiple people in the frame (or frames where the model thinks there are multiple people). Going forward, addressing this issue would be a relatively simple fix, requiring slightly more data processing. Additionally, there is always room for improvement in representing movement in a meaningful and artistic way.
Milestone 2: Control Multiple Neatos
Once joint detection was working, we progressed to controlling the Neatos and establishing the system architecture. Since we only had experience controlling one robot at a time, our second milestone involved setting up and controlling multiple Neatos simultaneously. We installed a package that allowed us to connect to multiple Neatos and modified teleop code we had previously written to control them synchronously. This turned out to be a fairly straightforward implementation. However, we did find the process of setting up multiple Neatos rather tedious, as it was often challenging to find five fully charged and working Neatos.
Once we knew how to control multiple Neatos, we set up the code architecture to move the Neatos based on the keypoints from joint detection. We created two separate nodes, one for joint detection and one for controlling the Neatos. In the joint detection node, we formatted the keypoints into an Int32MultiArray data type and set up a publisher. In the Neato movement node, we received the keypoints and set up a process_keypoint function to run every time a new keypoint was received.
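An Int32MultiArray carries a flat list of integers, so the keypoints have to be flattened before publishing and reshaped in the subscriber's callback. A sketch of that round trip (the `.data` field name follows std_msgs, but the helper names are our own):

```python
import numpy as np

def pack_keypoints(keypoints):
    """Flatten an (N, 2) array of pixel keypoints into the flat list of
    ints that an std_msgs Int32MultiArray's .data field expects."""
    return np.asarray(keypoints, dtype=np.int32).ravel().tolist()

def unpack_keypoints(data):
    """Reshape the flat list back into (N, 2) keypoint pairs, as a
    process_keypoint-style subscriber callback would on receipt."""
    return np.asarray(data, dtype=np.int32).reshape(-1, 2)

kps = [[120, 340], [200, 310], [415, 330]]
flat = pack_keypoints(kps)
assert unpack_keypoints(flat).tolist() == kps  # lossless round trip
```

Flattening is the main bookkeeping cost of MultiArray messages; a custom message type with named fields would avoid it at the cost of defining one.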
For this implementation, we opted for the simplest approach to control each Neato. We stored the location of the previous keypoint, compared it to the location of the current keypoint, and calculated the angle and distance the Neato would need to travel to match the change in location. Subsequently, we created five different publishers to send these movement commands to the Neatos. While all five Neatos were able to move simultaneously, their movement didn't precisely match that of the video.
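The turn-and-drive calculation from consecutive keypoints can be sketched like this; the pixels-to-meters scale factor is a made-up illustrative value, and the function name is ours:

```python
import math

def movement_command(prev_kp, curr_kp, pixels_per_meter=100.0):
    """Compute the heading (radians) and drive distance (meters) that
    would make a Neato's displacement match the keypoint's change."""
    dx = curr_kp[0] - prev_kp[0]
    dy = curr_kp[1] - prev_kp[1]
    angle = math.atan2(dy, dx)                        # direction of motion
    distance = math.hypot(dx, dy) / pixels_per_meter  # pixels -> meters
    return angle, distance

# Keypoint moved 100 px straight along x: heading 0, distance 1.0 m.
angle, dist = movement_command((100, 100), (200, 100))
```

Note that this naive version computes the heading in image coordinates and ignores the Neato's current orientation, which is exactly the flaw we ran into next.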
After a somewhat tedious debugging process, we realized that this implementation didn't consider the current orientation of the Neato. Consequently, when it turned, it didn't have a significant effect and would travel in the wrong direction. For our next milestone, we aimed to implement odometry to keep track of the Neato's orientation and enhance the algorithm's robustness. Additionally, as our code was becoming repetitive with five Neatos, we sought to refactor it for greater reusability.
Milestone 3: Implementing Odometry
To get the pose of each Neato, we set up five subscribers in the Neato movement node to pull each Neato's pose. The algorithm to calculate the Neato's movement took a fairly similar approach to the one based solely on the keypoints. It would get the current Neato location, including orientation, translated into a global map frame; then it would receive the target location, which had also been translated into the global map frame. From there, it would figure out how the robot needed to move.
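With odometry in the loop, the command can account for the Neato's current heading. A minimal sketch under our assumptions (pose as an (x, y, theta) tuple in the global frame; the function name is ours):

```python
import math

def command_from_pose(pose, target):
    """Given the Neato's odometry pose (x, y, theta) and a target (x, y)
    in the same global frame, return how far to turn and then drive."""
    x, y, theta = pose
    tx, ty = target
    heading_to_target = math.atan2(ty - y, tx - x)
    # Normalize the turn into [-pi, pi) so the Neato takes the short way.
    turn = (heading_to_target - theta + math.pi) % (2 * math.pi) - math.pi
    distance = math.hypot(tx - x, ty - y)
    return turn, distance

# Target directly ahead of a Neato facing +y: no turn, drive 1.0 m.
turn, dist = command_from_pose((0.0, 0.0, math.pi / 2), (0.0, 1.0))
```

The angle normalization is the step the keypoint-only version lacked: without subtracting `theta`, turns accumulate in the wrong direction.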
We implemented a dictionary-of-dictionaries storage solution to keep track of each Neato's information. This approach made adding a subscriber for each Neato location much more manageable and allowed our code to be more reusable and readable. It also streamlined the debugging process. Initially, we attempted to create a Neato class, but we encountered bugs and suspected that the format somehow upset ROS. Since the dictionary solution worked smoothly, we decided to stick with it.
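The per-Neato bookkeeping looked roughly like the sketch below; the field names are illustrative rather than our exact ones:

```python
# One entry per Neato; each inner dictionary holds that robot's state.
neatos = {
    f"neato{i}": {
        "pose": (0.0, 0.0, 0.0),   # (x, y, theta) from the odom subscriber
        "prev_keypoint": None,     # last keypoint assigned to this Neato
        "publisher": None,         # cmd_vel publisher handle stored here
    }
    for i in range(5)
}

# Updating one robot's state stays uniform and readable:
neatos["neato2"]["pose"] = (1.5, 0.2, 0.8)
```

Because every robot shares the same inner structure, subscriber callbacks and the movement loop can be written once and parameterized by the Neato's name.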
While using odometry worked better than before, it still presented challenges, as the odometry would quickly become inaccurate. Particularly when testing on carpet, the wheels would slip, rendering the odometry close to useless after about 10 seconds, which we failed to foresee. Additionally, there was a delay from human movement to the Neatos, as it's much easier to move a hand a short distance than a Neato a greater distance.
The End and the Path Forward
After a fair amount of debugging, we tried restricting the Neato to movement along a single axis. This approach worked quite well, as the Neato was able to match the location of the hand fairly accurately. This simplicity ended up being more visually appealing and interesting to control. When using several Neatos at once, it provided a fairly good representation of which parts of the body were moving.
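The one-axis version reduces to tracking a single coordinate. A sketch of a simple proportional controller in that spirit (the gain, pixel scale, and speed limit are made-up values, and the function name is ours):

```python
def one_axis_velocity(neato_x, hand_x, gain=0.5,
                      pixels_per_meter=100.0, max_speed=0.3):
    """Drive the Neato along one axis toward the hand's position.
    Returns a forward velocity in m/s, clamped to a speed limit."""
    error_m = (hand_x - neato_x) / pixels_per_meter  # pixels -> meters
    v = gain * error_m
    return max(-max_speed, min(max_speed, v))        # respect speed limit

small = one_axis_velocity(neato_x=0.0, hand_x=40.0)   # proportional: 0.2 m/s
large = one_axis_velocity(neato_x=0.0, hand_x=400.0)  # clamps to 0.3 m/s
```

The clamp also partly explains the lag we observed: a hand can jump farther in one frame than a speed-limited Neato can travel.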
In the future, to implement a better version of the two-dimensional movement, we would likely need a more robust method than relying solely on odometry to determine the position and heading of the robot. Possible alternatives could include using April tags or a lidar scanner in a known environment. However, we realized that Neatos, primarily designed for one-dimensional movement with turning capabilities, will never match the ease that joints have in moving in two dimensions. For a better visualization, it would be worth exploring a more versatile type of robot with greater movement control.