In cornhole competitions, players often have to walk to reloading stations and retrieve beanbags manually. This is cumbersome, so I propose an autonomous robot that can do it instead. The plan is to mount a camera onboard the robot, use computer vision techniques to detect the location of the person to throw to in the frame, estimate how far away they are, and throw the beanbag to them. The question is whether traditional computer vision techniques are fast and precise enough to enable such an operation.
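To make the range-estimation step concrete, a minimal sketch of one common approach is below: a pinhole-camera estimate from the apparent face size. The focal length and face width here are assumed placeholder values, not calibrated ones, and this is a sketch rather than the final method.

    # Hypothetical pinhole-model range estimate from a detected face's width.
    # Both constants are assumed placeholders, not calibrated values.
    FOCAL_LENGTH_PX = 600.0      # assumed camera focal length, in pixels
    REAL_FACE_WIDTH_M = 0.16     # assumed average human face width, in meters

    def estimate_distance_m(face_width_px: float) -> float:
        """Approximate camera-to-person distance from the bounding-box width."""
        return FOCAL_LENGTH_PX * REAL_FACE_WIDTH_M / face_width_px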
Some work exists in this area, such as [1], although the authors use an external camera system and a humanoid robot, which has difficulty with locomotion; our robot can move around freely. In [2], the authors propose methods for efficiently signaling to robots so that the robots can detect the gesture. This work is directly relevant here, since our robot must detect a throw signal from the human. [3] discusses some of the challenges of robots operating in human environments.
The main gap in prior work is the lack of research in a mobile robot framework that involves the beanbag game and constantly watches for a signal from the human. Integrating LIDAR to verify that the throwing trajectory is clear of other obstacles is also a direction for future research.
I propose to solve this problem by using machine learning techniques to constantly look for a human in the video frame. Once a human has been detected, the robot also checks the throwing region for the presence of any other humans, ensuring nobody will be hit by the beanbag, as sketched below.
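A minimal sketch of such a check, assuming the throwing region is a fixed axis-aligned rectangle in image coordinates and that detections come in as (x, y, w, h) boxes (both simplifying assumptions):

    # Hypothetical throw-safety check. `region` and all boxes are (x, y, w, h)
    # rectangles in image coordinates; the region itself is an assumed fixed
    # rectangle, not a calibrated area on the field.
    def region_is_clear(faces, target, region):
        """True if no detected face other than the target lies in the region."""
        rx, ry, rw, rh = region
        for (x, y, w, h) in faces:
            if (x, y, w, h) == tuple(target):
                continue                      # skip the intended receiver
            cx, cy = x + w // 2, y + h // 2   # center of the detected face
            if rx <= cx <= rx + rw and ry <= cy <= ry + rh:
                return False                  # a bystander is in the region
        return True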
Then, the robot signals the throw so the human is ready, and the throw is made. This is the human-robot interaction component.
For the experiment, I wrote Python 3 code to read streaming image data from a mobile camera app called IPCamera and save the frames to my laptop. Using this stream, I applied the OpenCV library's face detection module to detect human faces in the video. The OpenCV Haar cascades module worked better than expected, although there is still room for improvement: as anticipated, there were quite a few false positives at times.
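A minimal sketch of this pipeline is below. The stream URL is a placeholder (the IPCamera app's actual endpoint depends on its settings), but the Haar cascade calls are the standard OpenCV ones.

    import cv2

    # Placeholder address; the IPCamera app's streaming endpoint depends on
    # its configuration, so this URL is an assumption.
    STREAM_URL = "http://192.168.0.10:8080/video"

    # Pre-trained frontal-face Haar cascade that ships with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture(STREAM_URL)
    while True:
        ok, frame = cap.read()
        if not ok:
            break                             # stream ended or dropped
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:            # draw each detection
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()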
The main evaluation criterion is the precision of face detection. Of the ~200 frames captured from the video, around 190 had precise detections, giving roughly 95 percent precision. For use beyond this project, this suggests that if very precise detections are required, more accurate detectors such as deep neural networks may be needed. Another evaluation criterion is the speed of the face detector: it ran at around 10 frames per second, which is fast enough for real-time applications.
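The speed figure can be reproduced with a simple timing loop like the sketch below; `cascade` is assumed to be the classifier from the capture sketch above, and `gray_frames` a list of pre-captured grayscale frames.

    import time

    # Hypothetical throughput measurement over pre-captured grayscale frames.
    def measure_fps(cascade, gray_frames):
        start = time.perf_counter()
        for gray in gray_frames:
            cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return len(gray_frames) / (time.perf_counter() - start)

    # Per-frame precision from manual labels, e.g. 190 / 200 = 0.95.
    def precision(correct_frames: int, total_frames: int) -> float:
        return correct_frames / total_frames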
Figure 1 shows an example of an accurate face detection; Figure 2 shows a case with a false negative.
The precision was much higher than I expected, and the detector handled fluctuating brightness relatively well. I had planned to use pre-trained face detectors based on deep neural networks, but they were not needed. During the competition we did not use this detector; instead, the phone stream was used by the teleoperator to detect the lines on the floor.
In conclusion, traditional face detectors using Haar cascade classifiers are good enough for robust, real-time face detection in this setting. We integrated this with the cornhole robot using the IPCamera app running on a phone mounted on the robot, and Python code was written to successfully read the IPCamera stream and process it with the OpenCV library.
Regarding impact beyond this competition, the result shows that it is not always necessary to start with complex, heavy systems like deep neural networks. It is often better to start with simple, traditional computer vision techniques and move to more complicated methods only if performance is not satisfactory.
[1] Kober, Jens, Matthew Glisson, and Michael Mistry. "Playing catch and juggling with a humanoid robot." 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2012). IEEE, 2012.
[2] Riek, Laurel D., et al. "Cooperative gestures: Effective signaling for humanoid robots." Proceedings of the 5th ACM/IEEE international conference on Human-robot interaction. IEEE Press, 2010.
[3] Kemp, Charles C., Aaron Edsinger, and Eduardo Torres-Jara. "Challenges for robot manipulation in human environments [grand challenges of robotics]." IEEE Robotics & Automation Magazine 14.1 (2007): 20-29.
Figure 1: An example of an accurate face detection.
Figure 2: A frame containing a false negative.