The criteria that we set for our success were Baxter's ability to:
Localize and recognize cards with the camera
Consistently pick and place cards on a table
Monitor and respond to the current state of the game, making legal moves on its turn
By successfully combining this sequence of steps, Baxter should be capable of playing out a full card game against any number of human players. Consequently, it was also important that we select a fitting game to demonstrate Baxter's capabilities.
While we initially considered poker as our game of choice, we quickly realized that the game would involve very little actual interaction with cards. Thus, we pivoted to another simple card game, Numbers and Suits. In this two-player game, both players draw a set number of cards from the deck and a starting card is flipped to begin the game. Taking turns, each player must play a card that matches either the suit or the rank of the most-recently-played card. If the player does not possess a valid card, they must draw from the deck and end their turn. The game ends when a player has emptied their hand.
This game choice also allowed us to expand Baxter's functionality to detect when an illegal move has been made by the human player. If the card they play on their turn does not match the suit or rank of the last card, Baxter will acknowledge the infraction but continue the game as normal.
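The matching rule above, and the illegal-move check built on top of it, reduce to a simple predicate on the most-recently-played card. A minimal sketch, with a card representation and function names of our own choosing (not taken from the project code):

```python
# A card is modeled as a (rank, suit) tuple, e.g. ("7", "hearts").
# These names are illustrative; the actual project code may differ.

def is_legal_move(card, last_card):
    """A play is legal if it matches the rank or the suit of the
    most-recently-played card."""
    rank, suit = card
    last_rank, last_suit = last_card
    return rank == last_rank or suit == last_suit

def choose_card(hand, last_card):
    """Return a legal card from the hand, or None if the player
    must draw from the deck and end their turn."""
    for card in hand:
        if is_legal_move(card, last_card):
            return card
    return None
```

The same `is_legal_move` predicate applied to the human player's move is what lets Baxter flag an infraction while still continuing the game.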
The state diagram below illustrates the flow of the game.
In order to detect cards, we decided to use the Baxter robots' built-in hand cameras rather than any of the external cameras made available to us. While the resolution and image quality of the Logitech webcams were significantly better, Baxter's cameras provided two significant advantages:
Firstly, the transform of the hand camera frame relative to Baxter's base frame is known to Baxter at all times. Utilizing an external camera would require us to calculate additional transforms through the use of AR markers, adding another point of potential inaccuracy.
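The cost of the extra transform can be seen by writing out the pose chains. A simplified 2D sketch with homogeneous matrices (the real system works with full 3D poses through ROS tf; all names and numbers here are hypothetical):

```python
import math

# A 2D pose as a 3x3 homogeneous matrix: rotation by theta, then
# translation by (x, y). Values below are made up for illustration.

def pose(theta, x, y):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]]

def compose(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Hand camera: base -> hand comes directly from the joint encoders,
# so the camera pose is known with no extra estimation step.
base_T_handcam = pose(0.0, 0.6, 0.1)

# External camera: base -> marker -> camera needs an AR detection,
# which layers its own estimation error on top of the calibration.
base_T_marker = pose(0.0, 0.5, -0.2)
marker_T_cam = pose(math.pi, 0.0, 0.8)   # estimated from the AR tag
base_T_cam = compose(base_T_marker, marker_T_cam)
```

Every factor in the second chain carries its own error, which is the "additional point of potential inaccuracy" that led us to prefer the hand camera.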
Secondly, Baxter's path planning modules could be used to orient and position the camera in a consistent manner. A clear top-down view of the cards is ideal for our computer vision to work, and angling an external camera over the playing table and Baxter in this way would be difficult.
Ultimately, the image quality of Baxter's cameras did not appear to have a significant impact on our card detection algorithms.
Due to the thinness of the playing cards, picking them up using Baxter's standard electric grippers was infeasible. We considered manufacturing a sponge-like attachment that Baxter could "press" into the cards and use to slide them around, but drawing a single card from a stack would remain difficult using this method. Furthermore, "pathing" around other cards and objects placed on the table would prove complicated.
Hence, we settled upon using a vacuum gripper attachment, which used Baxter's in-built pneumatics system to apply suction to cards and release them. The installation of this attachment was significantly more difficult, however, and complications with its setup significantly hindered our progress. Initially, none of the Baxter robots had the complete set of working parts necessary for the grippers to function, and many hours of work were spent troubleshooting these hardware failures.
However, once the vacuum grippers were mechanically functional, they worked extremely effectively. There were no problems with consistency when picking up and releasing cards, and proper suction was maintained in all orientations. As a result, movement and path planning with the vacuum gripper was much more efficient compared to the alternatives, as Baxter could move and reorient its hand in any direction necessary. One unexpected complication, however, was an air leak on the side of the vacuum gripper that would sometimes blow away cards and AR markers placed on the table. This was mitigated somewhat by adding tape to the AR markers; the issue with cards was rarer, as the cards were made of heavier, thicker material.
While we initially wanted to mount the camera and gripper on opposite hands and have the camera arm maintain a view of the table while the gripper arm interacted with the cards, we eventually decided to attach the gripper to the same hand as the camera. This was primarily due to the inconsistency of each Baxter's hardware; some of their arms behaved strangely or had poor or non-functional cameras. In addition, the close distance needed for our card detection code to work would cause Baxter's two hands to interfere, requiring that the camera hand be moved out of the way between card plays. Placing both components on one hand would both simplify the path planning involved and decrease the overall amount of travel needed. The only trade-off for this was a slight obstruction of the camera's field of view by the body of the vacuum gripper.
Computer vision was a necessity to tackle card identification. The neural net we trained to identify cards did so quite reliably, even when the image frame was messy or inconsistent. As an added challenge, our team wanted Baxter to be able to locate cards dynamically, without relying on hard-coded coordinates or AR marker indicators.
Our solution to this task still involved the use of AR markers, but no predetermined reference values; instead, we combined the data returned by an AR tracking package with pixel coordinates returned by our computer vision node to dynamically localize the cards in 3D space. As a result, our AR markers can be placed anywhere in the camera frame without disturbing Baxter's ability to find the cards. We felt that this increased Baxter's ability to play and behave independently, adding robustness and flexibility to our design.
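The core of this pixel-to-world step can be sketched as follows. This is our own simplification under a planar, top-down-camera assumption; the function and parameter names are illustrative, not the project's actual API. The AR marker supplies both a world-frame anchor (from the AR tracker) and a pixels-per-meter scale (its known physical size versus its detected pixel size), so no table coordinates need to be hard-coded:

```python
# Hypothetical sketch: map a card's pixel center to table coordinates
# using an AR marker visible in the same frame. Assumes a top-down
# view of a flat table with image axes aligned to the table axes.

def localize_card(card_px, marker_px, marker_world, marker_size_m,
                  marker_size_px):
    """card_px, marker_px -- (u, v) pixel centers from the vision node
    marker_world          -- (x, y) marker position from the AR tracker
    marker_size_m/px      -- marker edge length in meters and in pixels
    Returns the card's (x, y) position in the world frame."""
    scale = marker_size_m / marker_size_px   # meters per pixel
    du = card_px[0] - marker_px[0]
    dv = card_px[1] - marker_px[1]
    return (marker_world[0] + du * scale, marker_world[1] + dv * scale)
```

Because the marker's world pose is re-read from the tracker on every detection, moving the marker anywhere in the frame simply shifts the anchor rather than breaking the localization.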
The one location hard-coded into Baxter is the spot where it places its hand of cards, since this is a location that should be predetermined anyway; no additional benefit comes from having Baxter randomly strew its cards across the table.
Our use of computer vision came with some idiosyncrasies, such as a slight latency in the camera feed that resulted in delayed images being processed by our card detection service. Other times, the service would return incorrect outputs if we accidentally obscured cards or the AR tag in the camera frame. To mitigate these issues, we added many confirmations into our code at the card detection step to give the operator an option to rerun the card detection pipeline if the image was stale or certain elements were not visible.
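The confirmation step described above amounts to a retry loop around the detection call. A minimal sketch (the real pipeline is a ROS service; `detect` and `confirm` here are stand-ins for the service call and the operator prompt):

```python
# Hypothetical operator-confirmation loop around card detection,
# guarding against stale frames or occluded cards and AR tags.

def detect_with_confirmation(detect, confirm, max_attempts=5):
    """Call `detect` until the operator accepts the result via
    `confirm`; re-running the pipeline refreshes a stale image."""
    for _ in range(max_attempts):
        result = detect()
        if confirm(result):
            return result
    raise RuntimeError("card detection never confirmed; check camera view")
```

In practice `confirm` can be as simple as a console prompt, e.g. `lambda r: input("Detected %s. Accept? [y/n] " % (r,)) == "y"`, giving the operator the rerun option described above.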