Even today, purchasing a beverage is an inefficient and tedious process. The customer first waits in line to give their order to a cashier, then waits even longer for the drink to be made. This wastes both the customer's time and the shop's, since an employee must take each order verbally before beginning to make it. If orders could instead be received and processed remotely, the inefficiency of queues would be mitigated. This project aims to streamline the ordering and serving process, saving the customer time and increasing shop profitability, by automating drink preparation and moving order processing online. We call our solution "Baxter the Bartender".
This project takes the job of a server, who receives orders and then makes and distributes refreshments, and demonstrates that such a task can be automated. Baxter must process an online order, locate and identify the contents of a set of cups, then pour their ingredients into an empty cup in the correct order. Because this approach adapts flexibly to multiple ingredients in a working environment, the solution could be applied across the beverage industry to alcoholic drinks, coffee, or even boba.
Our intention was for Baxter to receive an order online, determine the ingredients required, learn where those ingredients were placed in his environment, and pour them in the right order. To do this, Baxter needs some method of identifying both the positions of the ingredients and the ingredients themselves.
To accomplish these goals, we decided to use Baxter's head camera and a series of AR tags to simulate identifying ingredient position and type. AR tags give the head camera a fairly reliable target to detect and use for accurate position measurements. An HTTP server broadcasts each order to a ROS service so that Baxter can process it. For position-dependent actions such as grabbing a bottle, MoveIt is used to calculate the inverse kinematics; for complicated actions that can be performed in one fixed location (such as pouring), the joint file playback script provides consistency and high accuracy.
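As a concrete illustration, below is a minimal sketch of the order-intake path, assuming a Python 3 ROS environment. The project's custom ROS service definition is not reproduced in this report, so the sketch substitutes a plain topic carrying the order as JSON; the topic name, port, and payload schema are all illustrative assumptions rather than the project's actual interface.

```python
#!/usr/bin/env python
"""Hypothetical sketch of the HTTP-to-ROS order bridge. The real project
forwarded orders through a custom ROS service; a std_msgs/String topic is
used here so the sketch stays self-contained."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import rospy
from std_msgs.msg import String

order_pub = None  # assigned after the ROS node initializes


class OrderHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the posted order, e.g. {"customer_tag": 7, "ingredients": [1, 3]}.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        try:
            order = json.loads(body)
        except ValueError:
            self.send_response(400)  # reject malformed orders
            self.end_headers()
            return
        # Relay the order into ROS for the serving routine to pick up.
        order_pub.publish(String(data=json.dumps(order)))
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    rospy.init_node("order_http_bridge")
    order_pub = rospy.Publisher("/drink_orders", String, queue_size=10)
    server = HTTPServer(("0.0.0.0", 8080), OrderHandler)
    rospy.loginfo("Order server listening on port 8080")
    server.serve_forever()
```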
Certain functionality was both gained and lost with these design choices. We wanted robust ingredient localization, but we also wanted a familiar method so that a working demonstration could be produced in the short time allotted; for this reason, AR tags were chosen over full computer vision. AR tags also identify the ingredients themselves, since each ingredient is easily correlated with a unique tag, but this means an individual tag must be assigned to every individual ingredient. Familiarity also drove the decision to use MoveIt, even though it proved fairly unreliable throughout the development of Baxter the Bartender because of its haphazard, randomized algorithm for finding motion plans. And while AR tags are useful for their simplicity, a real-world deployment would need a subtler and more practical way of determining position and contents.
Three main services were written to report correct AR tag positions: fixed_visualization_marker, fixed_head_camera, and fixed_multi_service. Initially, the head camera was offset by a few degrees along one of its axes, leading to incorrect position data from the AR tags. This was a problem because correct tag positions are essential for grabbing and maneuvering the cups around the workspace with the grippers. To fix it, the fixed_visualization_marker service takes the incorrect data from visualization_marker and applies the appropriate quaternion rotation and position transformation so that the newly reported position is correct. fixed_multi_service records the latest known locations of all tags so that MoveIt can still move to them if the head camera loses sight of the tags.
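The sketch below illustrates the fixed_visualization_marker idea: subscribe to the raw marker poses, apply a corrective rotation and translation for the camera's axis offset, and republish. The correction values and topic names are placeholders, not the project's calibrated values, and the caching role of fixed_multi_service is folded in as a simple dictionary rather than a separate service.

```python
#!/usr/bin/env python
"""Hypothetical sketch of correcting AR marker poses for a head camera
axis offset. Offset values and topic names are illustrative only."""
import rospy
from tf.transformations import quaternion_from_euler, quaternion_multiply
from visualization_msgs.msg import Marker

# Assumed correction: a few degrees about one camera axis plus a small
# translation, expressed as a quaternion and an offset vector.
CORRECTION_Q = quaternion_from_euler(0.0, 0.05, 0.0)  # ~3 degrees of pitch
CORRECTION_T = (0.0, 0.0, 0.01)                        # meters

# Latest corrected pose per tag id (the role fixed_multi_service plays).
last_known = {}


def correct(msg):
    # Rotate the reported orientation by the correction quaternion.
    q = (msg.pose.orientation.x, msg.pose.orientation.y,
         msg.pose.orientation.z, msg.pose.orientation.w)
    qc = quaternion_multiply(CORRECTION_Q, q)
    (msg.pose.orientation.x, msg.pose.orientation.y,
     msg.pose.orientation.z, msg.pose.orientation.w) = qc
    # Shift the reported position by the correction offset.
    msg.pose.position.x += CORRECTION_T[0]
    msg.pose.position.y += CORRECTION_T[1]
    msg.pose.position.z += CORRECTION_T[2]
    # Cache the latest corrected pose so planning can continue even if
    # the head camera later loses sight of this tag.
    last_known[msg.id] = msg
    fixed_pub.publish(msg)


if __name__ == "__main__":
    rospy.init_node("fixed_visualization_marker")
    fixed_pub = rospy.Publisher("/fixed_visualization_marker", Marker,
                                queue_size=10)
    rospy.Subscriber("/visualization_marker", Marker, correct)
    rospy.spin()
```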
The file written to execute the general operation was multicup.py, which is in charge of the following steps (sketched in code after the list):
Process:
1. Receive a drink order through the HTTP server.
2. Wait until the customer's AR tag is seen by the head camera.
3. Look up the last known positions of the ingredient cups from their AR tags.
4. Grab each required ingredient, in order, using MoveIt's inverse kinematics.
5. Pour the ingredient into the serving cup with the joint playback script.
6. Serve the completed drink.
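The condensed sketch below shows how these steps might fit together. The helper functions (wait_for_tag, pose_of_tag, pour_playback) are hypothetical stand-ins for the project's actual routines and are stubbed out here; the /drink_orders topic matches the assumption from the earlier order-bridge sketch.

```python
#!/usr/bin/env python
"""Hypothetical condensed flow of multicup.py, not the actual source."""
import json
import sys

import moveit_commander
import rospy
from geometry_msgs.msg import Pose
from std_msgs.msg import String


def wait_for_tag(tag_id):
    # Stub: the real system blocks until the customer's AR tag appears
    # on the corrected marker topic.
    rospy.sleep(1.0)


def pose_of_tag(tag_id):
    # Stub: the real system queries fixed_multi_service for the last
    # known pose of this tag.
    p = Pose()
    p.position.x, p.position.y, p.position.z = 0.6, -0.3, 0.1
    p.orientation.w = 1.0
    return p


def pour_playback():
    # Stub: the real system replays a recorded joint trajectory for a
    # consistent, high-accuracy pour.
    rospy.sleep(2.0)


def handle_order(msg, arm):
    order = json.loads(msg.data)         # {"customer_tag": 7, "ingredients": [1, 3]}
    wait_for_tag(order["customer_tag"])  # wait for the customer to appear
    for tag_id in order["ingredients"]:  # pour ingredients in the given order
        arm.set_pose_target(pose_of_tag(tag_id))  # MoveIt plans the grab
        arm.go(wait=True)
        pour_playback()
    rospy.loginfo("Order complete; serving drink.")


if __name__ == "__main__":
    rospy.init_node("multicup")
    moveit_commander.roscpp_initialize(sys.argv)
    arm = moveit_commander.MoveGroupCommander("right_arm")
    rospy.Subscriber("/drink_orders", String,
                     lambda msg: handle_order(msg, arm))
    rospy.spin()
```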
Functionality was successfully demonstrated from order processing through serving with two cups, and with up to three cups for selecting ingredients in the right order. Baxter takes orders online via the HTTP server, then waits until he sees a person marked with the correct AR tag before making their drink. MoveIt occasionally produced unusable kinematic plans that would drive the gripper into the tabletop or flip it so that the contents of the held bottle spilled. Lighting occasionally made the AR tags difficult for the head camera to see, preventing operation.
The final iteration works but is inconsistent because of MoveIt's randomness. Baxter successfully identifies both people and up to three drink ingredients by their AR tags, and the functionality could be extended to more ingredients fairly easily. However, Baxter has no way to gauge how full an ingredient bottle is and currently has only a single pouring motion, so his robustness in a real-world environment may be lacking with the current setup.
The most difficult part of this project was identifying and fixing the head camera issue, because the design relied so heavily on the head camera and AR tag system for locating ingredients. Other issues, such as the left gripper failing to operate, restricted our ability to implement more functionality, such as picking up objects with both hands or choosing the optimal side from which to perform an action.
Future improvements include moving away from AR tags toward more reliable and less visually intrusive options. Switching to a more reliable inverse kinematics algorithm than MoveIt would greatly increase the project's consistency, and adding the ability to adjust the pouring motion based on container contents would improve robustness.