Goal
The goal of our project is to develop a robotic system that can perform tasks related to assembling cheeseburgers. This involves using a robot arm, such as a Sawyer robot, to pick up various ingredients (e.g. buns, meat, cheese, lettuce) from an ingredients region and transport them to a preparation region. Once the ingredients are in the preparation region, the robot needs to assemble the cheeseburger by placing the ingredients in the correct order according to our recipe.
Achieving this goal would require a combination of hardware and software components. On the hardware side, we would need to have a robot arm and any necessary supporting equipment, such as grippers or tooling to manipulate the ingredients. We would also need to have a way to detect and locate the ingredients in the ingredients region and a means of moving the robot arm and its gripper to the desired locations.
On the software side, we would need to develop algorithms and programs to control the robot arm and its gripper, as well as to process sensor data and make decisions about which actions to take next. This might involve using computer vision to recognize and classify the ingredients, together with motion planning and control to solve the task of assembling the cheeseburger.
Interest
There are several reasons why a project like this could be interesting. One reason is that it involves combining a variety of different skills and technologies, including robotics, computer vision, and motion planning. Developing a robotic system that can pick up and manipulate objects, recognize and classify different types of ingredients, and make decisions about how to assemble a cheeseburger would require a broad range of knowledge and expertise.
Another reason this project could be interesting is that it involves solving a number of challenging problems. For example, you would need to figure out how to design and build a robot arm that is capable of performing the tasks required, such as manipulating small and fragile objects. You would also need to develop algorithms and programs to control the robot arm and its gripper, and to process sensor data in real-time to make decisions about which actions to take next. Additionally, you would need to consider issues such as safety, reliability, and efficiency, ensuring that the robot arm can operate without causing harm to humans or damaging the ingredients.
Overall, a project like this could be a rewarding and challenging endeavor that would require a diverse set of skills and the ability to solve complex problems.
Applications
There are a number of real-world robotics applications in which the work from a project like this could be useful. One potential application is in the food industry, where robots are increasingly being used to perform tasks such as picking and packing produce, preparing and packaging meals, and even cooking and serving food. A robot that is capable of assembling cheeseburgers or other types of sandwiches could be used in a variety of settings, including fast food restaurants, cafes, and catering companies.
Another potential application for this type of robot could be in educational or research settings, where it could be used as a platform for demonstrating and testing new algorithms and technologies related to robotics, machine learning, and computer vision.
In general, a robot that can pick up and manipulate objects, recognize and classify different types of ingredients, and make decisions about how to assemble a cheeseburger could be a useful tool for automating tasks in a variety of settings, potentially improving efficiency, accuracy, and safety.
Our system is expected to detect AR tags on the table, find the coordinate transform matrix between the camera and the Sawyer, perform color segmentation correctly to get the coordinates of the food ingredients, and pick up each food ingredient and place it in the food preparation region without dropping it. Several criteria need to be satisfied for it to work:
The output of the color segmentation needs to be consistent under varying environmental factors such as lighting (a sketch of this check follows after this list).
The error between the actual position of the food and the position of the gripper when picking up needs to be smaller than 1.5 cm: each food ingredient has a diameter of 10 cm and the gripper's maximum opening is 13 cm, leaving 3 cm of total clearance, or 1.5 cm per side. This error originates from both the calculation errors in the frame transform and the trajectory error of the Sawyer arm's movement.
The orientation of the gripper needs to face downwards at all times, and the motion paths taken should stay inside the desired workspace and avoid scene objects such as the cups used to stage our food ingredients. This requirement ensures that the food ingredient stays in the gripper.
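The segmentation criterion can be illustrated with a minimal OpenCV sketch. The HSV bounds and function name below are hypothetical placeholders rather than our tuned values; the morphological opening is one common way to keep the mask stable under lighting speckle.

```python
# Minimal sketch of the colour-segmentation check (OpenCV 4), with placeholder HSV bounds.
import cv2
import numpy as np

def find_ingredient_center(bgr_image, hsv_lower, hsv_upper):
    """Return the pixel centroid of the largest blob inside the HSV range, or None."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lower), np.array(hsv_upper))
    # Morphological opening suppresses small speckle caused by lighting changes
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    m = cv2.moments(largest)
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

# Example call with a hypothetical yellow range for the cheese slice:
# center = find_ingredient_center(frame, (20, 100, 100), (35, 255, 255))
```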
The design of the whole system can be broken down into four parts. The perception module takes in sensor data, including the AR tags and food items seen by the webcam, and outputs the location of each food item in the camera frame. The coordinate transform module then converts those locations into the Sawyer frame. After getting the food locations, the motion planning module computes a path to pick up each food ingredient and assemble a burger in the food preparation region. While those paths are executed, the controller module sends joint velocities to the Sawyer, which produces the actual motion.
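The data flow between the four modules can be summarized in a short sketch; every function name here is a placeholder for illustration, not our actual code.

```python
# High-level sketch of the four-module pipeline described above.
# All function names are illustrative placeholders.

def run_burger_pipeline(recipe):
    for ingredient in recipe:                  # e.g. ["bun", "meat", "cheese", "bun"]
        pixel = perception_module(ingredient)  # locate the ingredient in the camera frame
        target = transform_module(pixel)       # convert to a pose in the Sawyer frame
        path = motion_planning_module(target)  # plan a collision-free pick-and-place path
        controller_module(path)                # stream joint velocities to the Sawyer
```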
We originally planned to use a depth camera to avoid using AR tags, but all the RealSense cameras in the lab had been loaned out, and the Kinect camera packages are not supported on ROS Noetic. Instead, we used our own webcam together with the camera_calibration package to find the intrinsic camera matrix and the distortion coefficients. However, the accuracy of the camera calibration was not high enough and badly affected our results in the end. In real engineering applications, having a good sensor is crucial to the performance of the system.
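For context, the calibration output is used roughly as follows; the intrinsic matrix and distortion coefficients shown are placeholder values, not the ones produced by our webcam calibration.

```python
# Sketch of how the camera_calibration output is applied, with placeholder values.
import cv2
import numpy as np

K = np.array([[600.0,   0.0, 320.0],    # fx,  0, cx
              [  0.0, 600.0, 240.0],    #  0, fy, cy
              [  0.0,   0.0,   1.0]])
dist = np.array([0.1, -0.05, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def undistort(frame):
    # Remove lens distortion before AR-tag detection and colour segmentation
    return cv2.undistort(frame, K, dist)
```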
When trying to find the transformation matrix between the Sawyer and the camera, we need to use one AR tag as an intermediate frame for which either the transformation between the tag and the camera or the transformation between the tag and the Sawyer base is known. We found three options through trials. Initially, we put the AR tag on the table and moved the Sawyer gripper directly above it, but the errors could be large because we positioned the gripper by hand. Secondly, we put the AR tag at the base of the Sawyer. This design involves the fewest chained transformations, which gives the most accurate transform, but the camera's detection of the tag becomes unstable once the tag is too far away. Lastly, we put the AR tag on the Sawyer's right hand, which gave a rather stable output while still preserving the accuracy of the coordinate transform.
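Whichever placement is used, the transforms are chained the same way. The sketch below assumes the two 4x4 homogeneous matrices are already available (for example, from the Sawyer's forward kinematics and from the AR tag detector); it is an illustration of the math, not a listing of our actual code.

```python
# Chaining homogeneous transforms through the AR tag.
import numpy as np

def camera_to_base(T_base_tag, T_camera_tag, p_camera):
    """Map a 3D point in the camera frame into the Sawyer base frame via the AR tag."""
    # T_base_camera = T_base_tag * T_tag_camera = T_base_tag * inv(T_camera_tag)
    T_base_camera = T_base_tag @ np.linalg.inv(T_camera_tag)
    p_h = np.append(p_camera, 1.0)        # homogeneous coordinates
    return (T_base_camera @ p_h)[:3]
```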
We also designed a custom gripper with a "C" shape on each end. However, the curved surface that made contact with the food ingredients did not exert enough force to grasp the food, so we switched back to the default gripper and screwed its fingers wider apart so that it could pick up the food. We found that the default gripper actually performs well, holding the food steadily on each try.
The cooking Sawyer is able to identify food ingredients at random locations on the table using computer vision (colour thresholding) and follow a desired recipe to assemble a burger. The team is able to use coordinate transformations (built on forward kinematics) to convert points from the camera frame to the Sawyer frame and to conduct motion planning with a customized controller that minimizes the distance of the path taken.
How well did your finished solution meet your design criteria?
We did produce a program that can accurately assemble a burger using computer vision. We also think the grippers we designed would be sufficient for picking up different kinds of items, and some of the extension features (like cutting food) would be straightforward to implement given the work we have already done.
As a result, we believe our finished solution meets our design criteria reasonably well; in the end, the only thing we could not implement was the spatula, for reasons discussed extensively below.
Did you encounter any particular difficulties?
Yes, we had quite a few difficulties.
First, in sensing, the biggest issue was the camera we used. For this project, accurate depth data (of the AR tags) is necessary to obtain correct food locations, and accurate, consistent depth data depends on a sufficiently good camera. Unfortunately, the webcam had an extreme fisheye effect and an unknown camera calibration matrix. Every attempt to calibrate the camera produced camera matrices that were quite inconsistent with each other. More importantly, each matrix produced wildly different results in terms of depth data.
It is important to note the cascade of effects this caused: not only did we have to come up with workarounds for basic tasks, but it also severely limited the extension and more interesting applications. For example, to use a spatula, base-frame Z-coordinate precision is critical; the spatula has to lie exactly against the table, with an error tolerance of probably not much more than a millimeter. Without this level of precision, integrating a spatula into the project was impossible. Another negative effect was that using our custom grippers became harder. The custom grippers were made to pick up all types of food objects, even limp or non-squeezable ones. We tested these grippers and found that, similar to the spatula mechanism, this is significantly harder with Z-direction imprecision.
The second largest difficulty was the motion planning. To ensure scalability and applicability to other motions, we decided to use the MoveIt! package. MoveIt is fairly inconsistent and does not really support orientation constraints, which are important for accurately picking up objects in tight spaces and for making something like a spatula usable.
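For reference, this is roughly what requesting such a constraint looks like in the MoveIt Python interface; in practice the planner often failed or slowed down drastically when a constraint like this was active. The group name, frame, and quaternion below are assumptions based on a standard Sawyer setup, not necessarily our exact configuration.

```python
# Sketch of an orientation constraint keeping the gripper pointing straight down.
import sys
import rospy
import moveit_commander
from moveit_msgs.msg import Constraints, OrientationConstraint
from geometry_msgs.msg import Quaternion

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("constrained_planning_demo")
group = moveit_commander.MoveGroupCommander("right_arm")   # assumed group name

oc = OrientationConstraint()
oc.header.frame_id = "base"
oc.link_name = group.get_end_effector_link()
oc.orientation = Quaternion(0.0, 1.0, 0.0, 0.0)  # gripper facing downwards
oc.absolute_x_axis_tolerance = 0.1
oc.absolute_y_axis_tolerance = 0.1
oc.absolute_z_axis_tolerance = 3.14              # free to spin about the vertical axis
oc.weight = 1.0

constraints = Constraints()
constraints.orientation_constraints.append(oc)
group.set_path_constraints(constraints)          # applied to all subsequent plans
```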
The third difficulty was that the default controller did not work consistently, so we had to use the student-made controller from Lab 7. This controller was definitely unoptimized and produced errors of a centimeter or more.
Does your solution have any flaws or hacks? What improvements would you make if you had additional time?
For the camera issue, the hacky thing we did that deviated from the original design doc was placing AR tags on the individual cups. The original intention was to use the AR tags only to locate the cup region and the preparation area. However, because of the camera problems, we found that using individual AR tags gave better results.
The most "hacky" things we did had to do with motion planning. The first, somewhat less hacky solution was to create a threshold for the ratio of Euclidean distance between the two points and the amount of points in the path. We created this threshold for medium-length distances (around half a meter) by doing motion planning until there was a relatively straight-line path, found the amount of points in that path. From there, we added a couple of points for leeway, and then took the ratio as the guiding threshold. I think with more reference data, this approach can produce consistently good results with pretty low computation time.
The second "hacky" think we did with motion planning was taking the minimum path length over 20 iterations of motion planning. This has constant (although possibly high) computation time, but it still can produce inconsistent results. For our uses, this was sufficient.
In response to the above flaws, given additional time, the first and most important thing we would do is get a different and better camera. As a group, we think this would dramatically increase the consistency and scalability of our solution. The second thing we would do is to find or create a better motion planning package. A similar fix could be done for the controller.