CSE481C

6/6/2022

Weekly update for 6/6/2022

Update:

Segmentation Update:

This week we finalized both of the perception algorithms discussed in prior updates. The RANSAC-based segmentation algorithm performed fairly well and could easily distinguish between different types of objects (see video). It struggled with objects that look identical or similar, for example the two pill bottles. The deep learning-based approach did even better in this regard, since it uses the RGB data that the RANSAC algorithm ignores. After introducing downsampling with a fixed-size voxel grid, the model became much better at making predictions on point clouds denser than those it had been trained on, which allows us to feed the entire scanned point cloud into the model while retaining spatial understanding. The deep learning segmenter outputs a predicted class for each point in the input point cloud, so we can locate the object by filtering the model output by the selected class and performing outlier rejection.
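
As a rough sketch of that post-processing (assuming Open3D and NumPy; the function names, voxel size, and outlier-rejection parameters are illustrative rather than our exact node code):

```python
import numpy as np
import open3d as o3d

def downsample(points, voxel_size=0.01):
    """Fixed-size voxel grid downsampling applied before running the segmenter."""
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(points)
    return np.asarray(cloud.voxel_down_sample(voxel_size).points)

def locate_object(points, predicted_classes, target_class):
    """Keep points labeled with the selected class, then reject stray outliers."""
    object_points = points[predicted_classes == target_class]
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(object_points)
    cloud, _ = cloud.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    filtered = np.asarray(cloud.points)
    # The surviving points localize the object; the centroid is a useful summary.
    return filtered, filtered.mean(axis=0)
```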

Object Picker Update:

We also finished the object picker strategies discussed last week. The gripper now rotates and adjusts the gripper pads before gripping an object. We target the most accessible face (by surface normal) of the smallest non-axis-aligned (oriented) bounding box of the target object's filtered point cloud.
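
A minimal sketch of that face selection, assuming Open3D's oriented bounding box and an illustrative approach direction:

```python
import numpy as np
import open3d as o3d

def pick_grasp_face(object_points, approach_dir=np.array([1.0, 0.0, 0.0])):
    """Pick the most accessible face of the object's oriented bounding box.

    `approach_dir` is the direction the gripper reaches along (here assumed to
    be +x in the point cloud frame, i.e. straight into the bin).
    """
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(object_points)
    obb = cloud.get_oriented_bounding_box()
    axes = np.asarray(obb.R)        # columns are the box's three axes
    extent = np.asarray(obb.extent)

    # Each axis gives two opposite faces; score each face by how directly its
    # outward normal points back toward the approaching gripper.
    best_normal, best_axis, best_score = None, 0, -np.inf
    for axis in range(3):
        for sign in (1.0, -1.0):
            normal = sign * axes[:, axis]
            score = -float(np.dot(normal, approach_dir))
            if score > best_score:
                best_normal, best_axis, best_score = normal, axis, score

    # The gripper pad span is informed by the box extents across that face.
    grip_width = float(np.delete(extent, best_axis).min())
    return np.asarray(obb.center), best_normal, grip_width
```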

We also explored using the input point cloud to create obstacles for our motion planner. However, it was neither accurate nor performant, so in practice it worked better to build a longer sequence of pre- and post-grip positions that avoid collisions with the shelf.

Team Reflection:

Teamwork went well throughout the quarter. At the beginning of the quarter, we decided we would do weekly updates in which teammates briefed each other, and that we would regularly switch roles. The first turned out to be too restrictive and slow, especially when most of the work ended up happening on specific days. We settled into a more informal system in which members updated each other whenever a task was done (such as a particular section of a lab) and could ask online about the status of a task or of the robot. This ended up being a very effective strategy, especially since members were often in the lab asynchronously. The second strategy worked decently, in that different members got an opportunity to work on different parts of the labs and the project. However, there weren't many opportunities to do certain things like fabrication, so in reality there were only a few "types" of tasks to work on at any given moment, which meant role switching wasn't very well defined. All in all, a good experience.

Course Reflection:

For us, the most enjoyable part of the course was certainly the last couple of weeks. We enjoyed defining our own subset of the problem to work on and developing solutions for it. Throughout the quarter, labs touching on topics central to robotics, like point clouds and ROS architecture, were the most useful, while the web interface and visualization portions were interesting but less useful overall. It would have been helpful to spend less time on the latter and more on topics like building tools for perception, navigation, etc.





5/30/2022

Weekly update for 5/30/2022

Update:

This week we started implementing the two perception algorithms discussed in the last weekly update. Both algorithms are promising and we are making good progress. We have mostly completed the RANSAC-based segmentation algorithm. A video of this segmenter is shown to the right. Given an object to look for, the algorithm compares an input point cloud to a list of pre-captured point clouds of the object that we are attempting to segment. The RANSAC result with the most inliers is considered the best-fit object. The corresponding source point cloud inliers are then extracted; these points are considered to be the points that define the object.
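
The matching loop looks roughly like the following sketch (our segmenter itself runs as a ROS node; here Open3D ≥ 0.12's RANSAC feature matching stands in for it, and the thresholds are illustrative):

```python
import open3d as o3d

VOXEL = 0.005        # downsample resolution in meters (illustrative)
DIST = 0.01          # max correspondence distance for counting inliers

def preprocess(cloud):
    """Downsample and compute the FPFH features RANSAC matching needs."""
    down = cloud.voxel_down_sample(VOXEL)
    down.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 4, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=VOXEL * 10, max_nn=100))
    return down, fpfh

def best_template_fit(scene, templates):
    """Fit each pre-captured object cloud to the scene; keep the most inliers."""
    scene_down, scene_fpfh = preprocess(scene)
    best = None
    for name, template in templates.items():
        tmpl_down, tmpl_fpfh = preprocess(template)
        result = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
            tmpl_down, scene_down, tmpl_fpfh, scene_fpfh, True, DIST,
            o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
            [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(DIST)],
            o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
        inliers = len(result.correspondence_set)
        if best is None or inliers > best[1]:
            best = (name, inliers, result)
    return best  # (object name, inlier count, registration result)
```

The scene points indexed by the winning result's correspondence set are then the inliers we extract as the object.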

For the deep learning-based approach, we started by validating that a model we created worked on a few generic point cloud datasets. As seen on the right, one of the training sets we trained on consisted of parts of a plane. On these datasets we achieved reasonable accuracy (per-point-cloud predictions in the 90% range). Once we validated that the model could learn objects in a point cloud, we collected data on the objects we want to pick up. We created an automated data collection script that captures individual object point clouds. This let us gather a large amount of object point cloud data without manual labeling, since every training point cloud contained a single item. Then, to train the model on point clouds with multiple items, we created a point cloud synthesizer that composites multiple single-object captures into a single scene. We also collected some negative data: random objects that are not in our dataset, as well as empty bins and the ground. We are currently training the model on the data we collected. A resulting labeled point cloud for the given input point cloud is shown to the right. The container of rubber bands is not a valid object for us, so it is all gray, and as you can see, the paint and fish food are both individually labeled.
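
A minimal sketch of the synthesizer idea in NumPy (the input format, placement offsets, and jitter are illustrative assumptions):

```python
import numpy as np

def synthesize_scene(object_clouds, rng=None):
    """Composite several single-object captures into one labeled training scene.

    `object_clouds` maps a class id to an Nx6 array of XYZRGB points captured
    by the automated collection script.
    """
    rng = rng or np.random.default_rng()
    scene_points, scene_labels = [], []
    for class_id, points in object_clouds.items():
        placed = points.copy()
        # Drop each object at a random spot on a virtual shelf, with a random
        # yaw so the model sees varied orientations.
        yaw = rng.uniform(0, 2 * np.pi)
        rot = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                        [np.sin(yaw),  np.cos(yaw), 0],
                        [0, 0, 1]])
        placed[:, :3] = placed[:, :3] @ rot.T + rng.uniform([-0.2, -0.2, 0], [0.2, 0.2, 0])
        scene_points.append(placed)
        scene_labels.append(np.full(len(placed), class_id))
    points = np.vstack(scene_points)
    labels = np.concatenate(scene_labels)
    # Shuffle so the network cannot exploit point ordering.
    order = rng.permutation(len(points))
    return points[order], labels[order]
```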

We are also working on a simple object pose estimator to allow us to pick up objects more effectively. Once we know which points belong to the object we are picking up, we can compute the object's surface normals, find the points closest to its center, and compute an averaged center pose. Furthermore, we can compute the smallest non-axis-aligned bounding box of the object point cloud to inform the rotation of the gripper and the span of the gripper pads when grasping the item.
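
A rough sketch of that pose estimate, assuming Open3D (the neighbor count and normal-estimation radius are illustrative):

```python
import numpy as np
import open3d as o3d

def estimate_object_pose(object_points, k=50):
    """Rough object pose: position at the centroid, orientation from the
    averaged normals of the k points nearest the centroid."""
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(object_points)
    cloud.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))

    centroid = object_points.mean(axis=0)
    _, idx, _ = o3d.geometry.KDTreeFlann(cloud).search_knn_vector_3d(centroid, k)
    normal = np.asarray(cloud.normals)[np.asarray(idx)].mean(axis=0)
    normal /= np.linalg.norm(normal)

    # Build an orientation whose z-axis is the averaged surface normal.
    helper = np.array([0.0, 0.0, 1.0]) if abs(normal[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    x_axis = np.cross(helper, normal)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(normal, x_axis)
    rotation = np.column_stack([x_axis, y_axis, normal])

    # The oriented bounding box extents inform gripper rotation and pad span.
    extent = np.asarray(cloud.get_oriented_bounding_box().extent)
    return centroid, rotation, extent
```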

Script:

https://docs.google.com/document/d/1RFxpPzpMb8jinfJJJN8sP0Rs-rUmi-TMZLD5FuFOEXw/edit?usp=sharing

Ethics:

Robotic picking is an avenue of research that poses many significant ethical dilemmas. Potential widespread use of robotic pickers comes with many advantages over traditional human labor, but also many disadvantages.

The positive impacts that robotic pickers could have are clear. Menial picking is a low-paying task that can, and regularly does, lead to significant physical trauma for human workers. So-called "wear and tear" on robots is a comparatively minor issue. In the long term, robots can be far more efficient than human workers, able to work effectively without breaks, and often with higher accuracy and speed. Robots can potentially be much more cost effective than humans, which is a major advantage for companies. Consumers, too, could see these efficiencies manifest as lower prices. Significant automation in certain industries has massively grown the economy, and picking is such a ubiquitous task that picking robots could very well have the same effect.

Robotic picking can also, however, negatively impact the world. Automation-driven job loss has been a major issue in several industries, especially the automotive industry, and picking provides millions of jobs to workers around the world. One of the positive impacts, being more cost effective and efficient, could lead companies to replace their human workers with picking robots, leading to large-scale job loss and economic damage. Robotic picking can also be detrimental to the environment: robots, especially at large scale, can require huge amounts of material to produce and operate, and can generate significant pollution and greenhouse gas emissions. Widespread usage of such robots also poses safety risks. Robots can be hacked and used to damage property or harm the people around them, and having a robotic picker in, say, a supermarket gives ample opportunity for a malfunction or a hacker to cause damage.



5/23/2022

Weekly update for 5/23/2022

Progress update

This week was dedicated to researching potential deep segmentation approaches and cleaning up some of the infrastructure necessary to do fully automated object grasping.

Grasping pipeline

Referring to the ROS diagram from last week, we completed the "Bin/Object Selection Requester" and "Bin Cropper" portions of the pipeline. We can now select a bin to crop relative to an AR tag located on the bin. Furthermore, we set up a simple front end tool that allows us to record which items are in each bin and then select a bin and an item to grasp for. This information is published upon user request. See the demo to the right for an example of the new bin cropper.
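
The core of the bin cropper can be sketched in NumPy as follows (the tag pose comes from our AR tag tracker; the box extents in the example are placeholders measured per shelf):

```python
import numpy as np

def crop_bin(points, tag_pose_in_camera, bin_min, bin_max):
    """Keep only points inside a bin-sized box defined in the AR tag's frame.

    `points` is an Nx3 cloud in the camera frame, `tag_pose_in_camera` is the
    4x4 pose of the tag reported by the AR tag tracker, and `bin_min`/`bin_max`
    are the box corners in the tag frame.
    """
    # Express the cloud in the tag's frame, then apply an axis-aligned box test.
    camera_to_tag = np.linalg.inv(tag_pose_in_camera)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    in_tag = (homogeneous @ camera_to_tag.T)[:, :3]
    inside = np.all((in_tag >= bin_min) & (in_tag <= bin_max), axis=1)
    return points[inside]

# Example with placeholder extents (meters) for one bin:
# cropped = crop_bin(cloud_xyz, tag_pose,
#                    np.array([-0.15, 0.02, -0.30]), np.array([0.15, 0.25, 0.0]))
```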

Deep segmentation

We discussed various approaches to segmentation and have now decided to explore two avenues: a classical approach and a deep learning-based approach. We will be creating a RANSAC-based segmentation algorithm that takes into account known features of the problem space (for example, we know exactly which items are in a particular bin). The deep learning approach is based on the paper "PointCNN: Convolution On X-Transformed Points" (https://arxiv.org/pdf/1801.07791.pdf).







5/16/2022

Weekly update for 5/16/2022

Part 1: Project Proposal

Picking robots are interesting because the task presents an assortment of technical challenges that capture many fundamental aspects of robotics. For a picking robot to be successful, it must be able to perceive the world around it in some way in order to grasp particular items while avoiding others. Once an item and the obstacles have been identified, the robot must move its arm in some complex sequence to successfully grasp the item. A picking pipeline must therefore be quite complex and robust. Perception, navigation, and locomotion are key tasks that apply to many areas of robotics, including picking robots. In addition to their clear applications in many fields, picking robots present a relatively low-risk and simple task to build on, which makes them a perfect platform and environment to experiment in. Picking is also a task that is directly relevant to a large number of people: at its core, it is about making menial labor easier for humans. Picking robots are simply the next step in a long series of innovations for this fundamental task.

In our project, we want to improve on the "Pass" failure metric, which counts the cases where our robot fails to properly detect an object and thus fails to pick it up. In particular, we often have "ghost detections," where we perceive an object where there is nothing. This appears to happen most when objects are in close proximity to each other, and ghost detections most commonly perceive objects that have clear packaging. Such a detection can get in the way of actually picking things up, since it blocks the potential motion of the arm. To the side is a screenshot showing a large green ghost detection of pills. Furthermore, with our current detection pipeline, we often misdetect certain objects. We cannot, for example, tell the difference between a pill bottle and a cube-shaped container of the same size. Since the Amazon Picking Challenge requires a robot to identify a particular item in a bin, this poses a problem that must be solved.

To address this problem, we want to implement deep segmentation to augment our object detection. A small neural network can be trained and then used to detect items in the bin. This detector could simply report whether valid object(s) are seen, or, in a slightly more complex case, how many objects are perceived and of which type, as well as their locations relative to each other. We can then use these detections and their confidence to modify our existing detections. For example, a detection from the traditional detector can be cropped and run through the deep learning-based detector; a low confidence value in the latter would tell us that the former made a bad detection. We can then remove that detection, allowing us to grab other valid objects.
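
A minimal sketch of the proposed filtering step (the `classify` callable and the confidence threshold are hypothetical placeholders for the eventual deep detector):

```python
def filter_detections(detections, classify, min_confidence=0.6):
    """Drop traditional-detector outputs that the learned classifier rejects.

    `detections` is a list of objects with a `.points` attribute (the cropped
    cloud for that detection); `classify(points)` is the hypothetical deep
    classifier returning (label, confidence). The threshold would be tuned
    during evaluation.
    """
    kept = []
    for detection in detections:
        label, confidence = classify(detection.points)
        if confidence >= min_confidence:
            detection.label = label
            kept.append(detection)
        # else: likely a ghost detection -- discard it so the arm is free
        # to reach other valid objects.
    return kept
```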

Part 2: System

ROS Diagram/FSM

To the left is a high-level ROS diagram of the different nodes in the system. The green boxes are nodes that have been completed, and the remaining boxes are nodes that are in progress or incomplete. The general processing is done using the following pipeline:

  • The Bin/Object Selection Requester specifies a particular item in a particular bin that the robot will attempt to pick up.

  • This information is passed to the bin cropper, which takes it into account as it crops a point cloud received from the point cloud publisher. The AR tag tracker is used to estimate the bin location. We currently have a naive bin cropper that crops a point cloud to include only the points between two AR tags; however, we must make this more extensible so that we can crop any particular bin without having to put AR tags around each one.

  • The cropped point cloud is passed to the detector, whose goal is to locate and label items. The correct item that matches the requested item is passed to the gripper state solver.

  • The gripper state solver is a stretch goal that we would only complete if the detector is finished faster than expected. This node is responsible for finding the optimal gripper orientation required to grasp an item. In our MVP, we will assume a straight reach-in for the item instead of solving for the optimal gripper orientation.

  • The desired gripper pose is passed on to the object picker node, which is responsible for actually requesting that the arm move. It communicates with the IK solver library (MoveIt) to perform inverse kinematics for the arm.

  • The object picker node passes raw joint positions to the arm joint controller on the Fetch robot.

Above you can see a simple FSM of the system. Since we are focusing on perception and not HCI, we will not be attempting to exit the failed perception state via human interaction. Instead, the grasping pipeline will simply quit and go back to the standby state. We expect that failures that result in the robot not attempting to pick up an item will happen when perceiving the item and when computing the inverse kinematics, so we will ensure the system is resilient enough to catch these errors and reset properly when errors happen.

Milestones

In the upcoming few weeks, we want to demonstrate the feasibility of our project, create an end-to-end implementation and demonstrate it, and then evaluate it and improve upon it. In order and in more detail, these are:

  1. Demonstrate feasibility (Week 8): Build a rudimentary version of the segmenter and train it such that it can adequately detect a wide variety of objects in a wide variety of orientations, locations, etc. This model should output information that can be used to augment our existing detections.

  2. E2E version (Week 9): Expand on the week 8 implementation. This new implementation should be trained more, and hook up to the existing picking code. This will include improving the bin cropper algorithm and the front end object selection requester. Results from the segmenter should inform the existing segmentation code's decision, filtering out invalid detections and improving existing ones. The picker should now ignore ghost detections and be able to pick up objects that were previously infeasible.

  3. Improved version (Week 10): Perform a rigorous formal evaluation on the efficacy of the week 9 version, testing out different object positions, densities, etc. Determine if any configurations still result in ghost detections or other similar errors and improve the week 9 implementation to deal with those problems. This could potentially involve more training for certain situations or modifications to the filtering logic.

5/9/2022

Weekly update for 5/9/2022

This week we worked on perception-related labs. The goal of this week was to experiment with different segmentation schemes and design a basic framework for object detection.

Part 1: Segmentation Results

Our first task was to test out different segmentation algorithms. We tested Euclidean segmentation, region growing segmentation, and color-based region growing segmentation. To pre-process the image, we used the smart cropper to look at a single bin; without cropping to view only the internal contents of a bin, the segmenter segments the sides of the shelf as well. Below, you can see images of the segmentation results in RViz for the three different segmenters with different objects and packing densities. From a bit of testing, we concluded that color-based region growing segmentation worked the best of the three, though none of them produced great results. On single items, all three segmenters had 100% accuracy. On scenes with two items, the Euclidean segmenter often clustered the two items as a single object (around 50% of the time). The region growing segmenter would usually detect only a single item when there were two items in the scene; this happened essentially 100% of the time. The color-based region growing segmenter performed the best of the three when the shelf had two items.

When a third item was added to the scene, the region growing segmenter performed the worst, with an accuracy of 33%, because it would only detect a single item in the shelf. The Euclidean segmenter performed slightly better and was sometimes able to distinguish between different items; the main issue came when the item normals were aligned, in which case it would combine items. The color-based region growing segmenter worked best by far with three items, as it was usually able to detect all three items independently, with the occasional misdetection when items were combined. One thing of note is that this segmenter performed reasonably well when items were directly next to each other, which was not the case for the other two. If we decide to use one of these three segmenters in the future, it will be the color-based region growing segmenter; however, we believe we can achieve better accuracy with another detection algorithm.

Euclidean Segmentation

Color-based region growing segmentation

Region growing segmentation
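
For reference, the Euclidean approach boils down to clustering the cropped bin cloud by point-to-point distance. A minimal Python sketch of that idea, with Open3D's DBSCAN clustering standing in for PCL's Euclidean cluster extraction (parameters are illustrative):

```python
import numpy as np
import open3d as o3d

def euclidean_like_clusters(points, tolerance=0.02, min_points=50):
    """Cluster a cropped bin cloud into candidate objects by point distance.

    Label -1 marks noise points, which are dropped.
    """
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(points)
    labels = np.asarray(cloud.cluster_dbscan(eps=tolerance, min_points=min_points))
    return [points[labels == i] for i in range(labels.max() + 1)]
```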

Part 2: Recognition Results

Next, we developed an object recognition pipeline such that we could differentiate between the objects segmented by color-based region growing segmentation. We collected data on a few objects (pill bottle, tape, allergy medicine box, etc.), extracted features from these items, and used feature matching to recognize them. We then tested our object recognition algorithm on items in the shelf. Once the shelf was cropped, the algorithm worked reasonably well, though the allergy medicine (a box) was often misdetected as the pills.
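
A minimal sketch of this kind of feature matching (our lab code's exact features differ; a per-channel color histogram and nearest-neighbor matching are used here as an illustrative stand-in):

```python
import numpy as np

def color_histogram(colors, bins=8):
    """Feature vector for one segmented object: a normalized RGB histogram.

    `colors` is an Nx3 array of 0-1 RGB values from the segmented cluster.
    """
    hist, _ = np.histogramdd(colors, bins=(bins, bins, bins), range=[(0, 1)] * 3)
    hist = hist.flatten()
    return hist / max(hist.sum(), 1)

def recognize(cluster_colors, labeled_features):
    """Match a new cluster against pre-collected object features by L2 distance."""
    query = color_histogram(cluster_colors)
    best_label, best_dist = None, np.inf
    for label, feature in labeled_features.items():
        dist = np.linalg.norm(query - feature)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist
```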

Part 3: Grasping Objects

Once the object detection pipeline was complete, we developed a simple grasping tool that attempts to pick up detected objects. The segmenter publishes object information, which a moderator node subscribes to. The moderator allows the user to select an object by name to be picked up. Once the user selects a particular object, the moderator sends the pose of the object to a picker node. The picker node performs a sequence of pick actions to pick up the object. The sequence of gripper-relative actions can easily be modified, and different gripper action sequences can be added in the future depending on the object type. A video showing the robot picking up two objects in two different configurations is shown to the right.

Initial Project Interests and Approaches

Next week, we will be developing a complete project proposal, but this week, we did some brainstorming to determine what our final project will look like in more detail. At the moment, we believe that the most fundamental part of the Amazon Picking Challenge is detecting individual objects and extracting meaningful features that can aid in picking up the object. Once the location and shape of an object are known, there are more interesting things that can be studied such as choosing an optimal gripping pose. As such, we would like to focus on ways to improve object and feature detection. If object detection ends up being a less-than-challenging problem, we would also like to experiment with potentially combining ML-based feature extraction with gripper pose estimation. Another thing that makes detecting items in a shelf a hard problem is that the shelf occludes objects that are close to the sides of the bin. We initially cropped away the sides of the shelf that we were detecting, but if there are items close to the sides of a bin this strategy would not work well. It might be interesting to look into more precise ways of eliminating the shelf from the point cloud or image being used in object detection.

5/2/2022

Weekly update for 5/2/2022

This week we built C++ packages that use point clouds generated by Fetch's camera to perceive the physical world around us in RViz. We then used AR markers to detect specific areas around us, move there, and manipulate the gripper relative to a marker.

Smart Point Cloud Cropping

Here, we use placed AR markers to isolate and cut out certain parts of the point cloud. We detect, visualize, and record the various markers placed on the shelf and their positions, and we built a program that makes the robot reach for a selected marker with the arm. We then used a point cloud cropper we built previously to isolate a bin based on the detected markers. Since the region we isolate is fixed relative to the markers on the shelf, the marker data tells us the pose of the isolated region as well.

Demonstrated Actions

Next, we explored how to create a full "action" for the robot to follow. This process starts with the initialization of an "action" and then records steps as the user physically moves the robot arm or opens/closes the gripper. Along the way, the user can define poses to be relative to something, whether that be the base frame or a tag. Afterwards, the saved sequence of steps can be rerun by the robot as necessary. A video of the creation and subsequent execution of such an action is to the right.
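
A minimal sketch of how such an action can be recorded and replayed (the `arm` and `gripper` wrappers are hypothetical stand-ins for our ROS interfaces):

```python
from dataclasses import dataclass, field
from typing import List, Optional
import pickle

@dataclass
class Step:
    """One recorded step: a gripper command, or an end-effector pose stored
    relative to a chosen frame (the base frame or an AR tag)."""
    kind: str                    # "pose", "open_gripper", or "close_gripper"
    frame: Optional[str] = None  # e.g. "base_link" or "ar_marker_3"
    pose: Optional[list] = None  # [x, y, z, qx, qy, qz, qw] in that frame

@dataclass
class DemonstratedAction:
    name: str
    steps: List[Step] = field(default_factory=list)

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self, f)

    def replay(self, arm, gripper):
        """Re-run the recorded action. `arm.move_to_pose` is assumed to resolve
        the stored frame at replay time, so poses recorded relative to a tag
        adapt if the tag has moved."""
        for step in self.steps:
            if step.kind == "open_gripper":
                gripper.open()
            elif step.kind == "close_gripper":
                gripper.close()
            else:
                arm.move_to_pose(frame=step.frame, pose=step.pose)
```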






4/25/2022

Weekly update for 4/25/2022

This week we used RViz InteractiveMarkers to control the arm. We designed a system that allows us to move the end effector to some position in the world, and then another one that performs a grasping sequence.

InteractiveMarker-based control of the end effector

In this portion of the labs, we implemented a way to control the robot arm using RViz. In RViz, we created an interactive marker that has the size and shape of the end effector. This can be moved around with six degrees of freedom. The marker turns green when the end effector state can be attained and red when the marker's state is invalid. Inverse Kinematics and planning algorithms are used to move the arm. In RViz, you can right click on the interactive marker to open a context menu that allows you to move the arm to the interactive marker, open the gripper, and close the gripper. A demo video of this functionality is shown to the right.
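
A sketch of the reachability check behind the green/red coloring, using MoveIt's /compute_ik service (the planning group name "arm" is an assumption for the Fetch; our actual node may differ in detail):

```python
#!/usr/bin/env python
import rospy
from moveit_msgs.msg import MoveItErrorCodes, PositionIKRequest
from moveit_msgs.srv import GetPositionIK

def pose_is_reachable(pose_stamped, group="arm", timeout=0.5):
    """Return True if MoveIt finds an IK solution for the marker's pose."""
    rospy.wait_for_service("compute_ik")
    compute_ik = rospy.ServiceProxy("compute_ik", GetPositionIK)
    request = PositionIKRequest()
    request.group_name = group
    request.pose_stamped = pose_stamped
    request.timeout = rospy.Duration(timeout)
    request.avoid_collisions = True
    response = compute_ik(ik_request=request)
    # Color the interactive marker green on success, red otherwise.
    return response.error_code.val == MoveItErrorCodes.SUCCESS
```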

Performing a grasping sequence using InteractiveMarkers

Next, we added additional functionality to specify pre-grasp, grasp, and post-grasp positions. From here, we can tell the robot to move its arm to the three grasp positions. This allows us to ensure that the end effector can be in a valid pose before it actually grasps an item. A demo video of this functionality is shown to the right.

Testing grasping on the real robot

We next tested our software on the physical Fetch robot by grasping a bottle from a shelf. The test was successful. A video of this test is shown to the right.









4/18/2022

Weekly update for 4/18/2022

This week we used robot localization and planning to control the Fetch robot.

RViz interactive visualization

The first part of this week's lab was to develop an interactive visualization tool that can be used for robot navigation of the Fetch robot. In RViz, the Fetch robot is visible with InteractiveMarkers that the user may select to move the robot backwards, forwards, left, or right. Furthermore, in RViz we display markers on the ground to show the path the robot has taken. A demo video of this functionality may be seen to the right.

Robot navigation

Next, we used various tools in the ROS navigation stack to perform robot localization. We created a tool that allows the user to drive the robot around the map and annotate various regions of it. The robot can then navigate to the annotated regions. A demo video of this functionality may be seen to the right.
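
A minimal sketch of sending the robot to one saved annotation through the navigation stack's move_base action (the annotation format and pose values are placeholders):

```python
#!/usr/bin/env python
import actionlib
import rospy
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def go_to(annotation, client):
    """`annotation` is a dict with a position and yaw quaternion saved in the map frame."""
    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = "map"
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = annotation["x"]
    goal.target_pose.pose.position.y = annotation["y"]
    goal.target_pose.pose.orientation.z = annotation["qz"]
    goal.target_pose.pose.orientation.w = annotation["qw"]
    client.send_goal(goal)
    client.wait_for_result()
    return client.get_state()

if __name__ == "__main__":
    rospy.init_node("annotation_navigator")
    client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
    client.wait_for_server()
    # Placeholder annotation; real entries come from the RViz annotation tool.
    go_to({"x": 1.0, "y": 2.0, "qz": 0.0, "qw": 1.0}, client)
```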

RViz tool for map annotation

Finally, we designed a tool that allows the user to specify locations on the map through RViz. A demo video of this functionality may be seen to the right. Through the UI, you may add InteractiveMarkers and tell the robot to drive to various locations. You may also delete the markers. Once a marker is created, you may drag the marker around and change its orientation.



4/12/2022

Weekly update for 4/12/2022

Fetch robot control

We gave an initial name to our team and robot this week. We are now team "Escalating to SIGTERM" and our robot is "SIGINT."

This week we followed some initial tutorials, in which we implemented some of the fundamental software pieces needed to control the Fetch robot. In doing so, we developed an extremely simple tele-operation tool that we can use to control the robot.

This was tested in simulation. A video of the tele-operation interface is shown. In this video, we drive the robot over to a table, raise the torso and lower the head, and then control individual arm joints to position the arm to grasp the block on the table. We then grasp and raise the arm, now gripping the block. Some sensor information is included in the interface to make it easier to understand what is going on.
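
A minimal sketch of this kind of tele-operation, publishing base velocity commands (the topic name and speeds are assumptions; our actual tool also drove the torso, head, arm joints, and gripper):

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

KEY_BINDINGS = {
    "w": (0.3, 0.0),   # forward (m/s, rad/s)
    "s": (-0.3, 0.0),  # backward
    "a": (0.0, 0.8),   # turn left
    "d": (0.0, -0.8),  # turn right
}

def main():
    rospy.init_node("simple_teleop")
    pub = rospy.Publisher("cmd_vel", Twist, queue_size=1)
    while not rospy.is_shutdown():
        key = input("w/a/s/d then Enter to drive, anything else to stop: ").strip()[:1]
        linear, angular = KEY_BINDINGS.get(key, (0.0, 0.0))
        command = Twist()
        command.linear.x = linear
        command.angular.z = angular
        pub.publish(command)

if __name__ == "__main__":
    main()
```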

Roles and Responsibilities

In order for our group to operate cohesively, we will be assigning people different roles and responsibilities.

  • ROS guru: Shikuang

    • Gains familiarity with using ROS and its documentation throughout the quarter.

  • Perception guru: Kaelin

    • Develops and updates tools to monitor the robot's sensor data throughout the quarter.

  • Hardware guru: Tudor

    • Is the team's point person on interacting with the physical robot and with any manufacturing tools, leads any discussion on what to fabricate throughout the quarter.

  • User interface guru: Matthew

    • Builds tools to use the robot as the team grows more familiar with it, whether through the command line or the web interface.

  • Manager: Matthew

    • Keeps the team on track by organizing regular meetings and leading decisions during those meetings.

  • Documentation and communications: Shikuang

    • Keeps the team up to date on deliverables, whether they be blog posts or code.

In order to make sure everyone is actively learning, we intend on doing weekly updates where each member informs the others about the tasks they've done that week. Team members will also switch between different roles each week, each taking a chance to work on different aspects of the codebase. During meetings and in the lab, team members will be actively coding together, with the person in the "driver's seat" changing regularly. This way, each member will get a chance to work with ROS, the robot, or any other aspect of the codebase.



3/31/2022

Weekly update for 3/31/2022.

In order to perform well at the Amazon Picking Challenge (APC), we need a series of metrics to judge our work by. Standardizing how we measure performance will help us make better design decisions as the project progresses. Listing some potential errors will also help us identify weaknesses.

Performance Metrics:

  • Success percentage - The fraction of pick attempts that result in the requested item being picked and placed into a tote without any error.

  • Incorrect item percentage - The fraction of attempts in which an item that is not the requested item is placed in the tote/bin.

  • Number of items on the floor after the experiment is over

  • Pickup/stow speed - The amount of time needed for a requested item to be picked up and placed into a tote/bin without any error

    • Mean and maximum

  • Success percentage per item type - Same as overall success percentage, but selective for each SKU.

  • Power consumption - Average power needed for a requested item to be picked up and placed into a tote/bin without any error

  • Space consumption - Average space taken up by the robot for a requested item to be picked up and placed into a tote/bin without any error in cubic meters

  • Placing compactness - How tightly requested items are placed together in their totes/bins, in terms of space between different items

Potential Errors:

  • If the wrong item is picked up and not replaced

  • If a requested item is picked up and ends up in the wrong position at the end

  • If an item is dropped from a height greater than 0.3 meters

  • If an item is damaged in some way (crushed or pierced)

  • If an item is left protruding from a bin by more than 0.5 centimeters

  • If the robot software stops executing and does not recover

  • If the robot moves in a manner that could potentially be a danger to nearby objects or itself

  • If the robot leaves the designated area at any time

  • If the robot cannot successfully identify and formulate a plan for picking up an item

An example of two performance metrics computed from the video on the right, specifically 0:14 - 0:53 (5 items):

  • Success percentage: 100%

  • Pickup/stow speed: 156 seconds / 5 = 31.2 seconds

Item Attributes

It's also helpful to qualify the different items the robot will be expected to manipulate. Here are some potential attributes:

  • Rigid/non-rigid - whether the object maintains its shape while being manipulated. Example: soap bar is rigid, dog toy is non-rigid

  • Box/non-box - whether the object is mostly a rectangular prism. Example: the Cheez-It box is a box, the glue is a non-box

  • Fragile/strong - whether the object can be easily crushed or pierced. Example: the Oreos are fragile, the box of screwdrivers is strong

  • Flat-edged/non-flat-edged - whether the object has flat sides, usually plastic. Example: the bottle brush is flat-edged, the crayons are non-flat-edged

  • Slippery/rough - how much friction the surface of an object has. Example: the index cards are slippery, the mesh pencil cup is rough

  • Graspable in any orientation/in a limited number of orientations: whether or not the item can be picked up independent of orientation due to the constraints of the grasper. Example: the dog toys may be grasped by the Fetch in any direction but the Cheez-It box cannot (due to its large size)

  • Easily visible: Whether there are clear distinguishing visual marks relative to an arbitrary single-colored background. Example: the Oreos are easily visible due to the bright clear markings, the safety glasses are not easily visible because they are a complex shape and mostly transparent