Design

Design Criteria

We had eight overall design criteria:

  • Create a structure made of blocks in Gazebo simulation, and take images from different angles of the structure

  • Correctly identify the bottom left coordinates of each of the blocks in the structure

  • Create a world with TIAGo and blocks placed around the world

  • Correctly build a map of the world using SLAM, and identify the positions of all of the blocks lying around in this world

  • Have TIAGo be able to take the position of a block and plan a path to that block, pick it up, and plan a path back to the structure it is building

  • Orient TIAGo’s arm correctly so it can place the block, and place the block correctly on the structure

  • End up with the same structure as was originally built in simulation

  • Perform this process in a reasonable amount of time

Our Design

1. Image a structure

The first step to imaging a structure is to have a structure to image. We abstracted everything to do with the world into a single class, WorldSimulation, that we could run as a script.
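
As a rough illustration (not our exact implementation), such a class might be organized along the lines below; the method names and constructor arguments are hypothetical.

    class WorldSimulation:
        """Hypothetical sketch of a WorldSimulation-style class."""

        def __init__(self, block_size=(0.05, 0.05, 0.05)):
            self.block_size = block_size   # (x, y, z) dimensions shared by every block
            self.blocks = []               # (position, color) of each spawned block

        def add_block(self, position, color):
            # Record the block; the real class would also call the Gazebo
            # spawn service here to place the model in the world.
            self.blocks.append((position, color))

        def build_structure(self, layers):
            # Spawn a multi-layer structure described as a list of layers,
            # each layer being a list of (position, color) pairs.
            for layer in layers:
                for position, color in layer:
                    self.add_block(position, color)

    if __name__ == "__main__":
        WorldSimulation().build_structure(layers=[])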

Choosing building blocks

We decided to use only rectangular prisms to build our structure. Other shapes such as spheres or ellipsoids would have been too difficult to place and balance, and shapes such as triangular prisms would have been too hard to grasp. Rectangular prisms offer a few clear advantages: they are relatively stable when placed, they have well-defined properties (e.g., moment of inertia), and they are symmetrical.

Building a structure

We decided to add complexity to our structure by building multiple layers. These layers were made of squares of blocks (where we could specify how many blocks wide the square was, as well as its position). This basic layer structure fits well with several assumptions that we leverage later on to generate a schematic. An alternative was to allow rectangles of blocks, but we kept the layers square for simplicity; since our schematic generation doesn't rely on any square-layer assumptions, this choice doesn't limit it.
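
As a minimal sketch (assuming square-footprint blocks of a known size, as elsewhere in our design), one square layer of block positions can be generated like this:

    def square_layer(width, origin, block_size):
        # Positions of the (min x, min y, min z) corners for a width x width
        # square of blocks whose own corner starts at `origin`.
        ox, oy, oz = origin
        return [(ox + i * block_size, oy + j * block_size, oz)
                for i in range(width)
                for j in range(width)]

    # Example: a 3x3 layer of 5 cm blocks resting on the ground at the origin.
    layer = square_layer(width=3, origin=(0.0, 0.0, 0.0), block_size=0.05)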

Coloring the structure

We picked out a few colors at random from those Gazebo provides. When generating our structure, we cycle through this list of colors so that no block is adjacent to another block of the same color (of course, since the list of colors is finite, there could still be adjacent blocks of the same color, but this should be rare). This is done to facilitate the computer vision and the differentiation of blocks. An alternative was to color each block randomly, but the resulting colors would not have been as easy to communicate about. With the colors we chose, we could always say "the blue block" or "the red block" without needing to clarify which shade we meant; with randomly generated colors, we would first have to come up with the best description of a color and then hope the other person could map that description onto the correct block.
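
The cycling itself can be as simple as the following sketch (the material names are placeholders, not necessarily the exact colors we used):

    from itertools import cycle

    # Step through a fixed palette so that consecutively spawned (and usually
    # adjacent) blocks receive different colors.
    PALETTE = cycle(["Gazebo/Red", "Gazebo/Blue", "Gazebo/Green", "Gazebo/Yellow"])

    def next_color():
        return next(PALETTE)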

Camera angles

The camera angles were chosen based on the location of the structure we wanted to image. We didn't set any strict rules or guidelines for this, but we found a few tricks heuristically. First, the cameras had to capture all of the blocks. This meant there needed to be cameras on all four sides of our structure (each covering roughly 90 degrees) and cameras at varying heights, to capture both the blocks at the bottom of the structure and the block on top. An alternative would have been to place cameras in a grid throughout the world and take snapshots of everything, which would have been the most accessible option for users since it requires the least work from them. However, that would have been much too slow (on the order of 1,000 cameras versus on the order of 10), and our approach lets the user pick the camera poses that best represent their structure (in fact, our 16 camera angles are hard-coded for the specific demonstration structure).
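
One way to encode this heuristic (with illustrative numbers; our 16 demonstration poses were hand-picked rather than generated) is to place cameras on all four sides of the structure at a couple of heights, each facing the structure's center:

    import math

    def camera_poses(center, radius=1.0, heights=(0.2, 0.6)):
        cx, cy, _ = center
        poses = []
        for z in heights:
            for k in range(4):                 # one camera per side of the structure
                yaw = k * math.pi / 2          # heading that points at the center
                x = cx - radius * math.cos(yaw)
                y = cy - radius * math.sin(yaw)
                poses.append((x, y, z, yaw))   # position plus yaw toward the structure
        return poses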

Taking the images

We originally had several cameras placed in the world at poses the user specified, but at the suggestion of the 106A course staff, we changed this to a single camera that moves throughout the world, stopping at each of the input poses. The original idea had some advantages: we wouldn't have to worry about moving the camera, and it would have been faster for small numbers of cameras because the images could be taken in parallel. The new approach has its own advantages: moving one camera around a structure is closer to how a structure would be captured in real life, and even if the number of camera poses grew large, the computer wouldn't be overloaded, since only one camera is in the world at a time.
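
A sketch of the single-camera approach, assuming a free-floating camera model named "camera" that publishes on /camera/image_raw (both names are assumptions for illustration):

    import rospy
    from gazebo_msgs.msg import ModelState
    from gazebo_msgs.srv import SetModelState
    from sensor_msgs.msg import Image

    def capture_from_poses(poses):
        # Teleport the camera model to each pose and grab one image there.
        rospy.wait_for_service("/gazebo/set_model_state")
        set_state = rospy.ServiceProxy("/gazebo/set_model_state", SetModelState)
        images = []
        for pose in poses:                     # geometry_msgs/Pose for each stop
            set_state(ModelState(model_name="camera", pose=pose,
                                 reference_frame="world"))
            rospy.sleep(0.5)                   # let the simulated camera settle
            images.append(rospy.wait_for_message("/camera/image_raw", Image))
        return images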

Camera data transfer object (DTO)

Since we abstracted all of the simulation-related logic into one class, we needed a data transfer object to pass information between our components. We created a class CameraDTO containing the key pieces of information: the camera intrinsic matrix, the camera pose, the camera's image data, etc. There were a few alternatives to this abstraction choice, namely running everything in a single service (which would remove the need for a DTO), but since the two portions, world simulation and schematic generation, performed separate and mostly unrelated functions, we decided to keep the abstraction layers this way.
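
A sketch of what such a DTO might carry (the field names are illustrative, not necessarily the exact attributes of our CameraDTO):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class CameraDTO:
        intrinsic_matrix: np.ndarray   # 3x3 camera matrix K
        pose: np.ndarray               # 4x4 world-from-camera transform
        image: np.ndarray              # HxWx3 image captured at that pose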

2. Reconstruct a schematic of that structure leveraging those images

Our schematic consists of the (minimum x, minimum y, minimum z) corner of each block that we want to place. This was an arbitrary choice -- any other corner chosen would also suffice.

Image processing

Each CameraDTO gave us access to the image, pose, and intrinsic matrix for a single camera. Using a pair of adjacent cameras, we can recover the 3D positions of features visible in both images. After finding features of interest in each image, we apply epipolar constraints to remove spurious matches. We then use the poses of our cameras and rigid body transformations to convert all of our matched coordinates into the world frame.
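
A sketch of that two-view step for one adjacent camera pair, assuming each CameraDTO carries a 3x3 intrinsic matrix and a 4x4 world-from-camera pose; pts1 and pts2 are N x 2 arrays of matched pixel coordinates from the feature-matching step, and the tolerance is illustrative:

    import cv2
    import numpy as np

    def skew(t):
        return np.array([[0, -t[2], t[1]],
                         [t[2], 0, -t[0]],
                         [-t[1], t[0], 0]])

    def triangulate_pair(cam1, cam2, pts1, pts2, epipolar_tol=1e-2):
        # Relative pose of camera 2 with respect to camera 1.
        T = np.linalg.inv(cam2.pose) @ cam1.pose
        R, t = T[:3, :3], T[:3, 3]
        # Fundamental matrix built from the known poses, used to reject bad matches.
        K1, K2 = cam1.intrinsic_matrix, cam2.intrinsic_matrix
        F = np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)
        keep = [i for i, (p1, p2) in enumerate(zip(pts1, pts2))
                if abs(np.append(p2, 1.0) @ F @ np.append(p1, 1.0)) < epipolar_tol]
        pts1, pts2 = pts1[keep], pts2[keep]
        # The projection matrices map world points directly into each image, so
        # the triangulated points come out already expressed in the world frame.
        P1 = K1 @ np.linalg.inv(cam1.pose)[:3, :]
        P2 = K2 @ np.linalg.inv(cam2.pose)[:3, :]
        pts4d = cv2.triangulatePoints(P1, P2,
                                      pts1.T.astype(float), pts2.T.astype(float))
        return (pts4d[:3] / pts4d[3]).T   # N x 3 world-frame points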

Corner detection

Our original idea to find corners was to leverage OpenCV's feature detection because it would generalize well to other shapes. This worked reasonably well -- the features we found were mostly corners and yielded sufficient schematics. In an effort to improve the consistency of our schematic generation, we decided to leverage our rectangular prism assumption and focus specifically on detecting corners using Harris corner detection.
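
A minimal Harris corner detection sketch with OpenCV (the threshold and window parameters are illustrative, not the values we tuned):

    import cv2
    import numpy as np

    def detect_corners(image_bgr, quality=0.01):
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
        response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
        ys, xs = np.where(response > quality * response.max())
        return np.stack([xs, ys], axis=1)   # (u, v) pixel coordinates of corner candidates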

Finding visible block poses

We decided to calculate the poses of visible blocks in our images using a model-free approach. A block pose consists of its position and orientation. Calculating the orientation of a block is challenging: to start, we would need at least two corners of the same block to uniquely determine its orientation (think of forming a vector between them). In addition, we would have to differentiate among blocks, something our model-free approach cannot do. Thus, we assumed that all of the block orientations are axis-aligned, that all of the blocks have the same orientation, and that we know this orientation.

To calculate a block's position, we use the locations of the corners of the block. First, we assume that all blocks are the same size, because finding the block corner locations would be extremely challenging otherwise. This is a valid and reasonable assumption to make because we typically know the dimensions of building blocks in real life. It also scales to various block sizes; e.g., if we had 3 different standardized block sizes, we would simply run our program 3 times, once per size (though there could be added complexity in the ordering of blocks to place).

Since we determine the locations of blocks by their corners, we could never differentiate whether there was a gap in between two blocks or not. Thus, we assumed that blocks are always adjacent to other blocks (except for the case of a single block, where there are no other blocks to be adjacent to). We considered a few other approaches to determining the position of a block, namely using a model based approach such as a classifier, but that seemed to be out of scope for our project. Adding a classifier could also potentially enable our project to detect non-rectangular prism structures.

Rounding to nearest offset

In order to generate a clean, noise-free set of block coordinates, we rounded each detected corner location to the nearest point on a grid defined by an offset. This offset is found by counting how many detected corners lie close to each candidate offset, then taking the offset supported by the most corners (in the spirit of a maximum likelihood estimate (MLE)). The precision of the offset calculation is adjustable for applications that require more precise locations. Here we leverage two of our earlier assumptions: that all blocks are adjacent to other blocks, and all of the orientation assumptions.
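
A one-axis sketch of the offset voting and snapping (the tolerance is illustrative; in practice the same idea is applied per axis):

    import numpy as np

    def best_offset(coords, block_size, tol=0.01):
        # Candidate offsets are the coordinates modulo the block size; the offset
        # supported by the most detected corners wins (an MLE-flavored vote).
        offsets = np.mod(coords, block_size)
        dists = np.abs(offsets[:, None] - offsets[None, :])
        dists = np.minimum(dists, block_size - dists)      # wrap-around distance
        votes = (dists < tol).sum(axis=1)
        return offsets[int(np.argmax(votes))]

    def snap_to_grid(coords, block_size):
        coords = np.asarray(coords, dtype=float)
        offset = best_offset(coords, block_size)
        # Round every coordinate to the nearest grid point defined by the offset.
        return np.round((coords - offset) / block_size) * block_size + offset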

After cleaning up the coordinates of each corner detected, we decided to apply an additional filter: we keep only corners that are part of some square for the particular layer they're in.

Generating layers

Every block in a layer above another layer must have a block beneath it to support it. Thus, we loop from the top layer downwards, adding the coordinates from each layer above to the layer below. This results in a stable structure.
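
A sketch of that support-propagation pass, treating each layer as a set of (x, y) grid cells and ordering the list from the top layer down:

    def add_support(layers_top_down):
        # Every cell occupied in the layer above must also be occupied in the
        # layer below, so propagate occupancy downward.
        for upper, lower in zip(layers_top_down, layers_top_down[1:]):
            lower |= upper
        return layers_top_down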

3. Plan a reasonable ordering to build the structure

We had several heuristics for ordering our structure building. First, any block placed must have either a block or the floor directly under it. This restricted our ordering to be either layer by layer or slice by slice. Building by layers results in a more stable foundation, especially for taller structures (consider building the Empire State Building). However, if we had to build a very wide and deep structure, we could run into problems such as roBob not being able to reach the inside of the structure to place blocks on top. Ultimately, we would need a mixture of both orderings to be able to build all possible structures.
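
Given the schematic's (min x, min y, min z) corners, a layer-by-layer ordering is essentially a sort by height first (a sketch that ignores the slice-by-slice alternative):

    def build_order(corners):
        # Place the bottom layer first, then work upward; within a layer the
        # y-then-x order is arbitrary.
        return sorted(corners, key=lambda c: (c[2], c[1], c[0]))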

4. Identify where the blocks are in the world

After being placed in the world, the robot first needs to figure out where the blocks and walls are. We used SLAM (gmapping) to map out the environment and determine the locations of objects. To see enough of the environment to find a block, we start by spinning the robot 360 degrees, which builds an initial map. Then, as the robot performs the rest of its tasks and moves around to the first blocks it sees, it will eventually observe the entire area (including behind the original blocks).
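
A sketch of the initial 360-degree spin, publishing velocity commands on TIAGo's base controller topic (the topic name and angular speed are assumptions):

    import math
    import rospy
    from geometry_msgs.msg import Twist

    def spin_once(angular_speed=0.5):            # rad/s
        pub = rospy.Publisher("/mobile_base_controller/cmd_vel", Twist, queue_size=1)
        twist = Twist()
        twist.angular.z = angular_speed
        end = rospy.Time.now() + rospy.Duration(2 * math.pi / angular_speed)
        rate = rospy.Rate(10)
        while rospy.Time.now() < end and not rospy.is_shutdown():
            pub.publish(twist)                    # keep turning in place
            rate.sleep()
        pub.publish(Twist())                      # stop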

Building the Environment

We assumed that the only objects present in the environment are blocks and walls, and that all the blocks are placed on the ground. This allowed us to identify blocks without having to distinguish them from other objects. We found where the walls were located, and then found the smaller regions where there appeared to be an object, which must therefore be blocks. Adding different obstacles would likely have required the robot to move to each one and use its camera to determine whether it was a block, which could have taken a relatively long time.
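
A sketch of that block-finding step on the gmapping occupancy grid, treating small connected clumps of occupied cells as block candidates and larger ones as walls (the thresholds and the use of scipy here are illustrative, not our exact method):

    import numpy as np
    from scipy import ndimage

    def find_block_candidates(grid_msg, max_block_cells=25):
        info = grid_msg.info
        occ = np.array(grid_msg.data).reshape(info.height, info.width) > 50
        labels, n = ndimage.label(occ)            # connected components of occupied cells
        blocks = []
        for i in range(1, n + 1):
            cells = np.argwhere(labels == i)
            if len(cells) <= max_block_cells:     # small clump -> probably a block
                row, col = cells.mean(axis=0)
                blocks.append((info.origin.position.x + col * info.resolution,
                               info.origin.position.y + row * info.resolution))
        return blocks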

Exploring the Environment

We decided that the robot, after spinning once, would immediately begin its building process and build the map as it did so. An alternative would have been to have the robot do an exploration process to fully or close to fully explore the map and find every block in the environment before beginning its building process. We decided on building without fully exploring because it would save time. However, in a more complex map, it may cause the robot trouble to move around to the structure's intended location without having fully explored the map.

5. Navigate the environment until within grasping distance of a block

The robot should navigate to each block on its own, in no particular order, before moving to the structure to place that block. To do this, we used MoveBase and the world map from gmapping to handle autonomous navigation. We picked goal points slightly offset from the block locations found previously, and angled the robot toward the block.

For navigation, we assumed that blocks wouldn't be too close to each other so that it wouldn't be difficult to find a proper positioning for the robot to be able to pick up blocks.
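
A sketch of how the approach-point navigation described above might look, using the standard move_base action interface (the 0.4 m standoff and frame names are illustrative values):

    import math
    import actionlib
    import rospy
    from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal
    from tf.transformations import quaternion_from_euler

    def go_near_block(block_xy, robot_xy, standoff=0.4):
        bx, by = block_xy
        yaw = math.atan2(by - robot_xy[1], bx - robot_xy[0])    # face the block
        goal = MoveBaseGoal()
        goal.target_pose.header.frame_id = "map"
        goal.target_pose.header.stamp = rospy.Time.now()
        goal.target_pose.pose.position.x = bx - standoff * math.cos(yaw)
        goal.target_pose.pose.position.y = by - standoff * math.sin(yaw)
        (goal.target_pose.pose.orientation.x,
         goal.target_pose.pose.orientation.y,
         goal.target_pose.pose.orientation.z,
         goal.target_pose.pose.orientation.w) = quaternion_from_euler(0, 0, yaw)
        client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
        client.wait_for_server()
        client.send_goal(goal)
        client.wait_for_result()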

6. Grasp the block (and move and place the block in the correct location)

To pick up a block, the robot first needs to detect the pose of the block, and then plan a path for the arm joints to move using MoveIt.
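
A sketch of the arm-planning step with MoveIt's Python interface, assuming TIAGo's usual "arm_torso" planning group and a precomputed grasp pose (both are assumptions here):

    import moveit_commander

    def move_arm_to(grasp_pose):
        # grasp_pose is a geometry_msgs/Pose for the gripper, expressed in the
        # reference frame set below.
        arm = moveit_commander.MoveGroupCommander("arm_torso")
        arm.set_pose_reference_frame("base_footprint")
        arm.set_pose_target(grasp_pose)
        success = arm.go(wait=True)     # plan and execute in one call
        arm.stop()
        arm.clear_pose_targets()
        return success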

ArUco Blocks

We placed markers on top of the blocks for the robot to detect and use to estimate block poses. This assumes that all blocks are the same size and shape. An alternative would have been to use TIAGo's camera to estimate the pose of a block from the shape of the block itself. However, because the blocks were placed on the ground, this turned out to be very difficult given the angle at which the robot's camera viewed the block, and the markers worked much better.
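
As an illustration of the marker-detection idea (not necessarily the exact pipeline we used), OpenCV's ArUco module can detect a top-mounted marker and recover its pose in the camera frame; the dictionary, marker size, and calibration inputs below are assumptions, and this is the pre-4.7 OpenCV API:

    import cv2
    import cv2.aruco as aruco

    def detect_marker_pose(image_bgr, camera_matrix, dist_coeffs, marker_len=0.045):
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        dictionary = aruco.Dictionary_get(aruco.DICT_4X4_50)
        corners, ids, _ = aruco.detectMarkers(gray, dictionary)
        if ids is None:
            return None
        rvecs, tvecs, _ = aruco.estimatePoseSingleMarkers(
            corners, marker_len, camera_matrix, dist_coeffs)
        return rvecs[0], tvecs[0]    # rotation and translation of the first marker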

Block Shape

We started out using the same shape blocks for the simulations as we did for imaging. However, the blocks kept slipping out of the grippers of the robot, even though we tried using effort controllers and repeatedly attempting to close the grippers. As a result, we changed the blocks to the dumbbell shape that can be seen in the picture. This way, when the block slips down, the top rests above the gripper, and the robot can carry it without dropping.

Placing the block is similar. First we navigate the robot to the location where the structure is to be built. Then, the robot detects markers placed on the ground where we want the blocks to be placed and plans a path to place the block before opening the grippers.

Structure Location

We decided to create two separate rooms in our simulation, one for picking up blocks and one for building the structure. This made it so that we didn't have to worry about misplaced blocks affecting the robot's navigation or block finding. This could have also worked with the structure being built in one room; there would just need to be a set location for where the structure was meant to be built.

ArUco Placeholders

Because the robot's navigation wasn't perfectly accurate, it struggled to place blocks in exactly the right positions. Even when given the same coordinates to travel to, it wouldn't end up in exactly the same location. Placing markers on the ground at the required positions allowed the robot to use them to determine exactly where to place blocks. A new marker was created and placed each time the robot picked up a new block, so that it could find that new marker and place its block there. This means that, in real life, someone would have to place these markers down as the robot was building. However, we could also modify the system to use a different marker for each block, so that all of the markers would only need to be placed once at the beginning of the process.

Block Spacing

The robot also struggled to place blocks without knocking over previously placed blocks, so we ended up requiring the blocks to be spaced apart from each other. It was unable to model the poses of previously placed blocks, and of the block being held, well enough to avoid hitting other blocks. Spacing blocks apart gave it enough room to move its arm through the placing motion. This limited the size and variation of the structures we could build, but helped the robot build them more successfully.