Assignment 6 SLAM





SLAM stands for "simultaneous localization and mapping".
In this assignment we use a SLAM algorithm to localize a sensor and map an indoor environment.

The algorithm we choose is RGB-D SLAM.

Here is a demo of how RGB-D SLAM works:


RGB-D SLAM is a visual SLAM algorithm that uses an RGB-D sensor, such as the Kinect, to obtain RGB images and depth images.
It can generate colored 3D models of objects and indoor scenes using a hand-held Kinect sensor.

Here is the GUI of RGB-D SLAM.
The top figure shows the generated color point cloud. The bottom left shows the original RGB image, bottom middle shows the depth image, and bottom right shows the detected features.


This figure shows the structure of the RGB-D SLAM algorithm.


RGB-D SLAM detects visual features using SIFT or SURF, matches them between pairs of acquired images, and estimates the transform between each pair using RANSAC.
To improve this initial estimate, it uses a variant of the ICP algorithm to refine the poses, and HOGMAN to optimize the pose graph.
For every processed frame, RGB-D SLAM also provides the corresponding pose of the Kinect sensor, and a trajectory linking the frames (see next figure).
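
The RANSAC step described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the RGB-D SLAM implementation: it works on planar (2D) matched points rather than 3D keypoints, and the function names, iteration count, and inlier threshold are hypothetical choices.

```python
import math
import random

def fit_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping src points onto dst points."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # optimal rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def apply_rigid_2d(model, p):
    """Apply a (theta, tx, ty) transform to a single point."""
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def ransac_rigid_2d(src, dst, iterations=200, threshold=0.05, seed=0):
    """Estimate a rigid transform from matches that may contain outliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # Two matches are the minimal sample for a 2D rigid transform.
        i, j = rng.sample(range(len(src)), 2)
        model = fit_rigid_2d([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in range(len(src))
                   if math.dist(apply_rigid_2d(model, src[k]), dst[k]) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consistent transform found")
    # Refit on all inliers of the best hypothesis.
    model = fit_rigid_2d([src[k] for k in best_inliers],
                         [dst[k] for k in best_inliers])
    return model, best_inliers
```

Calling ransac_rigid_2d(src, dst) on two lists of matched points returns the estimated (theta, tx, ty) and the indices of the matches that agree with it. Wrong matches are left out of the final fit.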



In order to examine the RGB-D SLAM results, we designed two experiments.
During each experiment, one of our team members carries the Kinect sensor while another collects data from it, and RGB-D SLAM processes the data in real time.
After we travel around the ring corridor (see figure below), RGB-D SLAM generates a trajectory and the poses along that trajectory.

The floor plan of our experiments is shown here. The red cross indicates the starting position, and the green line indicates our ground-truth path.
Our path is a closed loop, so an ideal trajectory generated by RGB-D SLAM should also be closed.

Besides the trajectory and poses, RGB-D SLAM also outputs an OctoMap and a point cloud of the perceived environment.


The traditional way to evaluate a SLAM algorithm is to use a motion capture system to determine ground truth.
However, since ground truth is not available to us, we examine RGB-D SLAM qualitatively based on our own experiments, and quantitatively based on an online dataset with ground truth.

The figures below show the trajectories generated by RGB-D SLAM. The left shows the planar view, and the right shows the 3D view.
The trajectories do not close, so the loop closure test fails.
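
Even without ground truth, a closed-loop path allows one simple numerical check: how far the estimated end position lies from the estimated start position. The sketch below assumes the trajectory is available as a list of (x, y, z) positions in meters. The function name is hypothetical and is not part of RGB-D SLAM.

```python
import math

def loop_closure_gap(positions):
    """Return (gap, gap / path length) for a trajectory that should be closed.

    positions: list of (x, y, z) tuples ordered in time.
    """
    # Distance between the first and the last estimated position.
    gap = math.dist(positions[0], positions[-1])
    # Total length travelled along the estimated trajectory.
    length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    return gap, (gap / length if length > 0 else 0.0)
```

A gap of zero means the estimated loop closes exactly. Reporting the gap relative to the path length makes runs of different lengths comparable.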

Next we use an online dataset to examine RGB-D SLAM quantitatively.
The figure above compares two error metrics, the absolute trajectory error (ATE) and the relative pose error (RPE).
We use the root-mean-square error (RMSE) to summarize each metric.
In the figure above, the RMSE of the ATE is always less than the RMSE of the RPE.
The reason is that the RPE considers both translational and rotational errors, while the ATE only considers the translational errors.
As a result, the RPE is always slightly larger than the ATE (or equal if there is no rotational error).
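
To make the two metrics concrete, here is a minimal sketch of how their RMSE values can be computed. It is a simplification and not the evaluation tool used for the figure. It assumes planar poses (x, y, yaw) and that the estimated and ground-truth trajectories are already time-associated and expressed in the same frame. A full evaluation would normally work with 3D poses and align the trajectories first. All function names are hypothetical.

```python
import math

def inverse(p):
    """Inverse of a planar pose (x, y, yaw)."""
    x, y, th = p
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), s * x - c * y, -th)

def compose(a, b):
    """Pose b expressed in the frame of pose a, then mapped to a's parent frame."""
    ax, ay, ath = a
    bx, by, bth = b
    c, s = math.cos(ath), math.sin(ath)
    return (ax + c * bx - s * by, ay + s * bx + c * by, ath + bth)

def ate_rmse(gt, est):
    """RMSE of the position differences between matching poses."""
    sq = [(g[0] - e[0]) ** 2 + (g[1] - e[1]) ** 2 for g, e in zip(gt, est)]
    return math.sqrt(sum(sq) / len(sq))

def rpe_rmse(gt, est, delta=1):
    """RMSE of the translational part of the relative pose error over `delta` frames."""
    sq = []
    for i in range(len(gt) - delta):
        rel_gt = compose(inverse(gt[i]), gt[i + delta])     # true motion
        rel_est = compose(inverse(est[i]), est[i + delta])  # estimated motion
        err = compose(inverse(rel_gt), rel_est)             # motion error
        sq.append(err[0] ** 2 + err[1] ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

Because the relative error is computed from full poses, an error in the estimated heading changes the translational part of the RPE, whereas the ATE compares positions only.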


1. One thing we noticed in our experiments is that feature tracking is frequently lost.
A possible explanation is that we moved the Kinect sensor forward along a corridor instead of pointing it towards a fixed scene.
This may cause feature tracking to fail more often, which degrades performance.

2. RGB-D SLAM is an interesting approach to visual SLAM.
However, in our experiments it did not detect the loop closure or optimize the path accordingly.
For post-SLAM optimization, GTSAM is a good choice.

3. RGB-D SLAM is greatly limited by the range of its depth camera.
According to the Kinect MSDN website, the working range of the Kinect sensor is from 800 mm to 4000 mm.
This range may be adequate in indoor environments, yet it is far from sufficient if the robot is operating outdoors.

4. RGB-D SLAM has the ability to solve the basic perception problem.
Beyond that, we need to understand the environment.
Semantic labeling is a common practice to classify indoor environments.

The figure below shows an example of semantic labeling of our experiment environment.
The environment is divided into several areas based on human understanding.
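
Returning to the depth-range limitation in item 3, the effect of the 800 mm to 4000 mm working range can be checked on a single depth frame. The sketch below assumes the frame is given as rows of depth values in millimeters, with 0 marking pixels that have no reading. The function name and default limits are illustrative choices, not part of any Kinect API.

```python
def usable_depth_fraction(depth_mm, near=800, far=4000):
    """Fraction of pixels whose depth lies inside the working range [near, far]."""
    total = 0
    usable = 0
    for row in depth_mm:
        for d in row:
            total += 1
            # Values below `near` (including 0, no reading) and above `far`
            # are treated as unusable.
            if near <= d <= far:
                usable += 1
    return usable / total if total else 0.0
```

In a long corridor or outdoors, many pixels would be expected to fall beyond the far limit, so this fraction drops and fewer features have a valid depth.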