Terrain Segmentation

OVERVIEW

This section of the project performs pixel-level segmentation into predefined terrain classes. The subtask aligns with the project's overarching objective of assisting urban design: terrain segmentation supports informed decisions when planning city projects, residential construction, manufacturing-plant feasibility studies, etc.

Additionally, terrain segmentation spotlights the power of modern computer vision techniques and infrastructure. The core of the code relies on functions and algorithms from Open3D, a 3D data processing library that integrates seamlessly with Python and provides visualizations, renderings, and convenient data structures.

Data

In alignment with the object tracking and crowd counting implementations, the terrain classifier also uses the VisDrone dataset. This shared dataset is the connecting factor between all three subprojects and reflects the use cases we envision for the project.

We chose the VisDrone dataset because aerial drone footage fits the goal of our project perfectly: designing a practical computer vision solution for urban design. Applying terrain classification to drone footage in particular highlights the capabilities of current CV techniques. If our terrain-mapping solution were scaled up to produce real-time output, that information could be enormously useful.

Additionally, for terrain segmentation specifically, we used a second dataset that came with pre-trained weights and predefined class labels. This let us apply the same class specifications to our VisDrone data.

State of the art

After some research, we concluded that the current state of the art for terrain segmentation is a PSP-MobileNet implementation. PSPNet (Pyramid Scene Parsing Network) generates feature maps at multiple scales. MobileNet is a neural network proposed by Google in 2017 that introduced depthwise convolution, in which each convolutional kernel is responsible for only one input channel.

Additionally, part of the model's weights can be initialized from pretrained MobileNet weights, and the remainder from the PSP module of PSPNet. Below is an example of output achievable with the PSP-MobileNet implementation, along with a research paper discussing the model in depth.
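To make the depthwise idea concrete, here is a toy NumPy sketch (not the actual MobileNet code; the function name and shapes are our own): each k x k kernel filters exactly one input channel, so no cross-channel mixing occurs.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depthwise convolution: each kernel is responsible for one input channel.

    x: (H, W, C) input; kernels: (k, k, C), one k x k kernel per channel.
    Returns (H-k+1, W-k+1, C) -- valid padding, stride 1.
    """
    H, W, C = x.shape
    k = kernels.shape[0]
    out = np.zeros((H - k + 1, W - k + 1, C))
    for c in range(C):  # one kernel per channel: no cross-channel mixing
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[i, j, c] = np.sum(x[i:i + k, j:j + k, c] * kernels[:, :, c])
    return out

x = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
out = depthwise_conv2d(x, np.ones((3, 3, 2)))
```

Because channels never mix, a depthwise layer uses far fewer multiplications than a standard convolution; MobileNet follows it with cheap 1x1 convolutions to recombine channels.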

Architecture

For our project, we chose to implement a PointNet++ model. Similar to convolutional neural networks, this model learns hierarchical features with increasing context. It also handles non-uniform point densities through special layers that aggregate information from multiple scales.

It was created at Stanford University under the official name PointNet++, indicating that it extends their earlier PointNet model. The model is designed to process points sampled in a metric space - in other words, 3D point clouds. It does this by partitioning the point set into overlapping local regions using a distance metric; local features are extracted and grouped into progressively larger units, and this process repeats to produce high-level features for the whole point set.
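As an illustration of how overlapping regions are determined with a distance metric (a NumPy sketch of the grouping idea, not the official PointNet++ code; the function name is our own), a ball query collects the points within a fixed radius of each centroid:

```python
import numpy as np

def ball_query(points, centroids, radius, max_k):
    """Group the points that fall within `radius` of each centroid.

    Mirrors the grouping step of a PointNet++ set abstraction layer:
    overlapping local regions are defined by Euclidean distance, then the
    features inside each region are aggregated into one higher-level unit.
    """
    groups = []
    for c in centroids:
        d = np.linalg.norm(points - c, axis=1)     # distance of every point to this centroid
        idx = np.flatnonzero(d <= radius)[:max_k]  # cap the region at max_k members
        groups.append(idx)
    return groups

rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(100, 3))
groups = ball_query(pts, centroids=pts[:4], radius=0.2, max_k=16)
```

Because regions overlap, a point may contribute to several groups, which is what lets the hierarchy build context gradually.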

The link to the official PointNet++ research is below along with the Stanford webpage.

Coding implementation

The project follows a five-stage pipeline: preprocessing the data, downsampling, training, prediction, and interpolation.

For preprocessing, we converted the original point-cloud data to .pcd files (the input is .txt files containing X, Y, Z, I, R, G, B values for each pixel of the original image, where I is the pixel intensity). The process was to convert the .txt files to .pts, handling the non-integer intensity values along the way; update the files to match the PCD formatting requirements; and then use Open3D's IO functions to read the .pts files and write them out as .pcd. This was the first and most important part of our usage of the Open3D library.
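As a rough illustration of this conversion (a sketch only: it writes the ASCII PCD header directly rather than going through .pts and Open3D, and the helper name and field layout are our own, following the X, Y, Z, I, R, G, B columns described above):

```python
import os
import tempfile

import numpy as np

def txt_to_pcd(txt_path, pcd_path):
    """Convert an X,Y,Z,I,R,G,B text file to an ASCII .pcd file.

    Sketch of the preprocessing step: R, G, B are packed into the single
    integer `rgb` field that the PCD format expects, while the non-integer
    intensity values are kept as a float field.
    """
    data = np.loadtxt(txt_path)                                # columns: X Y Z I R G B
    n = len(data)
    rgb = data[:, 4:7].astype(np.uint32)
    packed = (rgb[:, 0] << 16) | (rgb[:, 1] << 8) | rgb[:, 2]  # one packed RGB value per point
    header = (
        "VERSION .7\n"
        "FIELDS x y z intensity rgb\n"
        "SIZE 4 4 4 4 4\n"
        "TYPE F F F F U\n"
        "COUNT 1 1 1 1 1\n"
        f"WIDTH {n}\nHEIGHT 1\nVIEWPOINT 0 0 0 1 0 0 0\nPOINTS {n}\nDATA ascii\n"
    )
    with open(pcd_path, "w") as f:
        f.write(header)
        for row, c in zip(data, packed):
            f.write(f"{row[0]} {row[1]} {row[2]} {row[3]} {c}\n")

# tiny two-point demo
tmp = tempfile.mkdtemp()
txt_path = os.path.join(tmp, "cloud.txt")
pcd_path = os.path.join(tmp, "cloud.pcd")
with open(txt_path, "w") as f:
    f.write("0.0 0.0 0.0 0.37 255 0 0\n1.0 2.0 3.0 0.81 0 255 0\n")
txt_to_pcd(txt_path, pcd_path)
```

In the actual pipeline, Open3D's read/write functions handle this formatting once the intermediate .pts files are in place.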

Open3D is a 3D data processing library: well optimized, highly effective for 3D data, and easy to integrate into our Python environment. Several other parts of our implementation also rely on its functions - for instance, voxel downsampling.

For downsampling, we used the voxel_down_sample_and_trace function from Open3D's geometry module. To do so, we first had to build a PointCloud object (from the .pcd file generated in preprocessing) with its points and colors attributes populated. These were set with the Vector3dVector function of Open3D's utility module, which converts a NumPy array into Open3D's format.
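To show what downsampling with a trace accomplishes (a plain-NumPy sketch of the idea, not Open3D's implementation; the function name is illustrative), each voxel collapses to the centroid of its points, and the trace records which original points it covers so labels predicted on the small cloud can later be propagated back:

```python
import numpy as np

def voxel_downsample_with_trace(points, voxel_size):
    """Average points per voxel; record which originals landed in each voxel.

    The trace is what makes voxel_down_sample_and_trace (rather than a plain
    voxel downsample) useful here: predictions on the downsampled cloud can
    be mapped back to every original point.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)  # integer voxel coordinates
    voxels = {}
    for i, key in enumerate(map(tuple, keys)):
        voxels.setdefault(key, []).append(i)
    down = np.array([points[idx].mean(axis=0) for idx in voxels.values()])
    trace = [np.array(idx) for idx in voxels.values()]
    return down, trace

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(1000, 3))
down, trace = voxel_downsample_with_trace(pts, voxel_size=0.25)
```

Every original point appears in exactly one trace entry, so the mapping back from downsampled predictions is lossless.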

The training code is built on TensorFlow and runs our model with custom ops defined in the code. It manually computes the model's accuracy (logged with the scalar function of TensorFlow's summary module) and also reports the mean IoU via the predefined function in the metrics module. The interpolation step likewise uses the predefined interpolate_label_with_color function.
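For reference, the mean IoU that the metrics module reports can be computed from a confusion matrix as follows (a NumPy sketch of the metric's definition, not TensorFlow's implementation):

```python
import numpy as np

def mean_iou(pred, label, num_classes):
    """Per-class IoU from a confusion matrix, averaged over present classes.

    For class c, IoU = TP / (TP + FP + FN); classes absent from both the
    predictions and the labels are excluded from the average.
    """
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, l in zip(pred, label):
        cm[l, p] += 1
    tp = np.diag(cm)
    denom = cm.sum(axis=0) + cm.sum(axis=1) - tp  # TP + FP + FN per class
    valid = denom > 0
    return (tp[valid] / denom[valid]).mean()

pred = np.array([0, 0, 1, 1, 2, 2])
label = np.array([0, 1, 1, 1, 2, 0])
miou = mean_iou(pred, label, num_classes=3)
```

Mean IoU penalizes both false positives and false negatives per class, which makes it a stricter summary than plain pixel accuracy for segmentation.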

Challenges

The coding challenges came up most prominently in the preprocessing and downsampling stages. With little exposure to NumPy and none to Open3D, acquainting ourselves with these libraries and their module functionality took time.

Another challenge we faced was environment setup. We first started on macOS but learned that the CUDA toolkit is no longer supported on Mac. We then switched to a Windows machine, where we ran into a problem with CMake: it could not identify the gcc compiler. Reinstalling Visual Studio and explicitly pointing CMake to its directory was to no avail. We finally settled on Ubuntu 20.04 in Oracle's VirtualBox, which gave us the luxury of a Unix-based OS while supporting all required modules (including the CUDA toolkit).

OUTPUT

As we can see, each pixel is color-coded into one of the 8 terrain class labels. This kind of information could be invaluable to urban planners as they consider manufacturing-plant feasibility, city design, construction schemes, etc.

https://github.com/roshanverma2001/CS639-Final-Project/tree/terrainMapping