Model Description:
The model uses a U-Net architecture with MobileNetV2 as the encoder. Skip connections carry the encoder's down-sampled feature maps to a self-attention mechanism in the decoder, and upsampling, max-pooling, and convolution layers follow the attention mechanism.
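To make the encoder/decoder layout above more concrete, here is a minimal sketch that only traces feature-map sizes through a U-Net-style network. It relies on MobileNetV2 reducing resolution by a total factor of 32. The 224x224 input size, the choice to take a skip connection at every intermediate stride, and the helper names `encoder_sizes` and `decoder_sizes` are assumptions for illustration, not details of this model. The attention, max-pooling, and convolution layers are left out.

```python
# Sketch only: traces spatial sizes, it does not build or run a network.
# Input size and skip strides are assumptions, not taken from this model.

def encoder_sizes(height, width, strides=(2, 4, 8, 16, 32)):
    """Return the (h, w) of the feature map at each encoder stride."""
    return [(height // s, width // s) for s in strides]


def decoder_sizes(bottleneck, skips):
    """Upsample by 2x per stage and check each result matches its skip."""
    h, w = bottleneck
    sizes = []
    for skip in reversed(skips):
        h, w = h * 2, w * 2
        if (h, w) != skip:
            raise ValueError(f"upsampled {(h, w)} does not match skip {skip}")
        sizes.append((h, w))
    return sizes


feats = encoder_sizes(224, 224)
skips, bottleneck = feats[:-1], feats[-1]   # last entry is the bottleneck
stages = decoder_sizes(bottleneck, skips)

print("encoder:", feats)
print("decoder:", stages)
```

In this sketch the last decoder stage sits at half the input resolution, so one more 2x upsampling would be needed to produce a full-size segmentation map. Input sizes that are not divisible by 32 can raise the `ValueError`, which is why such networks usually resize or pad their inputs.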
The model is trained on the CityScapes Dataset.
A first attempt with a simple U-Net model:
The input image shows our campus, while the output shows the segmented drivable area, which can later be used by an autonomous vehicle's motion planning module, as described in the paper here.
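As a rough illustration of how a segmentation output could be handed to a planner, the sketch below turns a per-pixel label map into a binary drivable-area mask. It assumes the network predicts Cityscapes train IDs (where `road` is 0) and that only `road` counts as drivable. The actual output format of this model may differ, and the helper names `drivable_mask` and `drivable_fraction` are hypothetical.

```python
# Sketch only: assumes per-pixel Cityscapes train IDs as the network output.
ROAD_TRAIN_ID = 0  # Cityscapes trainId for "road"; assumed drivable class


def drivable_mask(label_map, drivable_ids=(ROAD_TRAIN_ID,)):
    """Return a 0/1 mask marking pixels whose label is drivable."""
    return [[1 if v in drivable_ids else 0 for v in row] for row in label_map]


def drivable_fraction(mask):
    """Return the share of pixels marked drivable (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Tiny hand-made label map: sky (10), building (2), vegetation (8),
# sidewalk (1), and road (0).
labels = [
    [10, 10, 10, 10],
    [2, 2, 8, 8],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

mask = drivable_mask(labels)
print(mask)
print(drivable_fraction(mask))
```

A planner would typically consume the mask itself, for example after projecting it to a top-down view. The fraction is only a quick sanity check on the prediction.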
The image below shows our model (top) compared to the original proposed method (bottom).
Evaluation:
Sample output using our network on the CityScapes Dataset