Alina Marcu, Dragos Costea
Prof. dr. Marius Leordeanu
Prof. dr. Emil Slusanschi
A multi-stage, multi-task neural network that handles semantic segmentation and geolocalization simultaneously, in a single forward pass.
Stage 1 is designed for semantic segmentation
Stage 2 provides a precise location using two branches
LocDecoder-R-2 predicts the location as two real-valued numbers, longitude and latitude.
LocDecoder-S-128 predicts a 128x128 localization map over the whole area of possible locations. White pixels denote probable locations of the input image.
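The two Stage-2 branches output the same information in different forms. A minimal sketch (not the paper's code; array sizes follow the text, function names are illustrative) of how a location can be read from each output format:

```python
import numpy as np

MAP_SIZE = 128  # LocDecoder-S-128 output resolution

def location_from_map(loc_map):
    """Recover a normalized (x, y) location from a 128x128 localization
    map by taking the brightest pixel (white = probable location)."""
    row, col = np.unravel_index(np.argmax(loc_map), loc_map.shape)
    # Normalize pixel coordinates to [0, 1] over the covered area.
    return col / (MAP_SIZE - 1), row / (MAP_SIZE - 1)

def location_from_regression(pred):
    """LocDecoder-R-2 emits two real values directly: here assumed to be
    (longitude, latitude) normalized to the covered area."""
    lon, lat = pred
    return float(lon), float(lat)

# Example: a map with a single white pixel at (col=64, row=32).
m = np.zeros((MAP_SIZE, MAP_SIZE), dtype=np.float32)
m[32, 64] = 1.0
print(location_from_map(m))              # brightest pixel, normalized
print(location_from_regression([0.5, 0.25]))
```

The dense map is easier to supervise and to inspect visually, while the regression head gives coordinates directly; the network trains both at once.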
We collected 9531 images of 512x512 pixels, randomly sampled within a 100x100 meter area around intersections, covering in total a European urban area of around 70 square kilometers.
Each grey disk in the figure depicts a region of 500-meter radius around the training (blue centers) and testing (red centers) data.
The localization network predicts a dot; we extract the roads at that location and match them against the roads from OSM. The aligned roads yield an offset from the OSM roads, which provides the final location, as shown below.
Unfortunately, the OSM roads never match the real ground truth perfectly, which limits localization performance.
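The road-alignment refinement above can be sketched as a search for the shift that maximizes overlap between the predicted road mask and the OSM road mask. This is a simplified stand-in for the paper's alignment step, with an illustrative function name and a toy search range:

```python
import numpy as np

def best_offset(pred_roads, osm_roads, max_shift=8):
    """Brute-force search for the (dy, dx) shift that best aligns a
    predicted binary road mask with the OSM road mask. The winning
    shift is the offset used to correct the predicted location."""
    best, best_score = (0, 0), -1.0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Shift the prediction and score its overlap with OSM roads.
            shifted = np.roll(np.roll(pred_roads, dy, axis=0), dx, axis=1)
            score = np.sum(shifted * osm_roads)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

# Toy example: a road "cross", with OSM roads shifted by (2, -3).
pred = np.zeros((64, 64))
pred[10:12, :] = 1   # horizontal road
pred[:, 20:22] = 1   # vertical road
osm = np.roll(np.roll(pred, 2, axis=0), -3, axis=1)
print(best_offset(pred, osm))  # recovers the (dy, dx) that aligns them
```

In practice the search would operate on georeferenced masks and could use cross-correlation for speed, but the principle is the same: the recovered offset corrects the OSM-relative location.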
For the task of semantic segmentation, we report state-of-the-art results on the publicly available Inria dataset, using our MSMT-Stage-1 network.
Qualitative results shown below:
(Left to right: RGB input image, MSMT-Stage-1 prediction, ground truth)
Quantitative results shown below:
Detailed error comparison depicted below:
Marcu, Alina, et al. "A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization." arXiv preprint arXiv:1804.01322 (2018).
@article{marcu2018multi,
title={A Multi-Stage Multi-Task Neural Network for Aerial Scene Interpretation and Geolocalization},
author={Marcu, Alina and Costea, Dragos and Slusanschi, Emil and Leordeanu, Marius},
journal={arXiv preprint arXiv:1804.01322},
year={2018}
}