InstanceMotSeg

Monocular Instance Motion Segmentation for Autonomous Driving: KITTI InstanceMotSeg Dataset and Multi-task Baseline

Eslam Mohamed∗, Mahmoud Ewaisha∗, Mennatullah Siam, Hazem Rashed, Senthil Yogamani, Waleed Hamdy, Muhammad Helmi and Ahmad El-Sallab

* Equal contribution

Moving object segmentation is a crucial task for autonomous vehicles, as it can segment objects in a class-agnostic manner based on their motion cues. It enables the detection of objects unseen during training (e.g., a moose or a construction truck) based on their motion, independent of their appearance. Although pixel-wise motion segmentation has been studied in the autonomous driving literature, it has rarely been addressed at the instance level, which would help separate connected segments of moving objects, leading to better trajectory planning. As the main issue is the lack of large public datasets, we create a new InstanceMotSeg dataset comprising 12.9K samples, improving upon our KITTIMoSeg dataset. In addition to providing instance-level annotations, we have added 4 additional classes, which is crucial for studying class-agnostic motion segmentation. We adapt YOLACT and implement a motion-based class-agnostic instance segmentation model that acts as a baseline for the dataset. We also extend it to an efficient multi-task model that additionally provides semantic instance segmentation with a shared encoder. The model learns separate prototype coefficients within the class-agnostic and semantic heads, providing two independent paths of object detection for redundant safety. To obtain real-time performance, we study different efficient encoders and achieve 39 fps on a Titan Xp GPU using MobileNetV2, with an improvement of 10% mAP relative to the baseline. Our model improves on the previous state-of-the-art motion segmentation method by 3.3%. We summarize our work in a short video with qualitative results in this video.

We propose a computationally efficient solution to perform semantic and motion instance segmentation jointly. The contributions of this work include:

  • Release of a new InstanceMotSeg dataset with improved annotations over KittiMoSeg, in addition to instance labels and additional classes.


  • Demonstration of a working prototype of instance motion segmentation trained on the proposed dataset.


  • A real-time multi-task learning model for semantic and class-agnostic instance segmentation using motion. Our method relies on learning different prototype coefficients per task, based on YOLACT.


  • Ablation study of different backbones to find the optimal accuracy vs. speed trade-off, and of different architecture types for encoding motion.

Dataset improvements

We provide an improved version of the KittiMoSeg extension dataset, which contains 12,919 images from different KITTI scenes. We improve the dataset with the following updates:

  • We extend the dataset with semantic instance segmentation masks for 5 classes (car, pedestrian, bicycle, truck, and bus) instead of the original annotations for the car class only.

  • We improve the motion class labels by estimating motion in the 3D world coordinate system instead of the Velodyne coordinate system, which significantly improves annotation accuracy over the method in FuseMODNet, whose annotations were erroneous due to the motion of the Velodyne sensor itself.

  • We add instance information for the moving objects, where each object keeps a unique ID across the whole sequence.
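The world-coordinate labelling idea above can be sketched as follows: transform object positions into a fixed world frame using the ego pose, so the sensor's own motion no longer masquerades as object motion. This is a minimal illustrative sketch, not the authors' annotation pipeline; the 4x4 pose format, time step, and speed threshold are assumptions.

```python
import numpy as np

def to_world(pose_world_from_vehicle, point_vehicle):
    """Apply a 4x4 homogeneous ego-pose transform to a 3D point in the vehicle frame."""
    p = np.append(point_vehicle, 1.0)
    return (pose_world_from_vehicle @ p)[:3]

def is_moving(pose_t0, pose_t1, obj_t0, obj_t1, dt=0.1, speed_thresh=0.5):
    """Label an object as moving if its world-frame speed exceeds a threshold (m/s).

    pose_t0/pose_t1: 4x4 world-from-vehicle transforms at consecutive frames.
    obj_t0/obj_t1:   the object's 3D position in the vehicle frame at each frame.
    dt and speed_thresh are illustrative values, not the dataset's actual settings.
    """
    w0 = to_world(pose_t0, obj_t0)
    w1 = to_world(pose_t1, obj_t1)
    speed = np.linalg.norm(w1 - w0) / dt
    return speed > speed_thresh

# Example: the ego vehicle drives 1 m forward between frames.
I = np.eye(4)
P1 = np.eye(4)
P1[0, 3] = 1.0  # ego translated 1 m along x in the world frame

# A parked car recedes in the vehicle frame but is static in the world frame.
parked = is_moving(I, P1, np.array([5.0, 0.0, 0.0]), np.array([4.0, 0.0, 0.0]))
# A car keeping pace with the ego stays put in the vehicle frame but moves in the world.
pacing = is_moving(I, P1, np.array([5.0, 0.0, 0.0]), np.array([5.0, 0.0, 0.0]))
```

A vehicle-frame (or Velodyne-frame) displacement check would mislabel both cases above, which is exactly the error the world-coordinate labelling removes.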

The annotations will be publicly available for download. The corresponding RGB frames can be found on the KITTI website.

Download

The InstanceMotSeg dataset can be downloaded here.

Architecture

We build our model on top of the YOLACT architecture, adding motion cues so the model can predict class-agnostic moving instances. This runs in parallel to the instance segmentation head, which predicts the static and dynamic classes it has been trained on.
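The key mechanism the two heads share is YOLACT-style mask assembly: a shared protonet produces prototype masks, and each head predicts its own per-instance coefficient vectors that linearly combine those prototypes. The sketch below illustrates this with random placeholder tensors; shapes and the sigmoid activation follow YOLACT, but the values are not real network outputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(prototypes, coefficients):
    """Combine shared prototypes with per-instance coefficients.

    prototypes:   (H, W, K) prototype masks from the shared protonet.
    coefficients: (N, K) one coefficient vector per detected instance.
    Returns (N, H, W) instance masks after a sigmoid, as in YOLACT.
    """
    return sigmoid(np.einsum("hwk,nk->nhw", prototypes, coefficients))

rng = np.random.default_rng(0)
prototypes = rng.standard_normal((138, 138, 32))  # shared across both tasks

# Each head regresses its own coefficients, giving two independent
# detection paths over the same prototypes (redundant safety).
motion_coeffs = rng.standard_normal((5, 32))      # class-agnostic motion head
semantic_coeffs = rng.standard_normal((5, 32))    # semantic instance head

motion_masks = assemble_masks(prototypes, motion_coeffs)
semantic_masks = assemble_masks(prototypes, semantic_coeffs)
```

Because only the small coefficient heads are duplicated while the encoder and protonet are shared, the class-agnostic motion path adds little compute on top of the semantic path, which is what makes the multi-task model real-time.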

Results

Citation

If you use this dataset in your research, please cite this publication:

@article{mohamed2020instancemotseg,
  title={InstanceMotSeg: Real-time Instance Motion Segmentation for Autonomous Driving},
  author={Mohamed, Eslam and Ewaisha, Mahmoud and Siam, Mennatullah and Rashed, Hazem and Yogamani, Senthil and El-Sallab, Ahmad},
  journal={arXiv preprint arXiv:2008.07008},
  year={2020}
}