Training a ship segmentation algorithm requires a large amount of data, but actual ship data is difficult to obtain. Therefore, several methods are used to train on a synthetic dataset and to reduce the domain gap with real data.
Train a deep learning algorithm that distinguishes the type and area of ships using graphics-based virtual image data.
Reduce the domain gap between synthetic data and real data so that the deep learning algorithm maintains its performance on real data.
Evaluate by installing it on an unmanned ship.
The basic structure of the segmentation algorithm consists of an encoder that extracts features and a decoder that outputs the segmentation. The figure below shows the segmentation algorithm used in this study.
Commonly used segmentation algorithms include UNet and DeepLabV3. In this study, EfficientDet was used for the encoder and DeepLabV3 was used for the decoder. EfficientDet is an object detection algorithm that finds bounding boxes and predicts classes; here, its box detection block and classifier block are removed. The network takes a 256x512 RGB image as input and outputs a segmentation map for each class.
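The following is a minimal sketch of this encoder-decoder idea, not the exact network used in the study: the placeholder convolutional encoder stands in for the EfficientDet backbone with its detection heads removed, and the placeholder 1x1 classifier plus upsampling stands in for the DeepLabV3-style decoder. All layer sizes and class count are assumptions.

```python
# Minimal encoder-decoder sketch (assumed layers, not the study's exact network).
import torch
import torch.nn as nn

class SegModel(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Placeholder encoder: maps 3 x H x W -> 256 x H/8 x W/8
        # (stands in for the EfficientDet backbone without box/class heads).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Placeholder decoder: 1x1 classifier plus upsampling back to input size
        # (stands in for the DeepLabV3-style decoder).
        self.decoder = nn.Sequential(
            nn.Conv2d(256, num_classes, 1),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SegModel(num_classes=3)                  # e.g. ship / sea / sky (assumed classes)
out = model(torch.randn(1, 3, 256, 512))         # 256x512 RGB input
print(out.shape)                                 # (1, 3, 256, 512): one map per class
```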
Randomization was performed using Albumentations (an example pipeline is sketched after this list).
Pixel level
Noise, blur, RGB shift, etc.
Image level
HorizontalFlip, ShiftScaleRotate, random crop, grid distortion, etc.
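A possible Albumentations pipeline covering the pixel-level and image-level transforms listed above is sketched below; the exact transforms and parameters used in the study are not given, so all values here are assumptions.

```python
# Example Albumentations randomization pipeline (assumed parameters).
import numpy as np
import albumentations as A

transform = A.Compose([
    # Pixel-level transforms (affect the image only).
    A.GaussNoise(p=0.3),
    A.Blur(blur_limit=3, p=0.3),
    A.RGBShift(r_shift_limit=20, g_shift_limit=20, b_shift_limit=20, p=0.3),
    # Image-level transforms (applied to image and segmentation mask together).
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.0625, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.RandomCrop(height=256, width=512, p=0.5),   # input must be at least 256x512
    A.GridDistortion(p=0.3),
])

image = np.zeros((256, 512, 3), dtype=np.uint8)   # placeholder image
mask = np.zeros((256, 512), dtype=np.uint8)       # placeholder mask
augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```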
Domain adaptation is a simple style transfer based on the Fourier transform. A new image is obtained by compositing the ship with the target image; its amplitude spectrum is then replaced with the amplitude spectrum of the background image while the phase is kept, and an inverse FFT is applied.
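A rough sketch of this Fourier-based style transfer, in the spirit of FDA (Yang & Soatto, 2020), is shown below. The band size `beta` and the helper name are assumptions, and only the low-frequency part of the amplitude spectrum is swapped, as in the FDA paper.

```python
# Fourier-based style transfer sketch: swap the low-frequency amplitude of the
# synthetic (source) image with that of the real background (target) image,
# keep the source phase, then apply an inverse FFT. "beta" is an assumed value.
import numpy as np

def fourier_style_transfer(src: np.ndarray, tgt: np.ndarray, beta: float = 0.01) -> np.ndarray:
    """src, tgt: float arrays of shape (H, W, C) in [0, 1], same size."""
    src_fft = np.fft.fft2(src, axes=(0, 1))
    tgt_fft = np.fft.fft2(tgt, axes=(0, 1))
    src_amp, src_phase = np.abs(src_fft), np.angle(src_fft)
    tgt_amp = np.abs(tgt_fft)

    # Swap the centered low-frequency block of the amplitude spectrum.
    src_amp_shift = np.fft.fftshift(src_amp, axes=(0, 1))
    tgt_amp_shift = np.fft.fftshift(tgt_amp, axes=(0, 1))
    h, w = src.shape[:2]
    b = int(np.floor(min(h, w) * beta))
    ch, cw = h // 2, w // 2
    src_amp_shift[ch - b:ch + b + 1, cw - b:cw + b + 1] = \
        tgt_amp_shift[ch - b:ch + b + 1, cw - b:cw + b + 1]
    src_amp = np.fft.ifftshift(src_amp_shift, axes=(0, 1))

    # Recombine the swapped amplitude with the original phase and invert.
    out = np.fft.ifft2(src_amp * np.exp(1j * src_phase), axes=(0, 1))
    return np.clip(np.real(out), 0.0, 1.0)
```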
Batch norm is the normalization mainly used in deep learning, but domain norm was used in this study. Batch norm normalizes features across the batch and spatial locations. Domain norm normalizes features across spatial locations within each sample, and then across the channel dimension.
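A minimal sketch of such a domain normalization layer, following the description of domain-invariant normalization in DSMNet (Zhang et al., 2020), might look like the following; the learnable affine parameters and epsilon are assumptions.

```python
# Domain normalization sketch: spatial (instance-style) normalization per sample,
# followed by L2 normalization across channels at each pixel.
import torch
import torch.nn as nn

class DomainNorm(nn.Module):
    def __init__(self, channels: int, eps: float = 1e-5):
        super().__init__()
        self.instance_norm = nn.InstanceNorm2d(channels, affine=False)
        self.weight = nn.Parameter(torch.ones(1, channels, 1, 1))   # assumed affine params
        self.bias = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.instance_norm(x)                                    # normalize over H, W per sample
        x = x / (x.norm(p=2, dim=1, keepdim=True) + self.eps)        # normalize across channels
        return x * self.weight + self.bias
```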
The training dataset is synthetic data in which, for each ship model, a single ship is placed at ranges from 100 m to 5,000 m in 100 m steps and rotated by 5 degrees at each step. Real data is used as the validation dataset.
Focal loss is based on the cross entropy (CE) loss. The two losses are shown below. Focal loss multiplies the CE loss by a factor based on the predicted probability, so that training focuses more on hard, poorly predicted examples.
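With p_t denoting the predicted probability of the ground-truth class and gamma >= 0 the focusing parameter, the two losses (Lin et al., 2018) are:

```latex
\mathrm{CE}(p_t) = -\log(p_t)
\qquad
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\,\log(p_t)
```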
Validation with synthetic dataset (Batch Norm)
Before the real data was labeled, validation was also performed with synthetic images; at this stage, training was done without the domain-gap-reduction methods introduced above. First, it was confirmed that the loss decreased steadily as training progressed.
The results were generally good, but when the ship was very far away there were failure cases: the ship was not recognized as a ship, a water shadow was recognized as a ship, or two ships were confused with the boundary between the sea and the sky.
Prediction results on real data are as follows.
To solve the above problems, we trained using randomization and domain norm.
Validation with real dataset (Domain Norm)
After labeling the real data, it was used as the validation dataset. We trained the algorithm with domain norm. The sea and sky classes were excluded, and only ships were segmented.
In many cases, the sea was recognized as a ship, or a single ship was recognized as several ships.
Validation with real dataset (Domain Norm, Albumentations, domain adaptation)
Real data was used as the validation dataset. We trained the algorithm with domain norm, Albumentations randomization, and domain adaptation.
As with Method 2, the background was often recognized as a ship, or a single ship was recognized as several ships.
The current algorithm outputs a segmentation map for each class; we plan to modify it to output a single segmentation map plus class probabilities and retrain. If this method produces similar results, we plan to add real data to the training dataset.
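One possible interpretation of this planned modification is sketched below; the head names, shapes, and pooling choice are assumptions, not the study's design.

```python
# Sketch of a head that outputs a single segmentation map plus class probabilities,
# instead of one segmentation map per class. All names and shapes are assumed.
import torch
import torch.nn as nn

class MaskAndClassHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.mask_head = nn.Conv2d(in_channels, 1, kernel_size=1)   # one segmentation map
        self.class_head = nn.Sequential(                            # class probabilities
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(in_channels, num_classes),
        )

    def forward(self, features: torch.Tensor):
        mask_logits = self.mask_head(features)       # (N, 1, H, W)
        class_logits = self.class_head(features)     # (N, num_classes)
        return mask_logits, class_logits
```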
Tan, M., Pang, R., & Le, Q. V. (2020). EfficientDet: Scalable and Efficient Object Detection. In Proceedings of CVPR 2020 (pp. 10778-10787). doi:10.1109/CVPR42600.2020.01079
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., & Adam, H. (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of ECCV 2018.
Yang, Y., & Soatto, S. (2020). FDA: Fourier Domain Adaptation for Semantic Segmentation. In Proceedings of CVPR 2020 (pp. 4084-4094). doi:10.1109/CVPR42600.2020.00414
Zhang, F., Qi, X., Yang, R., Prisacariu, V., Wah, B., & Torr, P. (2020). Domain-Invariant Stereo Matching Networks. In Proceedings of ECCV 2020. doi:10.1007/978-3-030-58536-5_25
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2018). Focal Loss for Dense Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. doi:10.1109/TPAMI.2018.2858826