Conference on Neural Information Processing Systems (NeurIPS) 2023

M2SODAI: Multi-Modal Ship and Floating Matter Detection Image Dataset With RGB and Hyperspectral Image Sensors

Jonggyu Jang1, Sangwoo Oh2, Youjin Kim3, Dongmin Seo4, Youngchol Choi2, Hyun Jong Yang1*

1Pohang University of Science and Technology (POSTECH)

2Korea Research Institute of Ships & Ocean Engineering (KRISO)

3Samsung Electronics

4Semyung University

*Corresponding Author

Abstract


Object detection in aerial RGB, infrared, or synthetic-aperture radar images has recently drawn increasing research attention. Our focus is maritime object detection, mainly detecting and localizing ships and floating matter, which is an imperative task for reliable surveillance, monitoring, and active rescue. Despite remarkable advances in computer vision, the task becomes challenging as the cameras’ field of view and/or the object distance increases. Pervasive sea surface effects such as sunlight reflection, wind, and waves make it even harder. Hyperspectral image (HSI) sensors, which provide more than 100 channels at visible and near-infrared wavelengths, can extract intrinsic material information from just a few pixels. The advent of HSI sensors motivates us to leverage HSIs to avoid false positives caused by sea surface effects. However, only a handful of public HSI datasets exist because collecting HSIs is costly and labor-intensive, and this lack of public datasets is the major hindrance to HSI-based object detection research. We have collected and annotated a publicly available dataset, “Multi-Modal Ship and flOating matter Detection in Aerial Images (M2SODAI),” which comes with 1,257 synchronized image pairs of 1600×1600 pixel RGB data and 224×224 pixel HSI data. For diversity, we flew 59 flight strips in 12 flight measurement campaigns (11 different spots). The M2SODAI dataset contains 5,764 instances per category, each of which is labeled by a bounding box. To the best of our knowledge, this is the first bounding-box-annotated and synchronized aerial RGB/HSI dataset. For the detection algorithm, we propose DoubleFPN, a novel multi-modal extension of the feature pyramid network (FPN). Extensive experiments on our benchmark demonstrate that fusing RGB and HSI data can enhance mean average precision (mAP), especially in the presence of sea surface effects. The source code and dataset will be available soon.