Kien X. Nguyen, Fengchun Qiao, Arthur Trembanis and Xi Peng
In Proceedings of the 38th Conference on Neural Information Processing Systems (NeurIPS 2024), Datasets and Benchmarks Track
A major obstacle to the advancement of machine learning models in marine science, particularly in sonar imagery analysis, is the scarcity of AI-ready datasets. While there have been efforts to make AI-ready sonar image datasets publicly available, they suffer from limitations in environmental setting and scale. To bridge this gap, we introduce SeafloorAI, the first extensive AI-ready dataset for seafloor mapping across 5 geological layers, curated in collaboration with marine scientists. We further extend the dataset to SeafloorGenAI by incorporating a language component in order to facilitate the development of both vision- and language-capable machine learning models for sonar imagery. The dataset consists of 62 geo-distributed data surveys spanning 17,300 square kilometers, with 696K sonar images, 827K annotated segmentation masks, 696K detailed language descriptions, and approximately 7M question-answer pairs. By making our data processing source code publicly available, we aim to engage the marine science community in enriching the data pool and to inspire the machine learning community to develop more robust models. This collaborative approach will enhance the capabilities and applications of our datasets within both fields. Our code repository is available on GitHub under the CC-BY-4.0 license.
A tow ship and AUV recording seafloor details. Photo courtesy of Nautilus Magazine.
Seafloor mapping is the process of creating detailed maps of the ocean floor to understand its shape, depth, and the types of materials that cover it, like sand, rocks, or mud. This information helps scientists and industries make decisions about marine life conservation, offshore construction, and resource exploration.
The photo shows a tow ship and an autonomous underwater vehicle (AUV) capturing detailed images of the seafloor’s texture (backscatter) and depth (bathymetry) using multi-beam echo sounders or side-scan sonar.
Seafloor mapping stands at the forefront of marine science, utilizing cutting-edge technologies like multi-beam echo sounders and side-scan sonar to unveil the hidden complexities of the ocean floor. Beyond scientific research, seafloor mapping is instrumental in identifying potential resources, assessing environmental impacts, and supporting sustainable ocean management practices in the context of the blue economy. However, current analysis techniques in seafloor mapping are predominantly labor-intensive and reliant on manual interpretation by marine scientists, requiring hundreds of hours of meticulous examination of seabed imagery across data surveys. This hands-on approach is not only time-consuming but also susceptible to user subjectivity and the limitations of individual expertise, thus introducing potential inconsistencies in analysis.
The integration of machine learning (ML) holds the promise of enhancing efficiency and reliability in seafloor mapping by automating the segmentation and classification tasks. To this end, we introduce SeafloorAI, the first extensive AI-ready sonar imagery dataset for seafloor mapping. We also incorporate language into our dataset, extending it to SeafloorGenAI.
Overview of our datasets, SeafloorAI and SeafloorGenAI. The table highlights key dataset statistics. We incorporate 62 public data surveys published by USGS and NOAA from 9 major regions to construct the SeafloorAI and SeafloorGenAI datasets. Our dataset contains 9 geological layers: 4 are raw signals (Backscatter, Bathymetry, Slope, and Rugosity) and 5 are annotated by human experts (Sediment, Physiographic Zone, Habitat, Fault, and Fold). SeafloorAI serves as a dataset for standard computer vision tasks, i.e., semantic segmentation, whereas SeafloorGenAI constitutes a dataset for generative vision-language tasks, i.e., general visual question answering and instruction-following mapping.
The dataset contains 827K ground-truth segmentation masks for 696K sonar images across 5 geological layers. Each sonar image has a spatial dimension of 224 × 224. However, users are free to re-patchify the raster maps provided below and create their own versions of the dataset; to that end, we also provide the data processing code on our GitHub, and a minimal sketch of the patchification step follows. We include a sample collected from Region 5.
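As a rough sketch, assuming the raster maps are distributed as GeoTIFFs, re-patchifying into 224 × 224 tiles with rasterio could look like the following; the file name region5_backscatter.tif and the patchify helper are illustrative, not part of the released code.

import rasterio
from rasterio.windows import Window

PATCH = 224  # spatial dimension used in SeafloorAI

def patchify(raster_path, stride=PATCH):
    """Yield (row, col, patch) tiles of size PATCH x PATCH from a raster map."""
    with rasterio.open(raster_path) as src:
        for row in range(0, src.height - PATCH + 1, stride):
            for col in range(0, src.width - PATCH + 1, stride):
                window = Window(col, row, PATCH, PATCH)
                patch = src.read(window=window)  # shape: (bands, PATCH, PATCH)
                yield row, col, patch

# Example: non-overlapping tiles from a hypothetical backscatter raster.
# A smaller stride would produce overlapping patches instead.
for row, col, patch in patchify("region5_backscatter.tif"):
    ...  # save the patch and pair it with the matching mask window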
SeafloorGenAI extends SeafloorAI by incorporating a language component. We aim to equip each sonar image with a detailed language description as well as several question-answer pairs. The ultimate goal is to develop a large vision-language model for marine science.
We leverage in-context learning (ICL) in GPT-4 to automate the language annotation process, providing few-shot input-output pairs to the LLM. In this case, the input contains the key analytical indicators, and the output is the description written by marine scientists for the same image. To construct the ICL input, we collaborate with marine scientists to identify the essential information required for analysis. Subsequently, we use standard statistical and computer vision tools to extract three categories of information: (1) geophysical parameters, (2) spatial distribution, and (3) geological composition.
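To make the three categories concrete, here is an illustrative sketch of how such indicators could be computed with standard tools; the extract_indicators helper, its inputs, and the specific statistics are hypothetical and simplified relative to our actual pipeline.

import numpy as np
from scipy import ndimage

def extract_indicators(bathymetry, backscatter, mask, class_names):
    """Compute simple analytical indicators for one sonar image (illustrative)."""
    # (1) Geophysical parameters: summary statistics of the raw signal layers
    geophysical = {
        "depth_min_m": float(bathymetry.min()),
        "depth_max_m": float(bathymetry.max()),
        "backscatter_mean": float(backscatter.mean()),
        "backscatter_std": float(backscatter.std()),
    }
    # (2) Spatial distribution: where each class sits and how fragmented it is
    spatial = {}
    for cls_id, name in enumerate(class_names):
        region = mask == cls_id
        if region.any():
            _, n_components = ndimage.label(region)
            rows, cols = np.nonzero(region)
            spatial[name] = {
                "n_patches": int(n_components),
                "centroid_rc": (float(rows.mean()), float(cols.mean())),
            }
    # (3) Geological composition: area fraction covered by each class
    composition = {
        name: float((mask == cls_id).mean())
        for cls_id, name in enumerate(class_names)
    }
    return geophysical, spatial, composition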
The objective is to help the model "see" the sonar image through language descriptions that are as detailed as possible. For the ICL output, we ask marine scientists to manually describe, in domain language, 50 randomly selected samples from the SeafloorAI dataset. ICL ensures that GPT-4 can accurately mimic the domain-specific language, enhancing the quality and relevance of the generated answers.
Next, we design a prompt for GPT-4, comprising the input-output pairs and the extracted analytical indicators, to generate general descriptions and question-answer pairs for the remaining images. Finally, domain experts carefully evaluate the generated language annotations to ensure quality and consistency. The last two steps form a feedback loop, creating an iterative prompt refinement process.
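For concreteness, below is a minimal sketch of this few-shot prompting setup using the OpenAI Python client; the annotate helper, the system prompt text, and the message layout are illustrative assumptions, not our exact prompt.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "You are a marine geologist. Given analytical indicators extracted from a "
    "sonar image, write a detailed seafloor description and question-answer "
    "pairs in the style of the examples."
)

def annotate(indicators, few_shot_pairs):
    """Query GPT-4 with few-shot (indicators, expert description) exemplars."""
    messages = [{"role": "system", "content": SYSTEM}]
    # Interleave each expert-written exemplar as a user/assistant turn
    for example_input, expert_output in few_shot_pairs:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": expert_output})
    # The new image's indicators form the final user turn
    messages.append({"role": "user", "content": indicators})
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

Presenting the expert-written descriptions as assistant turns is what lets the model mimic the domain-specific phrasing rather than producing generic image descriptions.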
Citation (BibTeX)
@inproceedings{nguyen2024seafloorailargescalevisionlanguagedataset,
title={SeafloorAI: A Large-scale Vision-Language Dataset for Seafloor Geological Survey},
author={Kien X. Nguyen and Fengchun Qiao and Arthur Trembanis and Xi Peng},
booktitle={Proceedings of the Annual Conference on Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track},
year={2024}
}