TL;DR: In this paper, we aim to unlock the full potential of unmanned deployment for USVs. We propose a learned position-based visual docking framework for Unmanned Surface Vehicles, built on an auto-labeling pipeline, to address the "last-mile" problem of USV autonomous docking beyond conventional visual-servoing methods. The auto-labeling pipeline greatly reduces human intervention, and the system was tested and validated in real-world water environments.
Unmanned Surface Vehicles (USVs) are increasingly applied to water operations such as environmental monitoring and river-map modeling. However, precise autonomous docking at ports or stations remains a significant challenge, often relying on manual control or external positioning systems, which severely limits fully autonomous deployments. In this paper, we propose a novel supervised learning framework featuring an auto-labeling pipeline to enable autonomous visual docking for USVs. The primary innovation lies in our automated data collection pipeline, which directly provides paired relative pose data and corresponding images, eliminating the conventional need for manual labeling, such as tagging bounding boxes. We introduce the Neural Dock Pose Estimator (NDPE), capable of accurately predicting the relative dock pose without relying on traditional methods such as handcrafted feature extraction, camera calibration, or peripheral markers. Unlike common bounding-box-based detection algorithms (e.g., YOLO-like methods), our NDPE explicitly predicts the relative pose transformation between the camera frame and USV body frame, significantly simplifying the data annotation and training process. Additionally, the generality of our data collection pipeline allows integration with various neural network architectures, ensuring broad applicability beyond the specific architecture demonstrated here. Experimental validation in real-world water environments demonstrates that NDPE robustly handles variations in docking distances and USV velocities, ensuring accurate and stable autonomous docking performance. The effectiveness and practicality of our approach are clearly verified through extensive experiments.
Our framework performs autonomous visual docking from a sequence of fisheye images with human-in-the-loop control. As summarized in the figure above, the pipeline involves five phases: (a.i) data collection, (a.ii) dataset augmentation, (a.iii) model training, (a.iv) neural dock pose estimation, and (b) motion control. For more details, please refer to our paper.
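To make the (a.iv) and (b) phases concrete, here is a minimal sketch of how a learned pose estimator can feed a motion controller. All names, the fixed pose returned by the stub, and the proportional control law are illustrative assumptions, not the paper's implementation; the actual NDPE is a trained network regressing the camera-to-body relative pose from fisheye frames.

```python
from dataclasses import dataclass
import math

@dataclass
class DockPose:
    # Hypothetical relative dock pose in the USV body frame (meters, radians).
    x: float      # forward offset to the dock
    y: float      # lateral offset
    yaw: float    # heading error relative to the dock

def estimate_dock_pose(image) -> DockPose:
    # Stand-in for the learned Neural Dock Pose Estimator (NDPE): in the paper,
    # a network regresses this pose directly from a fisheye image, with no
    # handcrafted features, calibration, or markers. A fixed pose is returned
    # here so the control loop below runs end to end.
    return DockPose(x=5.0, y=1.0, yaw=0.2)

def docking_controller(pose: DockPose, k_surge: float = 0.3, k_yaw: float = 0.8):
    # Illustrative proportional position-based visual-servo law (an assumption,
    # not the paper's controller): surge scales with range to the dock, yaw
    # rate cancels the bearing plus heading error.
    bearing = math.atan2(pose.y, pose.x)
    surge = k_surge * math.hypot(pose.x, pose.y)
    yaw_rate = k_yaw * (bearing + pose.yaw)
    return surge, yaw_rate

surge, yaw_rate = docking_controller(estimate_dock_pose(image=None))
print(round(surge, 3), round(yaw_rate, 3))  # → 1.53 0.318
```

In the real pipeline, the auto-labeling phase would supply `(image, DockPose)` pairs for training the estimator, and the loop above would run at the camera frame rate with human-in-the-loop override.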
@article{CHU2025121609,
title = {Supervised visual docking network for unmanned surface vehicles using auto-labeling in real-world water environments},
journal = {Ocean Engineering},
volume = {335},
pages = {121609},
year = {2025},
issn = {0029-8018},
doi = {10.1016/j.oceaneng.2025.121609},
url = {https://www.sciencedirect.com/science/article/pii/S0029801825013150},
author = {Yijie Chu and Ziniu Wu and Yong Yue and Eng Gee Lim and Paolo Paoletti and Xiaohui Zhu},
keywords = {Autonomous docking, Position-based visual servo, Unmanned surface vehicles, Neural network},
abstract = {Unmanned Surface Vehicles (USVs) are increasingly applied to water operations such as environmental monitoring and river-map modeling. However, precise autonomous docking at ports or stations remains a significant challenge, often relying on manual control or external positioning systems, which severely limits fully autonomous deployments. In this paper, we propose a novel supervised learning framework featuring an auto-labeling pipeline to enable USVs autonomous visual docking. The primary innovation lies in our automated data collection pipeline, which directly provides paired relative pose data and corresponding images, eliminating the conventional need for manual labeling, such as tagging bounding boxes. We introduce the Neural Dock Pose Estimator (NDPE), capable of accurately predicting the relative dock pose without relying on traditional methods such as handcrafted feature extraction, camera calibration, or peripheral markers. Unlike common bounding-box-based detection algorithms (e.g., Yolo-like methods), our NDPE explicitly predicts the relative pose transformation between the camera frame and USV body frame, significantly simplifying the data annotation and training process. Additionally, the generality of our data collection pipeline allows integration with various neural network architectures, ensuring broad applicability beyond the specific architecture demonstrated here. Experimental validation in real-world water environments demonstrates that NDPE robustly handles variations in docking distances and USV velocities, ensuring accurate and stable autonomous docking performance. The effectiveness and practicality of our approach are clearly verified through extensive experiments. The dataset, tutorial and experimental videos for this project are publicly available at: https://sites.google.com/view/usv-docking/home.}
}