We are working to make autonomous vehicles more autonomous and adaptable using state-of-the-art machine-learning algorithms.

A brief history

Many algorithms have been designed for AUV path planning (and obstacle avoidance) over the past few decades. Most of these studies have focused on path planning under the influence of marine currents, on obstacle avoidance, or on seabed coverage, and the optimization objectives are usually path length, travel time, and power consumption. However, path planning can also be used to optimize underwater target localization.

One of the main challenges in marine research lies in the underwater positioning of features or assets (e.g., underwater vehicles). Because radio waves attenuate rapidly in water, Global Positioning System (GPS) signals are not suitable for positioning underwater targets; acoustic signals, however, can fill the underwater communications gap left by radio waves. Recently, range-only and single-beacon methods, in which a single AUV surveys a marine area to estimate the position of an acoustically tagged target, have been applied successfully. Range-only methods have several advantages over angle-based localization methods (e.g., Ultra-Short Baseline (USBL) systems): (i) they reduce power consumption and the number of required devices (e.g., an inertial measurement unit), and consequently the cost and size of the overall system; and (ii) range measurements are more robust in rough sea conditions than angle measurements, especially on small platforms such as autonomous surface vehicles (ASVs) or AUVs.
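To make the idea concrete, here is a minimal sketch (not the method used in this work) of range-only localization for a static target: given noisy range measurements taken from several known vehicle positions, the target position can be recovered by nonlinear least squares. All names and values below are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def range_residuals(target_xy, vehicle_positions, measured_ranges):
    """Difference between predicted and measured ranges for a candidate target."""
    predicted = np.linalg.norm(vehicle_positions - target_xy, axis=1)
    return predicted - measured_ranges

# Illustrative data: the vehicle takes acoustic range measurements from
# four known waypoints around a (here, known) true target position.
true_target = np.array([40.0, 25.0])
waypoints = np.array([[0.0, 0.0], [60.0, 0.0], [60.0, 50.0], [0.0, 50.0]])
ranges = np.linalg.norm(waypoints - true_target, axis=1)
ranges += np.random.normal(0.0, 0.5, size=ranges.shape)  # acoustic noise

# Estimate the target position by nonlinear least squares from a rough guess.
fit = least_squares(range_residuals, x0=np.array([30.0, 30.0]),
                    args=(waypoints, ranges))
print("Estimated target position:", fit.x)
```

With a moving target, the same idea requires a tracking filter and, crucially, a good choice of where to measure next, which is where path optimization enters.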

The main drawback of range-only and single-beacon localization techniques is related to path optimization (i.e., what trajectory the AUV should follow to increase the accuracy of the predicted target position). While for a static target the optimization problem is relatively straightforward, in a dynamic environment with a mobile target an analytical solution is not trivial. In the present work, a deep reinforcement learning (RL) approach is used to find the optimal policy that an agent (e.g., an AUV or ASV) should follow in order to accurately localize underwater targets.
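To illustrate one way such a problem can be cast for RL (a simplified sketch, not necessarily the exact formulation used in this work), the toy Gym-style environment below lets an agent choose heading changes, refits a range-only estimate of a static target after each measurement, and rewards reductions in estimation error. All names, reward terms, and parameters are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

class RangeOnlyLocalizationEnv:
    """Toy episodic environment: an agent steers a vehicle to localize a
    static target from noisy acoustic ranges. Illustrative sketch only."""

    def __init__(self, area=100.0, speed=1.0, noise_std=0.5, max_steps=200):
        self.area, self.speed = area, speed
        self.noise_std, self.max_steps = noise_std, max_steps

    def reset(self):
        self.agent = np.random.uniform(0, self.area, size=2)
        self.target = np.random.uniform(0, self.area, size=2)
        self.heading, self.steps = 0.0, 0
        self.positions, self.ranges = [], []
        self.estimate = np.full(2, self.area / 2.0)  # crude initial guess
        self._measure()
        return self._observation()

    def _measure(self):
        r = np.linalg.norm(self.agent - self.target)
        r += np.random.normal(0.0, self.noise_std)
        self.positions.append(self.agent.copy())
        self.ranges.append(r)
        if len(self.ranges) >= 3:  # refit the estimate once it is solvable
            res = least_squares(
                lambda t: np.linalg.norm(np.array(self.positions) - t, axis=1)
                          - np.array(self.ranges),
                self.estimate)
            self.estimate = res.x

    def _observation(self):
        return np.concatenate([self.agent, self.estimate, [self.ranges[-1]]])

    def step(self, action):
        self.heading += float(action)  # action: change of heading (radians)
        self.agent += self.speed * np.array([np.cos(self.heading),
                                             np.sin(self.heading)])
        self.steps += 1
        prev_error = np.linalg.norm(self.estimate - self.target)
        self._measure()
        error = np.linalg.norm(self.estimate - self.target)
        # Reward improvement of the estimate (usable in simulation, where the
        # true target is known); small per-step penalty encourages speed.
        reward = (prev_error - error) - 0.01
        done = error < 1.0 or self.steps >= self.max_steps
        return self._observation(), reward, done, {}
```

A standard deep RL algorithm (e.g., DDPG or PPO) could then be trained against `reset()`/`step()`; extending the sketch to a moving target would replace the least-squares refit with a tracking filter.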

Wave Glider ASV tracking an underwater target

Where we are today

Preliminary results show that the RL agent can learn a policy whose performance is comparable to the analytically derived optimal trajectory. This approach highlights the potential for tracking marine animals with autonomous vehicles, and could enable coordinated fleets of vehicles to localize and track a set of underwater assets via multi-agent, multi-target approaches that are currently intractable with existing methodologies. The RL agent has been tested in a realistic environment using the AUV Sparus II, from Iqua Robotics (iquarobotics.com), running its control architecture COLA2, which is based on the Robot Operating System (ROS). The results indicate a ~10% reduction in the time required for the AUV to localize the target, together with a 30% reduction in localization error, compared to a more traditional method based on an analytical solution.
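As a rough sketch of how such a policy can be plugged into a ROS-based architecture like COLA2 (the actual integration is not described here, so everything below is an assumption): a small node subscribes to range measurements, evaluates the trained policy, and publishes waypoint requests for the vehicle's controller.

```python
import numpy as np
import rospy
from std_msgs.msg import Float32
from geometry_msgs.msg import PointStamped

# Hypothetical glue node: feeds the latest acoustic range to a trained
# policy and publishes the resulting waypoint. The topic names, message
# types, and the `policy` callable are illustrative placeholders, not
# COLA2's actual interface.

def run(policy):
    rospy.init_node("rl_target_localization")
    waypoint_pub = rospy.Publisher("/rl_agent/waypoint", PointStamped,
                                   queue_size=1)
    state = np.zeros(5)  # e.g., vehicle pose, target estimate, last range

    def on_range(msg):
        state[4] = msg.data  # store the most recent range measurement

    rospy.Subscriber("/acoustic/range", Float32, on_range)

    rate = rospy.Rate(1.0)  # request one waypoint per second
    while not rospy.is_shutdown():
        wp = PointStamped()
        wp.header.stamp = rospy.Time.now()
        wp.point.x, wp.point.y = policy(state)  # policy returns (x, y)
        waypoint_pub.publish(wp)
        rate.sleep()
```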


Single agent (blue dot) localizing a target (black dot)
Two agents (blue dots) coordinated to localize a target (black dot)