Paper | Video | Code
An illustration of cooperation between an energy-limited UAV and a UGV for surveying areas impacted by disasters. The UAV conducts continuous surveillance, periodically recharging via the UGV. The proposed DRL policy manages mission point visits and coordinates UAV-UGV recharging efforts.
This project tackles the challenge of UAV-UGV collaboration for disaster management, with a focus on persistent surveillance. The goal is to continuously monitor a disaster-affected area, collecting real-time data crucial for operations like damage assessment, search and rescue, and the delivery of emergency supplies. The system involves two vehicles: a UAV (unmanned aerial vehicle) and a UGV (unmanned ground vehicle), tasked with visiting a series of mission points.
These mission points can be categorized as either:
Ground points (reachable by both UAV and UGV via road),
Aerial points (accessible only by the UAV).
The UAV is limited by its battery capacity, while the UGV, slower but with greater endurance, serves as a mobile recharging station for the UAV. The key challenge is to coordinate the UAV's fast but range-limited flight with the UGV's slower, longer-lasting operation so that all mission points are visited regularly. The objective is to minimize the time between successive visits to each point, ensuring the most recent data is collected. This requires jointly optimizing the vehicles' routes and their recharging schedule.
The goal is to design an efficient strategy that balances the UAV's and UGV's strengths to minimize the surveillance "age period" at all mission points throughout the disaster management mission.
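One plausible way to formalize this objective (the notation below is ours, not necessarily the paper's exact formulation): with t_i^k the time of the k-th visit to mission point i, the age period of i is its largest inter-visit gap, and the joint routing policy should minimize the worst age period over the mission set:

```latex
% Age period of mission point i, with t_i^k the time of its k-th visit:
A_i = \max_{k}\left(t_i^{k+1} - t_i^{k}\right),
\qquad
\min_{\pi}\ \max_{i \in \mathcal{M}} A_i
% \pi is the joint UAV-UGV routing policy; \mathcal{M} is the mission point set.
```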
The bilevel optimization framework is used to solve the UAV-UGV cooperative routing problem by breaking it into two hierarchical levels:
Outer Level (UGV Routing):
The UGV's route is optimized first by solving a traveling salesman problem (TSP) over the road network to minimize travel time. Because the UGV serves as a mobile recharging station for the UAV, potential refuel stops are identified along its path based on the UAV's flight time and battery limits.
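As a minimal sketch, the outer level maps naturally onto Google OR-Tools' routing solver; the snippet below is illustrative (a single-vehicle TSP over precomputed integer road travel times in `time_matrix`), not the authors' exact implementation:

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def solve_ugv_tsp(time_matrix, depot=0):
    """Outer level: order the ground points to minimize UGV travel time."""
    manager = pywrapcp.RoutingIndexManager(len(time_matrix), 1, depot)
    routing = pywrapcp.RoutingModel(manager)

    def transit(i, j):  # integer road travel time between two nodes
        return time_matrix[manager.IndexToNode(i)][manager.IndexToNode(j)]

    routing.SetArcCostEvaluatorOfAllVehicles(routing.RegisterTransitCallback(transit))

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
    params.time_limit.FromSeconds(5)

    solution = routing.SolveWithParameters(params)
    route, idx = [], routing.Start(0)
    while not routing.IsEnd(idx):
        route.append(manager.IndexToNode(idx))
        idx = solution.Value(routing.NextVar(idx))
    return route  # UGV visiting order along the road network
```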
Inner Level (UAV Routing):
Using the available refuel stops from the UGV's route, the UAV's path is planned by modeling an open-ended energy-constrained vehicle routing problem with time windows (O-EVRPTW) to minimize the time between consecutive visits (age periods) to mission points. The UAV's route is constrained by energy limitations, refueling requirements, and the need to prioritize mission points that haven't been visited recently. The UAV coordinates with the UGV at predefined rendezvous points to recharge during the mission.
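For intuition, the time-window side of the O-EVRPTW can be expressed with an OR-Tools routing dimension; the sketch below (names and values are illustrative, not the authors' full model) constrains each refuel stop to be reached while the UGV is actually there:

```python
def add_time_windows(routing, manager, time_cb_idx, windows, horizon):
    """Attach a Time dimension and clamp refuel-stop arrival times.
    `windows` maps node -> (earliest, latest) availability of the UGV."""
    routing.AddDimension(time_cb_idx, horizon, horizon, False, "Time")
    time_dim = routing.GetDimensionOrDie("Time")
    for node, (t_early, t_late) in windows.items():
        time_dim.CumulVar(manager.NodeToIndex(node)).SetRange(t_early, t_late)
    return time_dim
```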
Bilevel optimization workflow: a) Initial scenario showing UAV and ground points, with the UGV’s travel path along the road network from its starting depot, guided by the TSP solution. b) Potential refueling stops offered by the UGV during the UAV’s O-EVRPTW route planning. c) Coordinated recharging meetups between the UAV and UGV, along with their respective route segments, obtained from the O-EVRPTW solution.
To solve the UAV-UGV cooperative routing problem with traditional (non-learning) methods, we apply the bilevel optimization in a receding-horizon fashion, where planning iterates phase by phase until the mission horizon is completed. In this approach, the UGV's and UAV's routes are continuously updated to optimize recharging and mission point visitation.
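The overall loop can be sketched as follows; the helper functions are hypothetical stand-ins for the TSP and O-EVRPTW solvers described above, with placeholder bodies so the sketch runs:

```python
def solve_ugv_tsp_over(ground_pts):
    """Placeholder for the outer-level road-network TSP."""
    return list(range(len(ground_pts)))

def candidate_refuel_stops(ugv_route, t, uav_endurance):
    """Placeholder: stops the UGV can reach within one UAV sortie from time t."""
    return ugv_route[:3]

def solve_uav_oevrptw(points, stops, t):
    """Placeholder inner level: returns (points visited, sortie end time)."""
    return points[:5], t + 25.0  # pretend one sortie lasts 25 minutes

def bilevel_receding_horizon(ground_pts, aerial_pts, horizon, uav_endurance=25.0):
    ugv_route = solve_ugv_tsp_over(ground_pts)           # outer level first
    t, visit_log = 0.0, []
    while t < horizon:                                   # one UAV sortie per iteration
        stops = candidate_refuel_stops(ugv_route, t, uav_endurance)
        sortie, t = solve_uav_oevrptw(aerial_pts + ground_pts, stops, t)
        visit_log.append((t, sortie))                    # UAV recharges at the UGV here
    return visit_log
```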
As a benchmark, we implement a non-learning approach using the Google OR-Tools™ CP-SAT solver, combined with three metaheuristics—Guided Local Search (GLS), Tabu Search (TS), and Simulated Annealing (SA)—for the bilevel optimization solution. These methods offer efficient, approximate solutions and serve as a baseline for comparison with our Deep Reinforcement Learning (DRL) approach, which dynamically adapts to mission environments and demonstrates superior performance in both solution quality and runtime efficiency.
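For reference, all three metaheuristics are available as local-search options in OR-Tools' routing search parameters; a small sketch of how one might switch between them:

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

METAHEURISTICS = {
    "GLS": routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH,
    "TS":  routing_enums_pb2.LocalSearchMetaheuristic.TABU_SEARCH,
    "SA":  routing_enums_pb2.LocalSearchMetaheuristic.SIMULATED_ANNEALING,
}

def search_params(name, seconds=30):
    """Build routing search parameters for the chosen metaheuristic."""
    params = pywrapcp.DefaultRoutingSearchParameters()
    params.local_search_metaheuristic = METAHEURISTICS[name]
    params.time_limit.FromSeconds(seconds)
    return params
```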
Our Deep Reinforcement Learning (DRL) framework solves the UAV-UGV cooperative persistent surveillance problem by leveraging an encoder-decoder-based transformer network. The DRL algorithm learns an optimal routing policy for the UAV, determining mission point visits and recharging events with the UGV. The transformer architecture enables efficient sequential data processing, converting mission point coordinates into high-dimensional embeddings for decision-making. This approach allows the UAV to continuously monitor mission points while coordinating with the UGV for recharging, ensuring extended mission performance. The result is an optimized, persistent surveillance route that balances energy consumption and task completion.
The architecture of the proposed transformer network features an encoder with three attention layers that create input embeddings from the raw mission data, while the decoder builds a context vector reflecting the current state. The network passes the input embeddings and context vector through multi-head and single-head attention layers to select the next action; repeating this step sequentially constructs the cooperative route for ongoing surveillance, as illustrated.
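A condensed PyTorch sketch of this encoder-decoder pattern (the layer counts follow the description above; the context construction and feature set are simplified assumptions, not the exact published architecture):

```python
import torch
import torch.nn as nn

class RouteTransformer(nn.Module):
    """Three encoder attention layers build node embeddings; the decoder's
    context vector attends over them (multi-head 'glimpse', then a
    single-head score) to pick the next mission point or refuel stop."""
    def __init__(self, d_model=128, n_heads=8):
        super().__init__()
        self.embed = nn.Linear(2, d_model)  # (x, y) coordinates -> embedding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.glimpse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, coords, context, mask):
        # coords: (B, N, 2); context: (B, 1, d_model); mask: (B, N), True = infeasible
        h = self.encoder(self.embed(coords))                    # input embeddings
        q, _ = self.glimpse(context, h, h, key_padding_mask=mask)
        scores = torch.einsum('bld,bnd->bn', self.proj(q), h)   # single-head scores
        scores = scores.masked_fill(mask, float('-inf'))
        return torch.softmax(scores, dim=-1)                    # next-action distribution
```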
Training:
The Deep Reinforcement Learning (DRL) framework is evaluated on multiple scenarios with varying numbers of UAV and UGV mission points. The training plot compares the reward convergence during training for the DRL model and an Attention Model (AM) baseline across different problem sizes.
It shows that the DRL model consistently outperforms the AM across all configurations. For example, in the U45G15 scenario, DRL achieves a higher reward, converging more quickly than AM. This demonstrates the effectiveness of the DRL framework in learning optimal routing and recharging policies in UAV-UGV cooperative missions. The model's ability to adapt to different problem sizes further highlights its scalability and generalization capability.
Comparison Analysis
Table I presents the comparison of different methodologies across three problem sizes, highlighting average objective values, runtimes, and optimality gaps. The proposed DRL policy consistently outperforms both learning-based and heuristic approaches by producing lower objective values and achieving better solution quality, especially in larger problem scenarios. The DRL(10240) model shows the best overall performance, with minimal objective values and competitive runtimes.
Among learning-based models, the DRL policy significantly surpasses the Attention Model (AM), with AM's optimality gap widening as the problem size grows. The greedy decoding strategy yields the fastest runtimes, while the sampling strategy offers higher-quality solutions at the cost of increased computation time. Among heuristic methods, Tabu Search (TS) performs comparably to the DRL policy but with considerably longer runtimes.
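The two decoding strategies differ only in how actions are drawn from the learned policy; a minimal sketch, assuming a hypothetical environment interface (`reset`, `done`, `step`, `objective`):

```python
import torch

def decode(policy, env, n_samples=1, greedy=True):
    """DRL(greedy): argmax at every step. DRL(10240): sample n_samples
    complete routes and keep the one with the lowest objective."""
    best_route, best_score = None, float("inf")
    for _ in range(1 if greedy else n_samples):
        state, route = env.reset(), []
        while not state.done():
            probs = policy(state)  # (N,) distribution over feasible actions
            a = int(probs.argmax()) if greedy else int(torch.multinomial(probs, 1))
            route.append(a)
            state = state.step(a)
        if state.objective() < best_score:  # lower age period is better
            best_route, best_score = route, state.objective()
    return best_route, best_score
```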
In summary, the DRL(greedy) model achieves the best balance between solution quality and runtime efficiency across all problem sizes. Animations of the routes obtained from different methods can be viewed on our website.
Generalization
Larger Problem Sizes
To evaluate the generalization capability of our DRL framework, we tested it on larger problem instances beyond the original training size. We created two larger scenarios: U60G20 (60 UAV points, 20 ground points) and U75G25 (75 UAV points, 25 ground points). Our DRL policy outperformed baseline models, including the Attention Model (AM) and heuristic methods like Tabu Search (TS). The DRL(10240) policy achieved the lowest objective scores and demonstrated superior solution quality across all test instances, with a minimal optimality gap compared to AM and TS.
While the runtime increased with problem size, the DRL models produced faster solutions than heuristic methods, with the DRL(greedy) strategy emerging as the fastest, significantly outperforming heuristics in computation time. This shows that the DRL framework generalizes well to larger and more complex problem scenarios, maintaining both efficiency and solution quality.
Mission Points Distribution Variations
We tested the generalization ability of our DRL framework across 30 scenarios with three different UAV mission point distributions: Gaussian, Rayleigh, and Exponential. These distributions mimic diverse real-world operational environments, such as clustered surveillance areas (Gaussian), buffer zones (Rayleigh), and sharp risk declines (Exponential).
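A small numpy sketch of how such point clouds can be generated around a road-network center (the scale parameters here are illustrative, not the paper's values):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_uav_points(dist, n, center=(0.0, 0.0), scale=1.0):
    """Scatter n UAV mission points around `center` per the named distribution."""
    cx, cy = center
    if dist == "gaussian":            # clustered surveillance areas
        return rng.normal(loc=[cx, cy], scale=scale, size=(n, 2))
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    if dist == "rayleigh":            # buffer zone: ring-like radial spread
        r = rng.rayleigh(scale, n)
    else:                             # "exponential": sharp decline in risk
        r = rng.exponential(scale, n)
    return np.c_[cx + r * np.cos(theta), cy + r * np.sin(theta)]
```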
The DRL(10240) policy consistently outperformed baseline models, achieving the lowest objective scores across all distributions. The Tabu Search (TS) heuristic model provided comparable performance to the DRL(greedy) model but took significantly longer to compute. The Exponential distribution yielded the lowest objective scores due to the clustered nature of mission points, while the Rayleigh distribution produced the highest scores due to the wider radial spread.
The DRL(greedy) model demonstrated a strong balance between solution quality and runtime, outperforming the TS heuristic by producing solutions up to 200 times faster in small problem sizes and 65 times faster in larger ones.
Instances of scenarios with varied distributions of UAV points positioned around the road network
Case study: Hurricane Harvey 2017, Texas, USA
Case study scenario:
To evaluate the real-world impact of our UAV-UGV collaborative routing and recharging framework for persistent surveillance, we applied our model to a scenario based on the Hurricane Harvey disaster in 2017. Hurricane Harvey was one of the most catastrophic storms in U.S. history, causing severe flooding in Texas and over $125 billion in damages. Using data from the ArcGIS disaster map and USA population density map, we designed a grid-based surveillance system focusing on highly populated areas in Houston, Texas.
The area was divided into square grid cells, each covering a sensor footprint of 1600 square meters (40 m × 40 m). We identified 20 UAV-only mission points and 10 ground points as surveillance targets within flood hazard zones. These targets were monitored over a 1000-minute period, providing insights into how UAV-UGV cooperation can assist in disaster management and persistent monitoring in risk-prone areas.
ArcGIS disaster map of Hurricane Harvey 2017
ArcGIS US population density map
Grid map based on sensor footprint
Constructed case study scenario
Dynamic planning
Our case study explores dynamic planning in persistent surveillance when new mission points emerge randomly during the routing process. Using our trained DRL(greedy) policy, we dynamically update the UAV's route in response to online changes, such as the appearance of new mission points that are accessible only to the UAV. The UAV and UGV exchange information only at rendezvous points, where the DRL model updates its encoder space to incorporate the newly added mission points.
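A sketch of this rendezvous-time replanning loop (all names are hypothetical interfaces for illustration; the key point is that encoder embeddings are refreshed only when new points have appeared by a rendezvous):

```python
def dynamic_surveillance(model, scenario, horizon):
    """Replan at each UAV-UGV rendezvous, re-encoding if points were added."""
    embeddings = model.encode(scenario.mission_points())  # initial encoder pass
    t = 0.0
    while t < horizon:
        if scenario.points_added_since(t):                # online changes observed
            embeddings = model.encode(scenario.mission_points())
        sortie, t = model.decode_until_rendezvous(embeddings, t)
        scenario.record(sortie)                           # UAV recharges at UGV here
    return scenario.visit_log()
```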
During a 1000-minute mission, with random mission points appearing within the first 200 minutes, the DRL(greedy) method proved highly effective in dynamically adjusting routes in real-time due to its fast execution. The results showed minimal deviations in route score and maximum age periods, confirming the robustness of the DRL policy for dynamic surveillance scenarios.
This demonstrates the model’s ability to adapt in real-time, ensuring effective coverage of both initial and newly added mission points.
Priority Driven Persistent Surveillance
We extended our case study to showcase how the DRL policy adapts to priority-driven surveillance, where mission points are assigned different weights based on their required visit frequency. High-priority mission points, which require more frequent visits, are given a higher weight, while low-priority points are assigned a lower weight.
By modifying the age period status of mission points based on their weights, the DRL policy ensures that high-priority points are visited more frequently. The policy adjusts dynamically by incorporating an increment factor that increases the urgency for high-priority points. After optimizing the hyperparameters, the results demonstrate a significant reduction in the maximum age periods for high-priority mission points, while low-priority points experience an increase.
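One plausible reading of this mechanism as code; the exact update rule, and whether the increment factor is the hyperparameter S from the figure below, are our assumptions:

```python
def update_age_status(ages, weights, dt, s=1.0):
    """Weighted age-period update sketch: high-priority points (weight > 1)
    accumulate urgency faster by a factor tied to the hypothetical
    increment factor s, so the policy revisits them more frequently."""
    return {p: age + dt * (1.0 + s * (weights[p] - 1.0)) for p, age in ages.items()}
```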
This approach is especially valuable in real-world disaster management scenarios where certain critical areas need more frequent monitoring than others, effectively ensuring optimal allocation of resources.
Objective score metric variation in the prioritized persistent surveillance scenario relative to the hyperparameter S
In the case study scenario, maximum age periods at individual mission points are compared between uniformly weighted and priority-weighted surveillance conditions. With priority-driven surveillance, high-priority mission points exhibit reduced maximum age periods, whereas low-priority points see an increase in age period values.
Our study introduces a DRL-based framework for energy-aware UAV-UGV cooperative surveillance, optimized for disaster response scenarios. Built on a transformer network, our approach ensures efficient mission point coverage and recharging coordination, outperforming traditional methods in both solution quality and runtime. Successfully tested on simulated disaster scenarios such as Hurricane Harvey in Texas, our model adapts to dynamic conditions and priority-weighted mission points, offering a versatile, real-time solution for persistent surveillance in disaster management.