FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems
The Network and Distributed System Security (NDSS) Symposium 2026
* For all the demonstrations, the video quality can be adjusted for better visualization.
Autonomous Target Tracking (ATT), often referred to as Active Track, Motion Track, or Dynamic Track, enables autonomous systems, such as drones, to follow selected targets while maintaining a stable distance. Drones have become a prominent platform for ATT due to their versatility, supporting applications like security surveillance, border control, law enforcement, and entertainment. Real-world examples include the U.S. Customs and Border Protection’s use of drones for border surveillance. However, this technology also introduces significant security, privacy, and safety risks, particularly when exploited for criminal purposes, such as stalking or deploying explosives.
Given these risks, the security of ATT systems is critical. Our work found that ATT systems can be fundamentally vulnerable to a newly-discovered “Distance-Pulling Attack” (DPA), where drones running the ATT feature can be manipulated to dangerously shorten the distance to their tracked targets. DPA can lead to severe consequences, including collisions, physical capture of drones, and susceptibility to a broader range of sensor attacks, as illustrated in Figure 1. Unlike attacks that cause tracking errors, DPA enables attackers to more easily crash/eliminate drones or physically capture them (e.g., for personal/business gains such as by reverse-engineering their functions), which can have severe impacts on critical real-world ATT applications such as security surveillance, border control, and law enforcement. Addressing these vulnerabilities and understanding the implications of DPAs is both imperative and urgent to ensure the security and safety of ATT systems.
The dataset we collected includes 4 different individuals and 4 distinct background locations (e.g., drivable road, bare ground, grass field, parking lot), yielding 16 combinations in total. For each individual in each environment, we recorded separate training and testing videos at 24 frames per second with a resolution of 1920 x 1080 pixels. Each video ranges from 11 to 37 seconds in duration. The dataset includes 23 training videos comprising 11,898 frames and 25 evaluation videos comprising 13,594 frames.
We present qualitative results by showing the FlyTrap attack video demonstrations in addition to Table IV. With our attack target generation (ATG) design, we can manipulate the bounding box aspect ratio to bypass PercepGuard.
Demo 1: We evaluate FlyTrap w/ ATG against the PercepGuard defense. The LSTM predicts "person" across frames, so no alarm is triggered.
Demo 2: We evaluate FlyTrap w/o ATG against the PercepGuard defense. The LSTM predicts "car" across frames, triggering the alarm.
We evaluate FlyTrap with attack target generation (ATG) against PercepGuard, a spatial-temporal defense for autonomous vehicles that was originally designed to detect object-detection misclassification attacks. We adapt it to person tracking: if the LSTM predicts "person", we regard it as no alarm; if it predicts any other class, we regard it as an alarm.
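The adapted alarm rule can be sketched as follows. This is a minimal illustration of our adaptation, not PercepGuard's actual implementation: `boxes_to_features` is a hypothetical helper showing one reasonable parameterization of the per-frame box sequence (center, size, aspect ratio) that a spatial-temporal LSTM could consume, and the trained LSTM itself is assumed to exist elsewhere.

```python
import numpy as np

def boxes_to_features(boxes):
    """Convert per-frame boxes (x1, y1, x2, y2) into a per-frame
    feature sequence for a spatial-temporal classifier: center
    position, width, height, and aspect ratio. (An assumed
    parameterization for illustration, not PercepGuard's exact one;
    note the aspect ratio w/h is exactly what ATG manipulates.)"""
    boxes = np.asarray(boxes, dtype=float)
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + w / 2
    cy = boxes[:, 1] + h / 2
    return np.stack([cx, cy, w, h, w / h], axis=1)

def percepguard_alarm(predicted_class, expected="person"):
    """Our single-class adaptation: alarm iff the LSTM's class
    prediction for the tracked box sequence is not 'person'."""
    return predicted_class != expected
```

With ATG, the manipulated trajectory still classifies as "person" (`percepguard_alarm("person")` is `False`); without ATG, the distorted aspect ratio leads to a "car" prediction and triggers the alarm.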
We present qualitative results by showing the FlyTrap attack video demonstrations in addition to Table V. With our attack target generation (ATG) design, we can achieve spatial consistency (i.e., overlapping tracking and detection prediction) and temporal consistency (i.e., consistent human pose).
Demo 1: We evaluate FlyTrap w/ ATG against the VOGUES defense. Our attack remains consistent across the single-object tracker (shown in the red box), object detector (shown in the blue box), and pose estimator (shown as human joints).
Demo 2: We evaluate FlyTrap w/o ATG against the VOGUES defense. Without spatial-temporal constraints, no human is detected in the single-object tracker prediction area (shown in the red box), thus triggering the alarm.
We evaluate FlyTrap with attack target generation (ATG) against VOGUES, a spatial-temporal defense for autonomous vehicles that was originally designed for multiple-object trackers. We follow their setup while making necessary adaptations for single-object trackers: we compute the highest IoU between the single-object tracking prediction and the object detections, and trigger the alarm if this IoU falls below a preset threshold (e.g., 0.5). Taking the highest IoU prevents a high false-alarm rate in the single-object tracking task when other passersby appear.
We also reproduce an LSTM to inspect the consistency of the human pose over time. The IoU and LSTM score are shown at the top right of the demo video.
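The spatial-consistency check of our single-object-tracker adaptation can be sketched as below. This is an illustrative sketch of the alarm rule described above, not the VOGUES codebase; the pose-consistency LSTM is omitted.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def vogues_alarm(track_box, det_boxes, thresh=0.5):
    """Alarm iff no detection overlaps the single-object-tracker
    prediction above the IoU threshold. Taking the *highest* IoU over
    all detections avoids false alarms from unrelated passersby."""
    best = max((iou(track_box, d) for d in det_boxes), default=0.0)
    return best < thresh
```

For example, a tracker box that exactly matches one detection yields IoU 1.0 and no alarm, while a tracker box overlapping no detection (or an empty detection set) triggers the alarm.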
We present the video demonstration in addition to Figures 9 and 10. Without our progressive distance-pulling (PDP) design, the attacked bounding box locks onto a fixed human-shaped area on the umbrella across different distances. With our PDP design, the shrink rate is much smaller; at closer distances, the shrink rate is even smaller than at longer distances.
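For clarity, the shrink-rate metric compared in these demos can be sketched as below. This is an assumed definition for illustration (box-area shrinkage relative to the initialization box), not necessarily the exact formula behind Figures 9 and 10.

```python
def shrink_rate(init_box, attacked_box):
    """Fraction by which the tracked box area has shrunk relative to the
    initialization box, with boxes given as (x1, y1, x2, y2). An ATT
    controller interprets a smaller box as 'target is farther away', so
    a larger shrink rate pulls the drone closer to the target."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])
    return 1.0 - area(attacked_box) / area(init_box)
```

For instance, an attacked box at half the initial area gives a shrink rate of 0.5, while an unchanged box gives 0.0.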
MixFormer w/o PDP at distance of 8m
MixFormer w/ PDP at distance of 8m
MixFormer w/o PDP at distance of 2m
MixFormer w/ PDP at distance of 2m
SiamRPN w/o PDP at distance of 8m
SiamRPN w/ PDP at distance of 8m
SiamRPN w/o PDP at distance of 2m
SiamRPN w/ PDP at distance of 2m
To evaluate our attack in physical, closed-loop setups, we built a drone with full-stack ATT capabilities.
Physical evaluation setups. We use a MacBook Pro as the ground control station to select targets through a web portal (shown on the right). For safety reasons, we keep the drone stationary and have the experimenter move until the bounding box matches its initialization size, which approximates autonomous tracking behavior.
Demo: Our implemented full-stack autonomous target tracking drone.
We provide the following video demonstrations in addition to Figures 14 and 15 and Table VIII. The DJI Mini 4 Pro might implement ActiveTrackTrackingState checks (e.g., AIRCRAFT_TOO_LOW) to prevent the drone from crashing directly into the object, which causes the final untracking and hovering behavior. Nonetheless, the distance between the drone and the attacker is substantially shortened, which still demonstrates the DPA consequences.
Demo 1: first-person view (FPV) from remote controller screenshot.
Demo 4: drone capturing attack using net gun.
Demo 2: first-person view (FPV) from remote controller screenshot.
Demo 5: drone with a broken arm after netgun shooting.
Demo 3: third-person view (TPV) from the observer.
Demo 6: maximum tracking distance. The drone automatically moves closer if the initial distance is too far.
Demo 7: FlyTrap attack against the DJI Mini 4 Pro under its maximum tracking distance (we empirically find this distance shown in Demo 6).
We also study the maximum tracking distance of the DJI Mini 4 Pro and the maximum FlyTrap working distance. We empirically find that, to improve tracking quality, the DJI drone internally enforces a maximum tracking distance so that it maintains a high-resolution image of the target object. In our environment, this maximum distance is around 20 meters: the drone automatically moves closer whenever the distance exceeds that range. Based on this observation, we test FlyTrap against the DJI Mini 4 Pro at a distance of 20 meters and find that it still works.
We provide the following video demonstrations in addition to Figure 16 and Table VIII.
Demo 1: FlyTrap attack against HoverAir (side view).
Demo 2: FlyTrap attack against HoverAir (first-person view).
Demo 3: FlyTrap attack against HoverAir.
Demo 4: Normal umbrella for comparison.
We provide the following video demonstrations in addition to Table VIII.
Demo 1: FlyTrap attack against the DJI NEO.
Demo 2: Normal umbrella for comparison.
We conduct a user study to investigate the stealthiness of FlyTrap. The survey is available at: PDF, and the results are shown below.