Autonomous Target Tracking (ATT), often referred to as Active Track, Motion Track, or Dynamic Track, enables autonomous systems, such as drones, to follow selected targets while maintaining a stable distance. Drones have become a prominent platform for ATT due to their versatility, supporting applications like security surveillance, border control, law enforcement, and entertainment. Real-world examples include the U.S. Customs and Border Protection’s use of drones for border surveillance. However, this technology also introduces significant security, privacy, and safety risks, particularly when exploited for criminal purposes, such as stalking or deploying explosives.
Given these risks, the security of ATT systems is critical. Our work found that ATT systems can be fundamentally vulnerable to a newly discovered “Distance-Pulling Attack” (DPA), where drones running the ATT feature can be manipulated into dangerously shortening the distance to their tracked targets. DPA can lead to severe consequences, including collisions, physical capture of drones, and susceptibility to a broader range of sensor attacks, as illustrated in Figure 2. Unlike attacks that merely cause tracking errors, DPA enables attackers to more easily crash/eliminate drones or physically capture them (e.g., for personal/business gains such as by reverse-engineering their functions), which can severely impact critical real-world ATT applications such as security surveillance, border control, and law enforcement. Addressing these vulnerabilities and understanding the implications of DPA is both imperative and urgent for the security and safety of ATT systems.
The Single-Object Tracking (SOT) module is a core component of our ATT pipeline: it produces the object location estimates used by the drone for navigation. Modern SOT approaches are built with deep learning. The tracker is given a single template image of the target and then searches for that same target in each incoming camera frame. For each frame, the model produces one or more bounding-box proposals (rectangle coordinates plus a confidence score). The tracker chooses the proposal with the highest confidence as the final reported location for that frame.
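As a concrete illustration of this per-frame selection step, here is a minimal sketch; the proposal format and function name are illustrative assumptions, not the API of any specific tracker.

```python
def select_proposal(proposals):
    """Pick the tracker's final output for one frame.

    `proposals` is a list of (x, y, w, h, confidence) tuples produced by
    the SOT model when matching the template against the current frame.
    """
    # The reported location is simply the highest-confidence proposal.
    x, y, w, h, conf = max(proposals, key=lambda p: p[4])
    return (x, y, w, h), conf
```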
The distance-control component converts the tracker’s bounding-box outputs into actual flight commands — for example, changes in yaw, roll, altitude, and forward/backward motion. A common real-world approach uses only the 2D bounding box from the camera: the drone adjusts orientation to keep the bounding box centered in the image and moves forward or backward to keep the box at a consistent size. Keeping the box centered controls lateral alignment; keeping the box size stable maintains a roughly constant physical distance to the target.
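To make this control logic concrete, below is a minimal proportional-control sketch under the 2D-bounding-box approach described above; the gains, names, and command set are illustrative assumptions rather than any vendor's actual controller.

```python
def bbox_to_commands(bbox, frame_w, frame_h, ref_area,
                     k_yaw=1.0, k_alt=1.0, k_fwd=1.0):
    """Map one 2D bounding box (x, y, w, h) to coarse velocity commands."""
    x, y, w, h = bbox
    cx, cy = x + w / 2.0, y + h / 2.0
    yaw_rate = k_yaw * (cx - frame_w / 2.0) / frame_w       # re-center horizontally
    climb_rate = k_alt * (frame_h / 2.0 - cy) / frame_h     # re-center vertically
    forward_speed = k_fwd * (ref_area - w * h) / ref_area   # hold apparent size
    return yaw_rate, climb_rate, forward_speed
```

Note that forward_speed turns positive whenever the box area falls below the reference, i.e., the drone flies toward the target; this is precisely the lever the next paragraph exploits.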
Because many drone systems rely directly on the bounding box to infer distance, manipulating that box is an effective way to influence the drone’s motion. In particular, if an attacker causes the tracker to consistently report a smaller bounding box, the drone will interpret that as the target moving away and will move closer to compensate. This makes the drone follow at an unsafe, reduced distance. We therefore formulate DPA — a system-level attack objective that intentionally shrinks the tracking bounding box to trick the drone’s distance control loop.
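The two competing terms of such an objective (shrink the reported box, but keep the tracker locked on) can be sketched as a differentiable loss, as below. This is our minimal illustration under the white-box assumption discussed later, not the paper's exact formulation; target_scale and the tensor interface are illustrative.

```python
import torch

def dpa_loss(pred_box, pred_conf, clean_box, target_scale=0.5):
    """Illustrative DPA objective over one frame (all inputs are tensors)."""
    w, h = pred_box[2], pred_box[3]        # predicted box width/height
    w0, h0 = clean_box[2], clean_box[3]    # box size on the clean frame
    # Term 1: pull the reported size toward a fraction of the clean size.
    shrink = (w / w0 - target_scale) ** 2 + (h / h0 - target_scale) ** 2
    # Term 2: keep confidence high so the adversarial box stays the winner.
    keep_lock = -torch.log(pred_conf + 1e-8)
    return shrink + keep_lock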
Figure 1: Camera-based Autonomous Tracking Drone
Figure 2: Distance-Pulling Attack (DPA) motivation and consequences
Ethics and motivations: the same technique can be used maliciously (stealing or disabling public-service drones) or defensively (disabling an intrusive, unauthorized drone).
Disrupting the tracker can briefly stop ATT, but humans or operators can usually restore tracking. A more effective and lasting strategy is to target the position-control loop. DPA is more dangerous than temporary tracking loss because it physically forces the drone nearer to the target, enabling outcomes such as:
A1: Physical capture (e.g., with a net gun);
A2: Range-based sensor failures or spoofing;
A3: Collisions that permanently disable the drone.
We target ATT drones operating within about 20 meters — the common consumer range for person-follow modes — because it’s realistic for both tracking and attacker observation. Our attack isn’t limited to 20 m, but we use this range for experiments and practicality. We design attacks under a white-box assumption: the adversary knows the victim’s SOT model, which is feasible by gathering device/feature info, buying the same model, and reverse-engineering it. The attacker may also collect generic tracking videos beforehand; these do not need to match the exact target instance or location used during an attack. Finally, although our method assumes white-box access, it can extend to black-box scenarios via adversarial-transfer techniques — we evaluate this with cross-model transfer tests and direct black-box trials on commercial ATT drones.
We introduce the adversarial umbrella: a printed pattern on an umbrella used as a physical attack surface. Umbrellas are convenient for this purpose because they provide a large, flat area for high-quality prints, are easy to carry and deploy outdoors, and let the attacker selectively expose or hide their body. Deployment is simple: the attacker holds the umbrella toward the drone (standing, crouching, or fully hiding as needed). The umbrella is a practical delivery mechanism that enables our system-level attack in realistic settings.
To keep the attack effective as the drone moves, FlyTrap models how the adversarial pattern appears at different distances and viewing angles. We simulate camera geometry and rendering effects and optimize the pattern so its perceived bounding box systematically shrinks as the drone approaches. This progressive modeling preserves the attack’s steering effect throughout the approach, solving the closed-loop challenge of controller feedback.
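A minimal sketch of this progressive modeling is shown below, assuming a simple pinhole-camera scaling model: the patch is re-rendered at a sweep of approach distances and the attack objective is averaged over all of them, so the optimized pattern keeps working along the whole path. composite() is a hypothetical helper, and the distance set and reference distance are illustrative.

```python
import torch.nn.functional as F

def render_at_distance(patch, dist, ref_dist=2.0):
    """Rescale the patch (NCHW tensor) as a pinhole camera sees it at `dist` meters."""
    scale = ref_dist / dist
    h, w = patch.shape[-2:]
    return F.interpolate(patch, size=(max(1, int(h * scale)), max(1, int(w * scale))),
                         mode="bilinear", align_corners=False)

def pdp_loss(patch, frames, attack_loss, distances=(2, 5, 10, 15, 20)):
    """Average the per-frame attack objective over the whole approach path."""
    total = 0.0
    for frame in frames:
        for d in distances:
            # composite() is a hypothetical helper that pastes the rescaled
            # patch onto the umbrella region of the scene frame.
            total = total + attack_loss(composite(frame, render_at_distance(patch, d)))
    return total / (len(frames) * len(distances))
```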
Many defenses check for consistency across frames or across spatial cues. FlyTrap counters those by generating adversarial targets that explicitly control spatial and temporal features — for example, box shape, keypoint cues, or pose-like signals — across consecutive frames. By enforcing coherent, human-like appearance and motion in the adversarial region, FlyTrap maintains its effect across time and makes it harder for consistency-based detectors to block the pattern.
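The sketch below shows the kind of consistency terms such an objective might include, assuming axis-aligned boxes as (x1, y1, x2, y2) and pose keypoints as arrays; this is our illustration, not the paper's exact formulation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def consistency_loss(track_box, det_box, pose_now, pose_prev):
    spatial = 1.0 - iou(track_box, det_box)                 # tracker and detector agree
    temporal = float(np.mean((pose_now - pose_prev) ** 2))  # pose moves smoothly
    return spatial + temporal
```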
Figure 3: FlyTrap optimization pipeline
We built an aerial-view dataset with recordings of four people across typical outdoor settings (two grass fields, two parking lots, bare ground, and a drivable road). For each scene, we captured a training and a testing video. In total, the dataset contains 23 training videos (~11.9k frames) and 25 test videos (~13.6k frames). Ethical considerations are discussed elsewhere in the paper.
Figure 4: Dataset collection
We measure system-level impact with two success rates (a computation sketch follows these definitions):
Open-loop success (ASR_open): frame-level measure that counts cases where the tracker reports a confident bounding box that is sufficiently smaller and fully contained within the umbrella region — indicating the pattern would pull the perceived distance in a single frame. We aggregate ASR_open across a sweep of shrinkage and confidence thresholds to get a mean ASR_open.
Closed-loop success (ASR_closed): flight-level measure — the fraction of real drone flights where the drone is pulled within a target distance (for example, a capture or sensor-attack range). We report ASR_closed as the average success rate across multiple real flights and relevant distance thresholds.
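The sketch below shows one way to compute both rates, assuming per-frame and per-flight records with illustrative field names.

```python
import numpy as np

def asr_open(frames, shrink_thresholds, conf_thresholds):
    """Mean frame-level success over a sweep of shrinkage/confidence thresholds."""
    rates = []
    for s in shrink_thresholds:
        for c in conf_thresholds:
            rates.append(np.mean([f["conf"] >= c
                                  and f["size_ratio"] <= s
                                  and f["inside_umbrella"] for f in frames]))
    return float(np.mean(rates))

def asr_closed(flights, pull_distances):
    """Fraction of flights pulled within each distance, averaged over distances."""
    return float(np.mean([[fl["min_distance"] <= d for fl in flights]
                          for d in pull_distances]))
```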
TGT achieved about 36% mean open-loop success, while FlyTrap (with PDP) reached ~54%, showing a substantial improvement from our progressive distance-pulling design.
PDP matters: The progressive distance-pulling (PDP) component further improves shrinking performance — the effect seen in simulations also appears in physical tests.
Robustness to distractions: FlyTrap remains effective even when other similar, unobstructed people or objects appear in the scene.
TGT has poor generalization (location universality ≈ 27%) and performs badly when the person or background changes.
FlyTrap generalizes much better (overall universality ≈ 62%) and still outperforms TGT even when both person and location are unseen (about 21 percentage points higher than TGT).
Figure 5: Target photo baseline (TGT)
Table 1: Attack Effectiveness Evaluation
Table 2: Attack Scenario Universality Evaluation
Video 1: We evaluate FlyTrap w/ ATG against the PercepGuard defense. The LSTM predicts "person" across frames, so no alarm is triggered
Video 2: We evaluate FlyTrap w/o ATG against the PercepGuard defense. The LSTM predicts "car" across frames, triggering the alarm
We evaluate FlyTrap with attack target generation (ATG) against PercepGuard, a spatial-temporal defense designed to secure autonomous vehicles. PercepGuard was originally designed to defend against object-detection misclassification attacks; we adapt it to person tracking. If the LSTM predicts "person", we regard it as no alarm; if it predicts any other class, we regard it as an alarm.
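The adapted alarm rule reduces to a simple check; the sketch below is our paraphrase of it, with an illustrative interface.

```python
def percepguard_alarm(lstm_classes):
    """Return True (alarm) unless every per-frame LSTM prediction is "person"."""
    return any(cls != "person" for cls in lstm_classes)
```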
We present qualitative results through the FlyTrap attack video demonstrations below, complementing Table V. With our attack target generation (ATG) design, we achieve spatial consistency (i.e., overlapping tracking and detection predictions) and temporal consistency (i.e., a consistent human pose).
Video 3: We evaluate FlyTrap w/ ATG against VOGUES defense. Our attack can be consistent across the single-object tracker (shown in red box), object detector (shown in blue box), and pose estimator (shown in human joints).
Video 4: We evaluate FlyTrap w/o ATG against VOGUES defense. Without spatial-temporal constraints, no human is detected in the single-object tracker prediction area (shown in red box), thus triggering the alarm.
Video 5: FlyTrap w/o PDP against MixFormer with a distance of 2 meters
Video 6: FlyTrap w/o PDP against MixFormer with a distance of 8 meters
Video 7: FlyTrap with PDP against MixFormer with a distance of 2 meters
Video 8: FlyTrap with PDP against MixFormer with a distance of 8 meters
Video 9: Our self-implemented ATT drones
Video 10: FlyTrap w/o PDP closed-loop attack against white-box ATT drones with MixFormer models
Video 11: FlyTrap w/ PDP closed-loop attack against white-box ATT drones with SiamRPN-ResNet models (first-person view)
DJI Mini 4 Pro: We provide the following video demonstrations in addition to Figures 14 and 15 and Table VIII in the paper. The DJI Mini 4 Pro might implement ActiveTrackTrackingState checks (e.g., AIRCRAFT_TOO_LOW) to prevent the drone from crashing directly into the object, which would explain the eventual loss of tracking and subsequent hovering. Nonetheless, the distance between the drone and the attacker is substantially shortened, which still demonstrates the DPA consequences.
Video 12: first-person view (FPV) from remote controller screenshot
Video 13: first-person view (FPV) from remote controller screenshot
Video 14: third-person view (TPV) from the observer
Video 15: drone-capture attack using a net gun
Video 16: drone with a broken arm after a net-gun shot
Video 17: maximum tracking distance; the drone automatically moves closer if the initial distance is too far
Video 18: FlyTrap attack against the DJI Mini 4 Pro at its maximum tracking distance (we determine this distance empirically, as shown in Video 12).
We also study the maximum tracking distance of the DJI Mini 4 Pro and the maximum working distance of FlyTrap. We find empirically that, to maintain tracking quality, the DJI drone internally enforces a maximum tracking distance so that the target stays at a sufficiently high resolution in the image. In our environment, this maximum distance is around 20 meters: the drone automatically moves closer whenever the distance exceeds that range. Based on this observation, we test FlyTrap against the DJI Mini 4 Pro at a distance of 20 meters and find that it still works.
DJI Neo: We provide the following video demonstrations in addition to Table VIII.
Video 19: FlyTrap attack against the DJI Neo
Video 20: Normal umbrella for comparison
HoverAir-X1: We provide the following video demonstrations in addition to Figure 16 and Table VIII.
Video 21: FlyTrap attack against the HoverAir X1 (side view)
Video 22: FlyTrap attack against the HoverAir X1 (first-person view)
Video 23: FlyTrap attack against the HoverAir X1
Video 24: Normal umbrella for comparison
We conduct a user study to investigate the stealthiness of FlyTrap. The survey is available here: PDF. The results are shown below.
[NDSS'26] FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems
Shaoyuan Xie, Mohamad Habib Fakih, Junchi Lu, Fayzah Alshammari, Ningfei Wang, Takami Sato, Halima Bouzidi, Mohammad Abdullah Al Faruque, Qi Alfred Chen
ISOC Network and Distributed System Security (NDSS) Symposium, 2026. (Acceptance rate TBA)
[PDF] [arXiv] [Code]
BibTex for citation:
@inproceedings{xie2026flytrap,
title={{FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems}},
author={Xie, Shaoyuan and Fakih, Mohamad Habib and Lu, Junchi and Alshammari, Fayzah and Wang, Ningfei and Sato, Takami and Bouzidi, Halima and Al Faruque, Mohammad Abdullah and Chen, Qi Alfred},
booktitle={ISOC Network and Distributed System Security Symposium (NDSS)},
year={2026}
}
Shaoyuan Xie, Ph.D. student, University of California, Irvine
Mohamad Habib Fakih, Ph.D. student, University of California, Irvine
Junchi Lu, Ph.D. student, University of California, Irvine
Fayzah Alshammari, Ph.D. student, University of California, Irvine
Ningfei Wang, Ph.D. student, University of California, Irvine
Takami Sato, Ph.D. student, University of California, Irvine
Halima Bouzidi, Post-doc, University of California, Irvine
Mohammad Abdullah Al Faruque, Professor, University of California, Irvine
Qi Alfred Chen, Assistant Professor, University of California, Irvine
This research was supported by:
NSF under grant CNS-2145493;
NASA University Leadership Initiative under Award 80NSSC24M0070