Abstract
Autonomous aerial and aquatic robots that attain mobility by perturbing their medium, such as multicopters and torpedoes, produce wake effects that act as disturbances for adjacent robots. Data-driven approaches using neural networks typically learn a memory-less function that maps the current states of two robots to a force. Such models often fail in agile scenarios: because the wake has a finite propagation time, the disturbance observed now is a function of relative states in the past. We present an empirical study of seven temporal models across four fluid-interaction domains and validate the findings with real-world data on a custom monocopter gantry. Our conclusion: a history of previous states and explicit transport-delay prediction substantially improve wake-effect prediction.
Robots that fly or swim produce fluid wake effects — such as downwash or trailing vortices — that disturb nearby robots. Because the wake propagates at a finite speed, the force felt now was actually caused by the source robot's state at some point in the past. A memory-less model sees only the current state and cannot correlate it with the delayed disturbance. In agile scenarios where the source moves quickly and the delay is significant, this mismatch causes large prediction errors. Temporal context is essential.
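This mismatch can be made concrete with a toy NumPy sketch (all numbers illustrative, not from the paper): a disturbance that is a delayed copy of the source's state correlates poorly with the current state, but perfectly with the state one transport delay in the past.

```python
import numpy as np

# Toy version of the mismatch: the wake force felt now is the source robot's
# state delay_steps ago, not its current state. Numbers are illustrative.
dt = 0.01                                     # 100 Hz sampling (assumed)
delay_steps = 38                              # ~0.38 s transport delay
t = np.arange(0.0, 10.0, dt)
source_vz = np.sin(2 * np.pi * 0.7 * t)       # source state over time
wake_force = np.roll(source_vz, delay_steps)  # force now = state 0.38 s ago
wake_force[:delay_steps] = 0.0                # nothing arrives before the delay

# Memory-less view: current state vs. current force -- weak correlation.
corr_now = np.corrcoef(source_vz, wake_force)[0, 1]
# History view: state delay_steps ago vs. current force -- perfect correlation.
corr_past = np.corrcoef(source_vz[:-delay_steps], wake_force[delay_steps:])[0, 1]
print(corr_now, corr_past)
```

A memory-less model can only exploit `corr_now`; any model that can look back `delay_steps` samples has access to the perfectly informative signal.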
FOUR FLUID-INTERACTION DOMAINS
• CFD Quadrotors — High-fidelity CFD of two P600 quadrotors at 0.5–4.0 m/s, well beyond the low-speed regimes of prior work; 30 min of simulated flight at 200 Hz.
• Fish Schooling — Experimental robotic airfoil and compliant flag interacting via vortex shedding. Transfer entropy identifies a causal delay of ~0.33 s; disturbances are nearly sinusoidal.
• Ship Encounter — Numerical model of ship-to-ship interaction forces in restricted channels. Time-varying lateral targets create a temporal ambiguity resolvable only with history.
• 2D Drone Simulator — Custom simulator with an Eulerian-grid velocity field and an emergent transport delay of ~0.38 s. Enables closed-loop evaluation: no compensation, an oracle, and each trained model.
MODELS COMPARED
• Agile MLP (18K params) — Maps the instantaneous relative state directly to a force prediction. Memory-less primary baseline.
• History MLP (49K) — Concatenates a sliding window of past snapshots into one flattened input vector.
• Delay Embedding (35K) — Learns to pick the specific past snapshot that caused the current wake effect using a Gaussian attention kernel.
• GRU (71K) — Processes observations sequentially, maintaining a compressed history summary in a hidden state.
• TCN (73K) — Causal dilated 1D convolutions build a wide receptive field across multiple timescales.
• Mamba (9K) — Selective state-space model with input-dependent recurrence parameters.
• RC / ESN (2K trainable + 1,008K frozen) — Fixed random reservoir; only the linear readout is trained.
• Cross-Attention (16K) — The wake-affected robot selectively queries relevant moments from the wake source's past trajectory via multi-head attention.
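As a concrete illustration of the Delay Embedding idea, here is a minimal NumPy sketch of a Gaussian attention kernel over a history window. The function name, window size, and parameters are ours, not the paper's; in the actual model, the kernel mean and width would be learned rather than fixed.

```python
import numpy as np

np.random.seed(0)

def gaussian_delay_select(history, mu, sigma):
    """Softly select the past snapshot at normalized lag mu via a Gaussian
    kernel over the window. history: (window, state_dim); mu, sigma in [0, 1]."""
    taus = np.linspace(0.0, 1.0, history.shape[0])
    w = np.exp(-0.5 * ((taus - mu) / sigma) ** 2)
    w /= w.sum()                     # normalized attention weights over lags
    return w @ history               # weighted average concentrated at lag mu

window, state_dim = 65, 6
history = np.random.randn(window, state_dim)
picked = gaussian_delay_select(history, mu=0.5, sigma=0.003)
# A narrow kernel reduces to picking the single snapshot at the kernel mean.
print(np.allclose(picked, history[window // 2], atol=1e-3))  # True
```

Because the kernel is differentiable in `mu` and `sigma`, the delay itself can be trained by gradient descent, which is what lets the model discover the physical transport delay from data.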
RESULTS BY DOMAIN
• CFD Quadrotors — Cross-Attention and Delay Embedding lead, confirming that explicit transport-delay prediction helps in agile settings.
• Fish Schooling — Recurrent models excel at recognizing repeating sinusoidal patterns over time.
• 2D Drone (Closed-Loop) — Cross-Attention and Delay Embedding again rank in the top 3, consistent with the CFD results.
• Ship Encounter — History MLP performs best. Relative distance and heading act as implicit phase indicators, reducing the benefit of more complex mechanisms.
On fish schooling, the baseline MLP achieves a small RMSE but only R² = 0.57, with two distinct correlation clusters. The sinusoidal pitching means the same foil angle occurs twice per cycle — once ascending, once descending — producing different wake structures. Without history, the MLP cannot distinguish these phases.
The GRU resolves this ambiguity, achieving R² = 0.96 with predictions evenly spread about the y = x line — indicating unbiased modeling of the wake effect.
The models can learn the physical transport delay directly from data. The Cross-Attention model's attention weights peak at ~0.5 s in the past — near the true physical delay of 0.38 s — with negligible weight on more recent or distant time steps. The Delay Embedding model learns a delay that tracks the true physical delay linearly as vertical separation changes. Both models independently discover the correct temporal structure of the wake.
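The temporal structure these models discover can also be exposed with a simple cross-correlation scan on synthetic data (a sketch with assumed numbers, not the paper's pipeline): sweeping candidate lags recovers the ~0.38 s transport delay as the correlation peak, the same lag the attention weights concentrate on.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, true_delay = 0.01, 0.38
t = np.arange(0.0, 20.0, dt)
delay_steps = int(true_delay / dt)

# Smooth synthetic source state; the wake effect is a delayed, noisy copy.
source = np.convolve(rng.standard_normal(t.size), np.ones(20) / 20, mode="same")
wake = np.roll(source, delay_steps) + 0.05 * rng.standard_normal(t.size)
wake[:delay_steps] = 0.0

# Scan candidate lags; correlation peaks at the physical transport delay.
lags = np.arange(1, 100)
score = [np.corrcoef(source[:-k], wake[k:])[0, 1] for k in lags]
est_delay = lags[int(np.argmax(score))] * dt
print(est_delay)
```

A learned attention kernel does the same job end-to-end, and, unlike this fixed scan, can adapt the estimated lag to the current relative state (e.g., vertical separation).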
We train all models at nominal physics parameters and evaluate under three increasingly perturbed conditions: ±10%, ±50%, and ±75% perturbation of wake speed, diffusivity, and vertical separation. The relative ranking of models is preserved across all perturbation levels. Memory-based models do not merely benefit from greater capacity — they genuinely capture the underlying wake dynamics, maintaining their advantage even under significant distribution shift.
We validate on real hardware using a custom rectilinear 2D gantry that tethers two monocopters for constrained XZ-plane motion. A Raspberry Pi 5 and Arduino UNO handle feedback control at 32 Hz, with ~18 ms round-trip latency. During each episode, the wake source's lateral velocity and RPM vary continuously, creating a rapidly changing downwash field. Commands sometimes change faster than the motor spool rate, making instantaneous RPM a function of recent PWM history — compounding the transport delay and further increasing the temporal context needed for accurate prediction.
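The spool-rate effect can be seen in a toy first-order motor model (time constant and command sequence assumed, not measured from the gantry): when PWM commands switch faster than the motor settles, instantaneous RPM is a blend of several recent commands, so the predictor needs command history on top of the wake's transport delay.

```python
import numpy as np

# First-order spool-up lag: rpm chases pwm with assumed time constant tau.
dt, tau = 1 / 32, 0.15                       # 32 Hz loop; illustrative tau
alpha = dt / (tau + dt)                      # discrete first-order filter gain
pwm = np.repeat([0.2, 0.9, 0.3, 0.8], 8)     # commands switching every 0.25 s
rpm = np.zeros_like(pwm)
for i in range(1, pwm.size):
    rpm[i] = rpm[i - 1] + alpha * (pwm[i] - rpm[i - 1])
# The motor never fully reaches a setpoint before the next command arrives.
print(rpm[7], pwm[7])
```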
KEY TAKEAWAYS
1. Temporal context is essential. The predictor needs an explicit representation of the history of previous states as input. Memory-less models fail in agile regimes where the wake transport delay is significant.
2. Transport-delay prediction helps. Models that explicitly estimate when the current disturbance was generated — such as Cross-Attention and Delay Embedding — excel in domains with identifiable delays.
3. The best mechanism depends on the physics. Recurrent models (GRU) suit periodic systems; explicit-delay models suit agile, variable-delay settings; History MLP is a strong practical middle ground for resource-constrained platforms.
4. Future work. Full closed-loop controller integration on real hardware and extension to multi-agent (>2) wake interactions for swarm autonomy.