Dynamic control allocation between onboard DAA and Delayed Pilot Command
Allocating control authority between an autonomous system and a remote human pilot on an Unmanned Aircraft System (UAS) platform is especially complex because it requires guaranteed operability in a dynamic environment. Since the human pilot is not the primary analyzer of the dynamic environment, latency in the communication channel may significantly compromise situational awareness and result in outdated control commands. In this context, we present our work on onboard command optimization for a UAS during an encounter. The objectives of this work:
We consider a Detect-and-Avoid (DAA) problem in which the UAS is in an active encounter with a dynamic intruder. The UAS is remotely controlled by a pilot at the Ground Control Station (GCS). We assume that the communication between the GCS and the UAS is subject to a constant delay. We approach the problem without redesigning the onboard DAA algorithm. Instead, we employ dynamic control allocation between the pilot and the DAA system to increase the pilot’s contribution within safe operational conditions and to enhance the human-machine user experience.
We propose a Markov Decision Process (MDP) to develop an optimal waiting strategy that determines how long the UAS can wait for the pilot command at each state. We consider a UAS (ownship) that encounters a dynamic intruder in three-dimensional space. The state space of the relative kinematics has six state variables:
• Relative distance in x dimension,
• Relative distance in y dimension,
• Relative altitude,
• Intruder horizontal speed,
• Relative vertical velocity, and
• Intruder heading.
The action space has two actions:
The reward function is designed so that waiting earns a positive reward, while a large negative reward is incurred if the separation falls below the well-clear threshold. We propagated 1.5 million trajectories and used the value iteration algorithm to estimate the optimal action at each state.
Given the transition matrix and the optimal action, we propagate every possible path to the threshold violation state and obtain the corresponding collision probability. Assuming that the paths are acyclic, the expected waiting time for a state is calculated as the average of the waiting times of all paths leading to collision, weighted by their probabilities. Example wait maps are shown below: the figure on the left illustrates a front view of a 3-D wait map, in which the red zone depicts the collision zone, and the figure on the right illustrates a top view of the map and how the intruder heading affects the wait time.
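The value iteration step can be illustrated on a toy problem. The sketch below is not the MDP used in this work: the four states, the two action names (WAIT and DAA), the transition probabilities, the reward values, and the discount factor are all hypothetical and chosen only to mirror the reward structure described above (a small positive reward for waiting, a large penalty for violating well clear).

```python
def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Tabular value iteration.

    P[a][s] maps next_state -> probability,
    R[a][s] maps next_state -> reward (missing entries count as 0).
    Returns (V, policy) as lists indexed by state.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Q[s][a]: expected return of taking action a in state s
        Q = [[sum(p * (R[a][s].get(s2, 0.0) + gamma * V[s2])
                  for s2, p in P[a][s].items())
              for a in range(n_actions)]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = [q.index(max(q)) for q in Q]  # ties resolve to the first action
    return V, policy


# Hypothetical toy encounter (all numbers are illustrative assumptions).
VIOLATION, NEAR, FAR, RESOLVED = 0, 1, 2, 3
WAIT, DAA = 0, 1

P = [
    # WAIT: the intruder keeps closing in
    [{VIOLATION: 1.0}, {VIOLATION: 0.8, NEAR: 0.2},
     {NEAR: 0.9, FAR: 0.1}, {RESOLVED: 1.0}],
    # DAA: the onboard system resolves the encounter immediately
    [{VIOLATION: 1.0}, {RESOLVED: 1.0}, {RESOLVED: 1.0}, {RESOLVED: 1.0}],
]
R = [
    # WAIT: +1 per step waited, -100 on entering the violation state
    [{}, {VIOLATION: -100.0, NEAR: 1.0}, {NEAR: 1.0, FAR: 1.0}, {}],
    # DAA: no reward
    [{}, {}, {}, {}],
]

V, policy = value_iteration(P, R)
print("optimal action far from intruder:", "WAIT" if policy[FAR] == WAIT else "DAA")
print("optimal action near intruder:", "WAIT" if policy[NEAR] == WAIT else "DAA")
```

In this toy setting, waiting is optimal only while the intruder is far away, which is the qualitative behavior a wait map is meant to capture.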
Parameters: intruder velocity = 110 m/s, intruder heading = 0
Parameters: intruder velocity = 255 m/, intruder heading = +5
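The expected waiting time computation described above can be sketched as a path enumeration over an acyclic transition graph. The graph, the state names, and the time step `dt` below are hypothetical. Dividing by the total collision probability, and returning infinity when no path reaches the violation state, are assumptions of this sketch rather than details confirmed by the text.

```python
def paths_to_violation(transitions, start, violation, dt=1.0):
    """Enumerate (probability, elapsed_time) for every path from `start`
    to `violation`. `transitions[state]` is a list of (next_state, prob)
    pairs under the optimal action; the graph is assumed to be acyclic."""
    results = []
    stack = [(start, 1.0, 0.0)]
    while stack:
        state, prob, elapsed = stack.pop()
        if state == violation:
            results.append((prob, elapsed))
            continue
        for nxt, p in transitions.get(state, []):
            stack.append((nxt, prob * p, elapsed + dt))
    return results


def expected_wait_time(transitions, start, violation, dt=1.0):
    """Probability-weighted average time over all paths ending in violation."""
    paths = paths_to_violation(transitions, start, violation, dt)
    collision_prob = sum(p for p, _ in paths)
    if collision_prob == 0.0:
        return float("inf")  # assumption: no collision path means no time limit
    return sum(p * t for p, t in paths) / collision_prob


# Hypothetical acyclic transition graph; "V" is the violation state.
transitions = {
    "A": [("B", 0.5), ("C", 0.5)],
    "B": [("V", 1.0)],
    "C": [("D", 0.5), ("safe", 0.5)],
    "D": [("V", 1.0)],
}

paths = paths_to_violation(transitions, "A", "V")
collision_probability = sum(p for p, _ in paths)
wait = expected_wait_time(transitions, "A", "V")
print(collision_probability, wait)
```

Here two paths reach the violation state: A-B-V with probability 0.5 after 2 steps and A-C-D-V with probability 0.25 after 3 steps, so the weighted average is (0.5 × 2 + 0.25 × 3) / 0.75 steps.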
To incorporate the waiting time map, a wait time manager was developed and integrated into the simulator. During an encounter, the wait time manager uses the waiting time map to initialize a timer that waits for the pilot command. It is activated by default when the operation mode is DAA-Pilot and can be turned off manually. Command blending is coupled with the wait manager and can only be used when the wait manager is turned on. The wait time manager loads the waiting time maps and matches the current state with a state in the wait map. If the pilot command is delayed and the wait time at the current state is greater than the anticipated communication latency, the UAS waits. Otherwise, the DAA overrides the delayed pilot command to resolve the encounter. However, if a delayed pilot command is received, the command blending algorithm discussed in [1, Section III.C] is invoked to decide among the maneuver options and to execute a more pilot-like maneuver. We compare two different setups:
• Baseline setup with the DAA-Pilot mode and the wait manager turned off, and
• Integrated setup with the DAA-Pilot mode and the wait manager turned on.
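The decision logic of the wait time manager described above can be sketched as follows. This is a minimal illustration, not the simulator's implementation: the function names, the returned labels, the wait map values, and the nearest-neighbor matching with an unscaled Euclidean distance are all assumptions, since the text does not specify how the current state is matched to a map state.

```python
import math


def lookup_wait_time(wait_map, state):
    """Return the wait time of the map entry closest to `state`.

    wait_map: list of (state_tuple, wait_seconds) pairs, where each state is
    (rel_x, rel_y, rel_alt, intruder_speed, rel_vertical_vel, intruder_heading).
    Nearest-neighbor matching is an assumption of this sketch.
    """
    _, wait = min(wait_map, key=lambda entry: math.dist(entry[0], state))
    return wait


def decide(wait_map, state, anticipated_latency, pilot_command_received):
    """Choose among blending, waiting for the pilot, and DAA override."""
    if pilot_command_received:
        return "BLEND"          # hand the delayed command to command blending
    if lookup_wait_time(wait_map, state) > anticipated_latency:
        return "WAIT"           # enough time to wait for the pilot command
    return "DAA_OVERRIDE"       # DAA resolves the encounter on its own


# Hypothetical two-entry wait map (values are illustrative only).
wait_map = [
    ((5000.0, 0.0, 0.0, 110.0, 0.0, 0.0), 12.0),
    ((500.0, 0.0, 0.0, 110.0, 0.0, 0.0), 1.0),
]

far_state = (4800.0, 50.0, 0.0, 110.0, 0.0, 0.0)
near_state = (600.0, 0.0, 0.0, 110.0, 0.0, 0.0)

print(decide(wait_map, far_state, 3.0, False))
print(decide(wait_map, near_state, 3.0, False))
print(decide(wait_map, near_state, 3.0, True))
```

A practical matcher would likely normalize each state variable before computing distances, because positions, speeds, and headings have different units and ranges.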
To examine the performance of our proposed algorithms, we consider the following questions:
The plot on the left shows the status of pilot commands in encounters. The blue bars represent the baseline setup and the orange bars represent the integrated setup. The number of pilot commands received increases with the integrated setup.
Trajectory deviations of the two setups for the same scenario. In this scenario, the DAA override trajectory deviates 4.36 times more than the delayed piloted trajectory. However, scenarios with the opposite outcome are also plausible.
We are currently working on large-scale simulations with constant and irregular delays, as well as on extending the command-blending algorithm.
[1] Tabassum, Asma, He Bai, and Craig Kleski. "Optimizing Unmanned Aircraft System Decision Making for Detect-and-Avoid with Delayed Pilot Commands." AIAA AVIATION 2020 FORUM. 2020.