Gusang Lee¹, Won Joon Yun¹, Soyi Jung², Joongheon Kim¹, Jae-Hyun Kim²
Korea University¹, Ajou University²
This demo abstract presents the visualization of deep reinforcement learning (DRL)-based autonomous aerial mobility simulations. The software is implemented with Unity ML-Agents, and buildings are added to construct an urban environment. On top of this implementation, DRL algorithms are trained, and we confirm that they work well in terms of trajectory control and 3D visualization.
Introduction
As a major part of beyond-5G (B5G) and 6G network scenarios, autonomous urban aerial mobility (UAM) systems are widely and actively discussed in industry and academia [1]. Motivated by this interest, many research contributions are now available on UAM trajectory optimization [2], energy-efficient operations [3], and so forth.
Trajectory optimization and energy-efficient operation fundamentally govern the mobility of autonomous UAM systems.
Therefore, visual simulation of the proposed learning-based trajectory optimization and energy-efficient operation algorithms is essential for intuitively understanding the behavior of these UAM algorithms.
In addition, most trajectory optimization algorithms are designed via deep reinforcement learning (DRL), because DRL is fundamentally suited to sequential stochastic decision making that maximizes the expected cumulative reward. The UAM simulations should therefore efficiently capture the DRL-based autonomous flying trajectories and operations, and a visual representation of these simulations clearly helps in understanding the algorithms intuitively.
In this paper, we implement our own 3D visualization software platform for simulating DRL-based autonomous trajectory control using Unity. To conduct more realistic simulations, we add buildings to form an urban environment, because we assume urban aerial mobility providing smart-city services such as surveillance and flexible mobile access.
Unity Implementation and 3D Visualization
Unity Implementation
Fig. 1: Software architecture for learning environment.
Fig. 1 illustrates the system architecture for conducting DRL in the Unity environment. With the Unity API, the environment, dynamic models and features, and DRL elements (i.e., states, actions, transitions, and rewards) can be modeled; these are packaged as an Asset in Unity.
With mlagents (i.e., the Unity ML-Agents library for DRL implementation), training DRL agents and visualizing the training results are realized because 1) the Communicator handles the interaction with the Python API and 2) the Asset can be loaded. For the UAM simulations, the Unity Asset named Drone Flight is used.
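As a rough illustration of how the Communicator is driven from the Python side, the low-level mlagents_envs API can connect to a built environment executable and step it. The following is a minimal sketch only; the executable path is a placeholder, and exact call names may differ across ml-agents releases.

# Minimal sketch of stepping a built Unity environment through the
# ML-Agents Python low-level API (mlagents_envs); the executable path and
# version-dependent details are assumptions, not the exact demo setup.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="./DroneFlight")   # hypothetical path to the built executable
env.reset()

behavior_name = list(env.behavior_specs)[0]         # e.g., the drone agent behavior
spec = env.behavior_specs[behavior_name]

for _ in range(10):                                  # a few interaction steps
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    # Random actions only for illustration; actual training uses the PPO trainer.
    action = spec.action_spec.random_action(len(decision_steps))
    env.set_actions(behavior_name, action)
    env.step()

env.close()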
The aerial mobility system we consider is UAM, so the simulations should be performed in urban areas containing numerous buildings and skyscrapers. Therefore, we implement buildings and skyscrapers in Drone Flight. The environment information observed by each agent consists of the current position, goal position, current velocity, current angular velocity, altitude vector, and building/skyscraper position vectors. The actions control the UAM motors toward desired directions, and thus the 3D Cartesian coordinates (x, y, z) are used. Lastly, the reward is positive when the agent arrives at the destination, whereas it is negative when 1) the agent moves away from the goal and 2) the agent moves closer to buildings, skyscrapers, and obstacles.
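This reward design can be summarized as a simple shaped reward. The following Python sketch is illustrative only; the arrival radius, obstacle radius, and penalty weights are hypothetical values, not the constants used in the demo.

import numpy as np

# Hypothetical constants for illustration only.
ARRIVAL_RADIUS = 5.0      # distance within which the goal counts as reached
ARRIVAL_REWARD = 1.0      # positive reward on reaching the destination
DISTANCE_PENALTY = 0.01   # penalty per unit moved away from the goal
OBSTACLE_RADIUS = 10.0    # distance below which obstacle proximity is penalized
OBSTACLE_PENALTY = 0.05   # penalty per step while too close to a building/obstacle

def step_reward(prev_pos, curr_pos, goal_pos, obstacle_positions):
    """Shaped reward: positive on arrival, negative when moving away from
    the goal or approaching buildings/skyscrapers/obstacles."""
    prev_dist = np.linalg.norm(goal_pos - prev_pos)
    curr_dist = np.linalg.norm(goal_pos - curr_pos)

    if curr_dist < ARRIVAL_RADIUS:            # agent arrives at the destination
        return ARRIVAL_REWARD

    reward = 0.0
    if curr_dist > prev_dist:                 # 1) agent moves away from the goal
        reward -= DISTANCE_PENALTY * (curr_dist - prev_dist)
    for obs in obstacle_positions:            # 2) agent moves closer to an obstacle
        if np.linalg.norm(obs - curr_pos) < OBSTACLE_RADIUS:
            reward -= OBSTACLE_PENALTY
    return reward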
Fig. 2: Unity implementation in UAM environment.
Fig. 2 shows the Unity implementation results in this UAM environment. Note that several buildings are added for urban scenario construction.
Visualization
Based on our Unity implementation on top of Drone Flight, we conduct DRL-based agent trajectory training and performance evaluation. In addition, the results are visualized.
For DRL training of the agent, proximal policy optimization (PPO) is used [4]. The agent's policy is trained with a deep neural network consisting of two dense layers with 128 units each. In addition, ε-greedy exploration is used during DRL training, with ε = 0.2.
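For reference, the described two-layer policy network can be sketched as follows. This is an illustrative PyTorch version only; the actual training is performed by the ML-Agents PPO trainer, so the activation function and the Gaussian action head are assumptions.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Illustrative actor network: two dense layers with 128 units each,
    mapping observations to a Gaussian over the 3D (x, y, z) action."""
    def __init__(self, obs_dim: int, act_dim: int = 3, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, act_dim)               # mean of the action distribution
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        h = self.body(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())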
Furthermore, the multi-agent parallel processing supported by mlagents is utilized, which accelerates training through parallel computation. The number of learning iterations is set to 3,000K, and the detailed hardware/software specifications are summarized in Table 1.
Fig. 3: Rewards of autonomous aerial mobility learning.
Fig. 4: Visualization of autonomous aerial mobility learning.
Fig. 3 and Fig. 4 show the simulation results. Fig. 3 plots the reward over training, and we can confirm that the reward eventually converges.
Fig. 4 shows the time-series visual simulations of DRL-trained UAM agent's behaviors.
It can be observed that 1) the UAM agent moves toward its goal during t ∈ [0, T] and 2) the UAM agent avoids buildings, skyscrapers, and obstacles during t ∈ [0, 2T/8], as designed in the reward function. Note that the goal-directed behavior can be observed at all time steps.
Finally, we observe that the DRL reward converges and that the trained agent controls its trajectory according to the positive and negative reward settings. Therefore, we confirm that our DRL-based agent works as desired and that the results can be simulated and visualized via our Unity-based visual simulation platform.
Note that a video demonstration of our simulation and 3D visualization results is available above (YouTube video).
Conclusion and Future Work
This demo abstract presents the implementation and visualization of DRL-based autonomous UAM simulations. Various buildings can be placed to simulate smart-city urban environments. As future work, additional urban scenarios can also be considered.
Acknowledgment
This research was supported by the National Research Foundation of Korea (2019R1A2C4070663 and 2019M3E4A1080391).
S. Jung, J. Kim, and J.-H. Kim are corresponding authors.
References
[1] W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol. 34, no. 3, pp. 134–142, May/June 2020.
[2] S. Yin, S. Zhao, Y. Zhao, and F. R. Yu, “Intelligent trajectory design in UAV-aided communications with reinforcement learning,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 8227–8231, August 2019.
[3] M. Shin, J. Kim, and M. Levorato, “Auction-based charging scheduling with deep learning framework for multi-drone networks,” IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 4235–4248, May 2019.
[4] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” CoRR, vol. abs/1707.06347, 2017. [Online]. Available: http://arxiv.org/abs/1707.06347