As AI controllers show great potential to enhance the functionality and applicability of current applications, they have been proposed and used in diverse industrial domains. However, there has been no systematic study of how well AI controllers perform compared with traditional controllers. RQ1 compares the two types of controllers using comprehensive evaluation metrics, so that we can gain a better understanding of the pros and cons of AI and traditional controllers.
In this section, we discuss the performance results of each system in detail.
We compare the performance of these controllers according to five properties: Hard safety, Soft safety, Steady state, Resilience, and Liveness, which are introduced in Sec. 4.1 of our paper. Here, we provide more detailed experimental data as a supplement to our paper.
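To make the comparison concrete, the soft safety statistics reported in the tables below, the average error (MAE) and the maximum error (MAXERR), can be computed directly from a sampled simulation trace. The snippet below is a minimal sketch, assuming the reference and output signals are sampled at the same uniform time steps; the function name is ours, not part of the benchmark code.

```python
import numpy as np

def soft_safety_metrics(reference, output):
    """Compute MAE and MAXERR from sampled simulation traces.

    reference, output: 1-D arrays of the target and actual signals,
    sampled at the same uniform time steps.
    """
    error = np.abs(np.asarray(reference) - np.asarray(output))
    mae = error.mean()       # average tracking error over the trace
    maxerr = error.max()     # worst-case instantaneous error
    return mae, maxerr
```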
From the table above and the radar chart on the right side, the DRL controllers do not outperform the traditional controller (MPC). On metrics S1, S3, S4, and S5, all controllers perform equally well; on S2, the average error between the ego car's velocity and the target velocity, the traditional controller achieves a smaller error.
Since none of the controllers violates the hard safety requirement, the randomly generated inputs cannot significantly distinguish these controllers. A more advanced testing method, falsification, is therefore required on ACC to explore the quality and reliability of these controllers.
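Falsification searches the input space for signals that minimize a quantitative robustness value of the specification; a negative robustness indicates a requirement violation. The loop below is a minimal random-search sketch of this idea, assuming placeholder `simulate` and `robustness` functions; it illustrates the principle only and is not the interface of the actual falsification tool used in our experiments.

```python
import numpy as np

def falsify(simulate, robustness, input_dim, bounds, budget=100):
    """Random-search falsification sketch.

    simulate:   maps an input parameter vector to a simulation trace
    robustness: maps a trace to a scalar; < 0 means the hard safety
                requirement is violated
    bounds:     (low, high) arrays bounding each input parameter
    """
    low, high = bounds
    best_rob, best_input = np.inf, None
    for _ in range(budget):
        u = np.random.uniform(low, high, size=input_dim)
        rob = robustness(simulate(u))
        if rob < best_rob:
            best_rob, best_input = rob, u
        if best_rob < 0:        # counterexample found
            break
    return best_rob, best_input
```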
In LKA, the traditional controller and the DDPG-based DRL controller are the only two that never violate the hard safety metric. In terms of average error and maximum error, the DDPG controller scores slightly lower than the traditional controller, but further falsification is needed to truly determine their safety performance.
The PPO, A2C, and SAC controllers perform much worse than the two above. One reason could be that these agents have different reward functions, DNN structures, and training times; another could be that the function approximators in PPO, A2C, and SAC are stochastic representations that output a stochastic policy following a specific probability distribution. Thus, the learning process of these agents is not as stable as DDPG's, and may take more time and iterations to reach the optimal policy.
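This difference can be seen in how the two families of agents produce actions: DDPG learns a deterministic mapping from state to action, whereas PPO, A2C, and SAC sample actions from a learned distribution. The sketch below illustrates the contrast with minimal PyTorch modules; the network sizes are illustrative only and do not reflect the architectures used in our experiments.

```python
import torch
import torch.nn as nn

class DeterministicPolicy(nn.Module):
    """DDPG-style actor: state -> one fixed action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim), nn.Tanh())

    def forward(self, state):
        return self.net(state)          # same action for the same state

class GaussianPolicy(nn.Module):
    """PPO/A2C/SAC-style actor: state -> action distribution."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim))
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        dist = torch.distributions.Normal(self.mean(state),
                                          self.log_std.exp())
        return dist.sample()            # action varies run to run
```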
In APV, all types of controllers rarely violate the hard safety threshold; notably, the TD3-based DRL controller also shows better performance on stability (S3) and resilience (S4). Regarding the average and maximum errors, the DDPG controller has slightly larger errors on S2, whereas TD3 stays quite close to the traditional controller.
All these controllers are passed to the second evaluation step, falsification, so that we can obtain a more accurate picture of their safety performance.
From the table above, the DRL controllers perform as well as the traditional controller on the hard safety metric, and the TD3 controller also has an advantage on maximum error. CSTR is a classic example in which the DRL controllers and the traditional controller have distinct strengths; thus, we can consider designing a hybrid control system that combines their advantages.
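One simple way to realize such a hybrid scheme is to let a supervisor switch between the two controllers depending on the operating condition, e.g., using the DRL controller where it tracks well and falling back to the traditional controller near safety boundaries. The sketch below is a hypothetical illustration of this idea, not a design evaluated in our experiments.

```python
def hybrid_control(state, drl_controller, traditional_controller,
                   near_safety_boundary):
    """Switch to the traditional controller whenever the state is
    close to a safety boundary; otherwise use the DRL controller."""
    if near_safety_boundary(state):
        return traditional_controller(state)
    return drl_controller(state)
```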
In LR, most of the DRL agents fail to provide controllers with acceptable performance, except DDPG. This can be caused by the complexity of this system, since LR requires two output signals from the controller to balance the rocket. Apart from the MAE and liveness properties, the DRL controller performs much more poorly than the traditional one.
Also, we do not require the stability and resilience metrics in LR, as the task of this system is to land a rocket on the target position within a given time. We only consider the final state of the rocket to evaluate the performance; the intermediate states are trivial.
AFC is an important and representative system in the powertrain field. From the table above, the DDPG-based DRL controller has the best performance among all DRL controllers. The traditional controller and the DDPG one have no violations of the hard safety requirement during the 100 simulations with randomly sampled inputs.
We notice that the DDPG controller achieves significantly better results on MAXERR, which suggests a distinctive maximum-error reduction ability of the DDPG controller. Meanwhile, the traditional controller handles the average error better in the simulations.
The DRL controllers do not give comparable results in WT, because both the DDPG and TD3 controllers violate the hard safety metric many times. Although the DRL controllers have similar performance on metrics S2-S4, the most critical metric, S1, indicates that they cannot be applied as reliable control systems.
WT is a special system, as it has far more safety requirements than the others: it takes the torque, blade angle, rotation speed, and blade angle response time into account to evaluate safety. Thus, the DRL controllers struggle to balance all requirements simultaneously, and we consider that a standalone DRL controller may not be enough to handle systems with many requirements at the same time. It might be a good idea to design multiple or distributed DRL controllers to process these tasks separately, as sketched below.
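Such a distributed design could, for instance, assign one agent to each requirement group (e.g., torque and rotation speed vs. blade angle and its response time) and merge their outputs. The skeleton below is purely hypothetical; the decomposition and the combination rule would have to be designed per system.

```python
import numpy as np

def distributed_control(state, agents, weights):
    """Each agent handles one subset of the WT requirements;
    their actions are merged by a weighted combination."""
    actions = np.array([agent(state) for agent in agents])
    weights = np.asarray(weights) / np.sum(weights)
    return weights @ actions
```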
The DRL controllers outperform the traditional controller on all metrics in SC: they never violate the hard safety threshold, achieve smaller values on MAE and MAXERR, and show better stability.
Observing the outputs of the traditional PID controller in SC, we find that the PID controller behaves properly most of the time, except that it produces instantaneous overshoots at a few time steps. We consider that this phenomenon reflects the instability of the PID controller: when a particular input arrives, the PID controller may produce abnormal outputs that lead to fluctuations in the error.
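These instantaneous overshoots can be located automatically by scanning the error trace for isolated spikes. The helper below is a minimal sketch, assuming a sampled error signal; the spike threshold is a hypothetical tuning parameter, not a value from our experiments.

```python
import numpy as np

def find_overshoots(error, threshold=3.0):
    """Return the indices of isolated spikes in a sampled error trace:
    samples exceeding 'threshold' standard deviations above the mean."""
    error = np.abs(np.asarray(error))
    limit = error.mean() + threshold * error.std()
    return np.flatnonzero(error > limit)
```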
WTK is another system in which the DRL controllers completely outperform the traditional controller. We consider that the poor results of the traditional PID controller are due to the same cause as in SC: the instantaneous overshoots.
We also notice that different DRL controllers have advantages on different metrics: the DDPG controller achieves better results on MAE, but is less stable than the TD3 controller.