In this section, we introduce two parts of the ensemble framework: sub-controllers, which provide the fundamental control logic of the system, and the ensemble strategy, which decides how to combine the outputs of the sub-controllers.
For the sub-controllers, in order to make optimal decisions across different scenarios, we aim to train them with diverse decision logic that covers different requirements. Toward this goal, we focus mainly on the design of the reward function and the selection of diverse DRL algorithms.
Ensemble Strategy
As the cornerstones of the ensemble framework, the constituent DRL sub-controllers output a set of candidate actions following their own policies. The next challenge is how to merge these actions so as to deliver a synergistic impact on the required task. In SIEGE, the ensemble strategy, which is based on the constructed semantics-based abstract model, provides control-action fusion so that the ensemble controller can outperform the individual controllers. In particular, we propose six ensemble strategies to explore the potential capabilities of ensemble control systems in AI-enabled CPS, which fall into three categories:
Classic strategy: Majority Voting and Average.
Semantics-based strategy: Top-1 Semantics Prediction and Top-k Semantics Prediction.
Coordinator-based strategy: Coordinator and Coordinator (with prediction).
Classic Strategy
Majority voting: the final control output is the mean value of all concrete actions that belong to the abstract action with the maximum votes.
Average strategy: this method takes the average value of the actions from all sub-controllers as the final output.
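As a sketch, the two classic strategies can be implemented as follows; `abstract_of`, a mapping from a concrete action to its abstract action in the semantics-based abstract model, is a hypothetical helper supplied by the caller, not part of the paper's concrete implementation.

```python
from statistics import fmean

def majority_voting(actions, abstract_of):
    # Group each concrete action under its abstract action; `abstract_of`
    # is a hypothetical helper mapping a concrete action to its abstract
    # action in the semantics-based abstract model.
    groups = {}
    for a in actions:
        groups.setdefault(abstract_of(a), []).append(a)
    # The final control output is the mean of the concrete actions in
    # the abstract-action group with the most votes.
    return fmean(max(groups.values(), key=len))

def average(actions):
    # The final control output is the mean over all sub-controller actions.
    return fmean(actions)
```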
Semantics-based Strategy
To tackle the multi-requirement control challenge, we leverage the semantics information on the semantics-based abstract model.
Top-1 Semantics Prediction: We rank the actions by the sum of the semantics of the incoming abstract states and choose the top-1 action in the ranking as the output. An action is excluded from the ranking if any value in its predicted semantics is negative, as a negative value indicates a violation of a specification.
Top-k Semantics Prediction: Similar to the above strategy, we rank the actions; however, we take the mean of the top-k actions as the output action.
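Assuming each candidate action comes with a vector of predicted semantics values (one entry per incoming abstract state, negative meaning a specification violation), the two semantics-based strategies can be sketched as:

```python
from statistics import fmean

def _rank_by_semantics(candidates):
    # `candidates` is a list of (action, predicted_semantics) pairs, where
    # predicted_semantics holds one value per incoming abstract state.
    # An action is excluded from the ranking if any predicted value is
    # negative, since that indicates a specification violation.
    valid = [(a, s) for a, s in candidates if min(s) >= 0]
    # Rank the remaining actions by the sum of their predicted semantics.
    return sorted(valid, key=lambda pair: sum(pair[1]), reverse=True)

def top1_semantics(candidates):
    ranked = _rank_by_semantics(candidates)
    return ranked[0][0] if ranked else None

def topk_semantics(candidates, k):
    ranked = _rank_by_semantics(candidates)[:k]
    return fmean(a for a, _ in ranked) if ranked else None
```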
Coordinator Strategy
Besides the deterministic strategies introduced above, we design stochastic strategies that are able to explore other potentially optimal actions and perform an adaptive selection. In particular, inspired by the exploration-and-exploitation principle of reinforcement learning, we utilize a "high-level" DRL agent, i.e., a coordinator, for the action selection.
Coordinator. This strategy employs a DRL agent to assess the candidates and pick the optimal action for the task: given the system state as input, the coordinator selects an optimal action from among the sub-controllers' outputs.
Coordinator (with prediction). The second coordinator strategy takes additional information, namely the predicted semantics. The predicted semantics, obtained from the semantics-based abstract model, aim to enhance the collaboration ability of the higher-level DRL coordinator.
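The selection mechanism shared by both coordinator variants can be sketched as follows; `act`, `select`, and `predict` are hypothetical interfaces standing in for the sub-controllers, the trained coordinator, and the semantics-based abstract model, and are not the paper's concrete APIs.

```python
def coordinator_step(state, sub_controllers, coordinator_policy,
                     semantics_model=None):
    # Collect candidate actions from the first-layer sub-controllers.
    actions = [c.act(state) for c in sub_controllers]
    # The coordinator observes the system state; in the "with prediction"
    # variant, the observation is augmented with the predicted semantics
    # of each candidate action from the abstract model.
    obs = list(state)
    if semantics_model is not None:
        for a in actions:
            obs.extend(semantics_model.predict(state, a))
    # The coordinator outputs a discrete index rather than a concrete
    # control signal: it selects which sub-controller's action becomes
    # the final output.
    idx = coordinator_policy.select(obs)
    return actions[idx]
```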
Ensemble on AI-CPS
A hierarchical architecture in reinforcement learning has demonstrated promising abilities to decompose a long-term decision-making task into simpler sub-tasks. Therefore, besides the single-layer ensemble methods introduced in the classic and semantics-based strategies, we further propose a two-layer hierarchical ensemble structure to extensively explore the capability of hierarchical ensemble control in the context of AI-CPSs. Namely, instead of using rule-based deterministic strategies to aggregate the control outputs from the constituent DRL sub-controllers, we design stochastic strategies that are able to explore other potentially optimal actions and perform an adaptive selection.
In particular, inspired by the exploration-and-exploitation principle of reinforcement learning, we utilize a "high-level" DRL agent, i.e., the coordinator, for the action selection. In contrast to the constituent DRL controllers in the first layer, the coordinator is trained to select the proper action from the first-layer controllers. Namely, the DRL coordinator does not output concrete control signals for the system actuator but instead selects the output of one of the DRL sub-controllers as the final output. The reward function for the coordinator is slightly different from that of the sub-controllers: all specifications have equal weights in the reward function, as we consider these safety-related specifications to be equally important during practical operation. In addition, we set a large penalty for the violation of any specification so that the coordinator learns to select actions that prevent violations. Note that we tried this reward structure on the DRL sub-controllers but failed to obtain an optimal controller that can tackle the multi-objective optimization over the different system specifications. This is one of the motivations for leveraging ensemble methods in AI-CPSs, as mentioned in Section 1.
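The coordinator's reward structure described above might be sketched as follows; the per-specification satisfaction values and the penalty magnitude are illustrative assumptions, not the paper's exact constants.

```python
def coordinator_reward(spec_values, violation_penalty=1000.0):
    # spec_values: one quantitative satisfaction value per safety-related
    # specification; a negative value denotes a violation. All
    # specifications carry equal weight, and any violation incurs a
    # large penalty (the magnitude here is an assumption).
    if any(v < 0 for v in spec_values):
        return -violation_penalty
    return sum(spec_values) / len(spec_values)
```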
We propose two comparison sets: 1) between individual controllers and traditional ensemble methods, and 2) between the newly proposed ensemble methods and the classical ones. We also apply multiple major and minor specifications to obtain a detailed and comprehensive evaluation of each controller. From the enriched evaluation results, we find that the ensemble controllers can deliver better performance than individual controllers, that semantics prediction can reinforce the ensemble strategies, and finally, that the Coordinator methods outperform all others on each system in terms of the major specifications.
We observe that the semantics predictions from the abstract model may have large errors when encountering certain corner state-action pairs: the maximum abstraction error of the predicted semantics is much higher than the average error. As a result, the rule-based semantics-guided ensemble strategies, namely Top-1 and Top-k, may produce sub-optimal control signals. To mitigate such drawbacks, we can either set tighter thresholds on the abstract models to maximally reduce the number of these corner samples, or craft advanced semantics-guided ensemble strategies that pay extra attention to special system conditions. However, we notice that the Coordinator (hierarchical) methods can become aware of such special system behaviors during training. We consider that the coordinator method can compensate for some weak spots of the abstract model and produce the ensemble output more adaptively.