Replication Package

On this page, we summarize the necessary steps for replicating the experimental results in our paper. We have made our replication package publicly available in a GitHub repository as a reference for future research.

The Structure of the Repository

The directory AI-CPS-Benchmark/ contains two sub-directories: benchmarks/ and tools/. The former stores the 9 CPS models introduced in Benchmarks, and the latter contains the two falsification tools, namely Breach and S-TaLiRo, used for RQ2: Falsification.

    • The directory benchmarks/ has 9 sub-directories, each storing one CPS model introduced in Benchmarks. Each model directory has a similar structure; we take the directory ACC/ as an illustrative example.

ACC/ contains the following sub-directories:

      • traditional/ stores the original ACC Simulink model with a traditional controller.

      • DRL/ has three sub-directories: train/, model/, and agent/. The folder train/ stores the scripts for training DRL controllers, model/ stores the Simulink models with trained DRL controllers, and agent/ stores the trained agents (DRL controllers) produced by different DRL algorithms.

      • RQ1/ stores the scripts for replicating the results of RQ1, where we compare the performance of AI-enabled CPS with that of traditional CPS;

      • RQ2/ stores the scripts for replicating the results of RQ2, where we evaluate the effectiveness of different falsification approaches on AI-enabled CPS;

      • RQ3/ stores the scripts for replicating the results of RQ3, where we construct CPS with hybrid controllers and evaluate their performance using the methods of RQ1 and RQ2.


  • The folder tools/ includes two widely used falsification tools, namely Breach and S-TaLiRo. Specifically, we select Global Nelder-Mead (GNM) and CMAES for Breach, and Simulated Annealing (SA) and stochastic optimization with adaptive restart (SOAR) for S-TaLiRo. Details can be found here.

How to Reproduce the Experimental Results in the Paper

The DRL controller training procedure and the main functions for reproducing each research question are introduced below.

DRL Controller Training (Optional)

We recommend that users rely on our pre-trained models to replicate the experimental results; however, users can also train new DRL controllers themselves by following the steps below.

  • Navigate to benchmarks/[MODEL]/DRL/train/;

  • Run RL_[MODEL]_[DRL].p to start a training script that uses the DRL algorithm [DRL] for the model [MODEL];

  • During training, users will see a window showing the training status;

  • When the training is finished, an RL agent will be generated and saved in a .mat file named [MODEL]_[DRL]_Agent_[DATE].mat.

For example, you can use the following command to train a DDPG agent for the ACC system:

  • RL_ACC_DDPG.p

A window will pop up showing the training status, including the episode number, episode reward, episode steps, the total number of steps, etc.

RL training status window

Users can terminate the training process by clicking the STOP button in the top-left corner; otherwise, the training terminates when the maximum number of episodes (5000 by default) is reached. The number of episodes needed to find an optimal policy differs across systems, depending on the environment and the reward function setup. Users can monitor the average reward and the average number of steps to judge whether the number of episodes is sufficient to produce a trustworthy agent.

When the training is finished, a MAT file containing the trained agent will be generated. For the example above, the MAT file is ACC_DDPG_Agent_8_23.mat. This agent can be directly loaded by the corresponding Simulink model to act as a controller.
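If you want to inspect a trained agent or bring it into the MATLAB workspace before simulation, a minimal sketch is shown below; note that the variable name agent inside the MAT file is an assumption and may differ in the released package:

    % Minimal sketch: load the trained agent into the MATLAB workspace.
    % The variable name "agent" inside the MAT file is an assumption.
    data  = load('ACC_DDPG_Agent_8_23.mat');
    agent = data.agent;
    % The RL Agent block in the Simulink model can then reference this
    % workspace variable during simulation.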

RQ1: Performance of AI-enabled CPS vs. traditional CPS

The Simulink models under evaluation are located in benchmarks/[MODEL]/RQ1/model/. To reproduce the experimental results of RQ1, users should follow the steps below:

  • Navigate to benchmarks/[MODEL]/RQ1/model/;

  • Run [MODEL]_evaluation.p with two arguments: the file of the DRL controller and the Simulink model of [MODEL];

  • The evaluation results will be generated after a while, with the file names [MODEL]_eval_[DRL]_result.mat and [MODEL]_eval_[DRL]_result_detailed.mat. The former records the performance values averaged over 100 simulations, and the latter contains the detailed performance values of each simulation.

For example, to evaluate the performance of a DDPG agent in the ACC system, use the command:

  • ACC_evaluation(ACC_DDPG_Agent_9_22.mat, RL_ACC_eval)

The first argument is the trained RL controller obtained from the training section, and the second is the name of the Simulink model used for evaluation. All the pre-trained agents are listed under the DRL/agent/ folder; alternatively, you can use your own agent trained in the previous section.

The output files in this example are two MAT files:

  • ACC_eval_DDPG_result.mat

  • ACC_eval_DDPG_result_detailed.mat

The first MAT file saves the average results of the DDPG controller in the ACC system over 100 simulations, across the evaluation metrics S1, S3, S4, and S5. Detailed explanations of these metrics are available here.

The second MAT file records the results of each simulation. The first MAT file is obtained by statistically processing the detailed simulation results in the second one, yielding the overall performance of the DDPG controller in the ACC system.

ACC_eval_DDPG_result_detailed.mat

ACC_eval_DDPG_result.mat
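To illustrate the relationship between the two files, here is a minimal MATLAB sketch, assuming the detailed file stores a per-simulation matrix named results (one row per simulation, one column per metric); the actual variable names in the released files may differ:

    % Minimal sketch: recompute the averaged results from the detailed file.
    % The field name "results" is an assumption (100 simulations x metrics).
    detailed = load('ACC_eval_DDPG_result_detailed.mat');
    avgPerformance = mean(detailed.results, 1);   % average each metric over the 100 simulations
    disp(avgPerformance);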

The performance values obtained from the previous steps are stored in MAT format, and the metrics differ in whether higher or lower values are better. To clearly illustrate the performance comparison across different controllers, radar charts are used to present the comparison results. The radar charts are generated in Python with the Matplotlib library, and the source file is available in the same folder. The values are preprocessed so that they all follow an ascending order, i.e., a higher value in the radar charts always represents better performance.
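The idea behind this preprocessing can be sketched as follows (a minimal MATLAB-style illustration only; the released processing is performed inside the Python plotting script, and the values and flag below are hypothetical):

    % Minimal sketch of the normalization idea used before plotting.
    values = [0.82 0.65 0.91 0.74];   % hypothetical metric values for four controllers
    lowerIsBetter = false;            % set to true for metrics where smaller is better
    normalized = (values - min(values)) ./ (max(values) - min(values));
    if lowerIsBetter
        normalized = 1 - normalized;  % flip so that higher always means better
    end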

For example, to plot some radar charts in RQ1, run the command:

  • python CPS_radar_Chart_RQ1.py

This Python script generates a radar chart for each system in RQ1 and saves the figures in PNG format. A sample result is shown below. From the radar charts, users can gain a direct and clear understanding of the performance of the controllers in different systems across various metrics.

Radar chart illustration for RQ1

RQ2: Effectiveness of Falsification

The directory benchmarks/[MODEL]/RQ2/ includes two sub-directories, namely breach and staliro. To reproduce the experimental results in RQ2, users should follow the steps below:

  • Navigate to benchmarks/[MODEL]/RQ2/breach/ or benchmarks/[MODEL]/RQ2/staliro/, depending on the chosen falsification tool;

  • Run [MODEL]_[FAL].p, where [FAL] indicates a falsification tool;

  • The evaluation results will be generated after a while, with the name [MODEL]_[DRL]_[FAL]_Result.mat. The result file consists of three parts: #sim, time, and robustness over 30 trials.
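A result file can be inspected in MATLAB as sketched below; the file name follows the pattern above with [MODEL]=ACC, [DRL]=DDPG, and [FAL]=breach, and the field names numSim, time, and robustness are assumptions that may differ in the released files:

    % Minimal sketch: summarize a falsification result file over the 30 trials.
    % The field names ("numSim", "time", "robustness") are assumptions.
    res = load('ACC_DDPG_breach_Result.mat');
    fprintf('average #sim: %.1f\n', mean(res.numSim));
    fprintf('average time: %.1f s\n', mean(res.time));
    fprintf('falsified trials: %d of 30\n', sum(res.robustness < 0));   % negative robustness means falsified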

RQ3: Performance of Hybrid Controllers

The steps for reproducing RQ3 are similar to those of RQ1, i.e.,

  • Navigate to benchmarks/[MODEL]/RQ1/model/;

  • Run [MODEL]_evaluation.p with two arguments: the file of the DRL controller and the Simulink model of [MODEL] with a hybrid controller. Note that the hybrid Simulink model file is named [MODEL]_eval_hybrid_[HYBRID]_eval.slx, where [HYBRID] indicates a hybrid controller combination method introduced here.

  • The evaluation results will be generated after a while, with the file names [MODEL]_eval_hybrid_[DRL]_[HYBRID]_result.mat and [MODEL]_eval_hybrid_[DRL]_[HYBRID]_result_detailed.mat. These two files are analogous to the output files in RQ1.

For example, to evaluate the performance of a DDPG agent in the ACC system with an average-based combination method, use the command:

  • ACC_evaluation(ACC_DDPG_Agent_9_22.mat, RL_ACC_hybrid_average_eval)

The process for RQ3 is similar to that for RQ1, except that the Simulink model is replaced with one containing a hybrid control system. The first argument is the trained RL controller obtained from the training section, and the second is the name of the Simulink model with an average-based hybrid control system used for evaluation.

All the pre-trained agents are listed under the DRL/agent/ folder; alternatively, you can use your own agent trained in the previous section. All pre-designed Simulink models are available under the DRL/model/ folder; only the models with the hybrid notation in their names are used in RQ3.

The output files in this example are two MAT files:

  • ACC_eval_hybrid_average_DDPG_T_result.mat

  • ACC_eval_hybrid_average_DDPG_T_result_detailed.mat

The first MAT file saves the average results, over 100 simulations, of the hybrid controller in the ACC system across the evaluation metrics S1, S3, S4, and S5. The name of the MAT file indicates that this result is for the ACC system with an average-based hybrid method, in which a DDPG controller and a traditional controller are combined.

Similar to RQ1, the second MAT file records the detailed evaluation results of each simulation.

ACC_eval_hybrid_average_DDPG_T_result_detailed.mat

ACC_eval_hybrid_average_DDPG_T_result.mat
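To compare the hybrid controller with the pure DDPG controller from RQ1, the two averaged result files can be loaded side by side, as in the minimal sketch below (the field name results inside both MAT files is an assumption and may differ in the released files):

    % Minimal sketch: compare RQ1 (pure DDPG) and RQ3 (average-based hybrid) averages.
    % The field name "results" is an assumption; each is one row of metric values.
    pureResult   = load('ACC_eval_DDPG_result.mat');
    hybridResult = load('ACC_eval_hybrid_average_DDPG_T_result.mat');
    disp([pureResult.results; hybridResult.results]);   % row 1: pure DDPG, row 2: hybrid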

Similar to RQ1, radar charts are also used in RQ3, but they focus on the differences among hybrid controllers.

For example, to plot some radar charts in RQ3, run the command:

  • python CPS_radar_Chart_RQ3.py

A sample result is shown below. From the radar charts, users can gain a direct and clear understanding of the performance of the hybrid controllers in different systems across various metrics.

Radar chart illustration for RQ3