Industrial Control System (ICS) Cyber Attack Datasets
Dataset 1: Power System Datasets
Uttam Adhikari, Shengyi Pan, and Tommy Morris, in collaboration with Raymond Borges and Justin Beaver of Oak Ridge National Laboratory (ORNL), have created three datasets containing measurements of normal, disturbance, control, and cyber attack behaviors in an electric transmission system. The datasets include synchrophasor measurements and data logs from Snort, a simulated control panel, and relays.
The power system datasets have been used in several works on power system cyber-attack classification.
Pan, S., Morris, T., Adhikari, U., Developing a Hybrid Intrusion Detection System Using Data Mining for Power Systems, IEEE Transactions on Smart Grid. doi: 10.1109/TSG.2015.2409775 link
Pan, S., Morris, T., Adhikari, U., Classification of Disturbances and Cyber-attacks in Power Systems Using Heterogeneous Time-synchronized Data, IEEE Transactions on Industrial Informatics. doi: 10.1109/TII.2015.2420951 link
Pan, S., Morris, T., Adhikari, U., A Specification-based Intrusion Detection Framework for Cyber-physical Environment in Electric Power System, International Journal of Network Security (IJNS), Vol. 17, No. 2, pp. 174-188, March 2015. pdf
Beaver, J., Borges, R., Buckner, M., Morris, T., Adhikari, U., Pan, S., Machine Learning for Power System Disturbance and Cyber-attack Discrimination, Proceedings of the 7th International Symposium on Resilient Control Systems, August 19-21, 2014, Denver, CO, USA. link
Dataset 2: Gas Pipeline Datasets
These datasets were created in collaboration with Justin Beaver and Raymond Borges of Oak Ridge National Laboratory (ORNL). The MSU team provided raw data logs to Justin Beaver, and the ORNL team formatted these logs into datasets.
ORNL Formatted SCADA Gas Pipeline Datasets
Please cite the following paper if using these datasets.
Beaver, Justin M., Borges-Hink, Raymond C., Buckner, Mark A., "An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications," in the Proceedings of 2013 12th International Conference on Machine Learning and Applications (ICMLA), vol.2, pp.54-59, 2013. doi: 10.1109/ICMLA.2013.105 link
Dataset 3: Gas Pipeline and Water Storage Tank
Wei Gao and Tommy Morris have created a database of cyber attacks against two laboratory-scale industrial control systems: a gas pipeline and a water storage tank. Multiple datasets have been created from this database.
The cyber attacks used to create datasets on this page are described in the dissertation cited below.
Morris, T., Gao, W., "Industrial Control System Network Traffic Data sets to Facilitate Intrusion Detection System Research," in Critical Infrastructure Protection VIII, Sujeet Shenoi and Jonathan Butts, Eds. ISBN: 978-3-662-45354-4. Due November 14, 2014. link
The paper below describes the laboratory in which the datasets were captured. The datasets contain attacks and normal network behavior from two laboratory-scale industrial control systems: a water storage tank and a gas pipeline.
Morris, T., Srivastava, A., Reaves, B., Gao, W., Pavurapu, K., Reddi, R. A Control System Testbed to Validate Critical Infrastructure Protection Concepts. International Journal of Critical Infrastructure Protection (2011). Elsevier. doi:10.1016/j.ijcip.2011.06.005 link
Note: These datasets have been found to contain some unintended patterns that let machine learning algorithms distinguish attacks from non-attacks in unrealistic ways. For example, some attacks were performed with the gas pressure set to one value (x) and normal operation with another value (y). Machine learning algorithms pick up on such patterns and treat pressure = x as an indication of attack, which is not correct. Please see this document describing flaws in the datasets. We are creating new datasets with these problems fixed. Please check back soon (by end of 2014).
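One way to look for this kind of unintended pattern before training a classifier is to measure how often a single feature value occurs with only one class label. The following is a minimal Python sketch of that idea, not the method used to find the flaws described above. The function name `label_purity`, the field names `setpoint`, `pump`, and `label`, and the toy rows are all hypothetical and are not taken from the datasets.

```python
from collections import defaultdict


def label_purity(rows, feature, label="label"):
    """Return the fraction of rows whose feature value occurs with only one label.

    A value near 1.0 for a feature with few distinct values suggests that the
    feature alone separates the classes, which may be an artifact of how the
    data was collected rather than a real property of attacks.
    """
    labels_by_value = defaultdict(set)
    count_by_value = defaultdict(int)
    for row in rows:
        labels_by_value[row[feature]].add(row[label])
        count_by_value[row[feature]] += 1
    pure_rows = sum(
        count
        for value, count in count_by_value.items()
        if len(labels_by_value[value]) == 1
    )
    return pure_rows / len(rows)


# Toy rows invented for illustration only (hypothetical field names and values).
rows = [
    {"setpoint": 10.0, "pump": 1, "label": "attack"},
    {"setpoint": 10.0, "pump": 0, "label": "attack"},
    {"setpoint": 20.0, "pump": 1, "label": "normal"},
    {"setpoint": 20.0, "pump": 0, "label": "normal"},
]

for feature in ("setpoint", "pump"):
    print(feature, label_purity(rows, feature))
```

In this toy example the hypothetical `setpoint` field scores 1.0 because each of its values appears with exactly one label, while `pump` scores 0.0. Continuous measurements in which nearly every value is unique will also score close to 1.0, so the check is most informative for features with few distinct values.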
The following datasets were created for use in WEKA. These datasets are described in detail in chapter 4 of the dissertation cited below. The attacks used to create the datasets are described in chapter 3 of the same dissertation.
Morris, T., Gao, W., "Industrial Control System Network Traffic Data sets to Facilitate Intrusion Detection System Research," in Critical Infrastructure Protection VIII, Sujeet Shenoi and Jonathan Butts, Eds. ISBN: 978-3-662-45354-4. Due November 14, 2014. link
10% Random Sample Water Storage Tank
10% Random Sample Gas Pipeline
These datasets were used by Patric Nader, Paul Honeine, and Pierre Beauseroy to examine lp-norms in One-Class Classification for Intrusion Detection in SCADA Systems. link
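For readers who want to inspect the WEKA datasets outside of WEKA, the sketch below reads a simple file in Python using only the standard library. It assumes the files are in ARFF, WEKA's native text format, which this page does not state. It also handles only dense, unquoted rows. The function name `read_arff` and the sample content are hypothetical and are not taken from the datasets.

```python
import io


def read_arff(stream):
    """Parse a simple dense ARFF stream into (attribute_names, rows).

    Assumes unquoted attribute names and comma-separated values with no
    embedded commas. Values are returned as strings.
    """
    names, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # "@attribute <name> <type>": keep only the name
            names.append(line.split(None, 2)[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return names, rows


# Hypothetical content for illustration, not taken from the datasets.
sample = io.StringIO("""\
% hypothetical example
@relation demo
@attribute pressure numeric
@attribute result {normal,attack}
@data
10.5,normal
99.0,attack
""")

names, rows = read_arff(sample)
print(names)
print(rows)
```

To read a downloaded file, pass an open file object instead of the `io.StringIO` sample. Files with quoted values or sparse rows need a full ARFF parser.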
Dataset 4: New Gas Pipeline
Ian Turnipseed developed a new set of datasets with more randomness. These datasets come only from the gas pipeline control system.
PowerPoint presentation describing the datasets
Master's Thesis describing the datasets
Please cite the following paper when using these datasets.
Morris, T., Thornton, Z., Turnipseed, I., Industrial Control System Simulation and Data Logging for Intrusion Detection System Research. 7th Annual Southeastern Cyber Security Summit. Huntsville, AL. June 3 - 4, 2015. pdf
Dataset 5: Energy Management System Data
An investor-owned utility has shared a large anonymized log file from an Energy Management System (EMS). The log file covers 30 days of EMS logs.