Industrial Control System (ICS) Cyber Attack Datasets

Dataset 1: Power System Datasets

Uttam Adhikari, Shengyi Pan, and Tommy Morris, in collaboration with Raymond Borges and Justin Beaver of Oak Ridge National Laboratory (ORNL), have created three datasets that include measurements related to normal, disturbance, control, and cyber attack behaviors in an electric transmission system. Measurements in the datasets include synchrophasor measurements and data logs from Snort, a simulated control panel, and relays.

README Description

2 Classes

3 Classes

Multi-class

The power system datasets have been used in several works on power system cyber-attack classification.

  1. Pan, S., Morris, T., Adhikari, U., Developing a Hybrid Intrusion Detection System Using Data Mining for Power Systems, IEEE Transactions on Smart Grid. doi: 10.1109/TSG.2015.2409775 link

  2. Pan, S., Morris, T., Adhikari, U., Classification of Disturbances and Cyber-attacks in Power Systems Using Heterogeneous Time-synchronized Data, IEEE Transactions on Industrial Informatics. doi: 10.1109/TII.2015.2420951 link

  3. Pan, S., Morris, T., Adhikari, U., A Specification-based Intrusion Detection Framework for Cyber-physical Environment in Electric Power System, International Journal of Network Security (IJNS), Vol.17, No.2, PP.174-188, March 2015. pdf

  4. Beaver, J., Borges, R., Buckner, M., Morris, T., Adhikari, U., Pan, S., Machine Learning for Power System Disturbance and Cyber-attack Discrimination, Proceedings of the 7th International Symposium on Resilient Control Systems, August 19-21, 2014, Denver, CO, USA. link

Dataset 2: Gas Pipeline Datasets

These datasets were created in collaboration with Justin Beaver and Raymond Borges of Oak Ridge National Laboratory (ORNL). The MSU team provided raw data logs to Justin Beaver, and the ORNL team formatted these logs into datasets.

ORNL Formatted SCADA Gas Pipeline Datasets

Please cite the following paper if using these datasets.

Beaver, Justin M., Borges-Hink, Raymond C., Buckner, Mark A., "An Evaluation of Machine Learning Methods to Detect Malicious SCADA Communications," in the Proceedings of 2013 12th International Conference on Machine Learning and Applications (ICMLA), vol.2, pp.54-59, 2013. doi: 10.1109/ICMLA.2013.105 link

Dataset 3: Gas Pipeline and Water Storage Tank

Wei Gao and Tommy Morris have created a database of cyber attacks against two laboratory-scale industrial control systems: a gas pipeline and a water storage tank. Multiple datasets have been created from this database.

The cyber attacks used to create datasets on this page are described in the dissertation cited below.

Morris, T., Gao, W., "Industrial Control System Network Traffic Data sets to Facilitate Intrusion Detection System Research," in Critical Infrastructure Protection VIII, Sujeet Shenoi and Jonathan Butts, Eds. ISBN: 978-3-662-45354-4. Due November 14, 2014. link

The paper below describes the laboratory in which the datasets were captured. The datasets contain attacks and normal network behavior from two laboratory-scale industrial control systems: a water storage tank and a gas pipeline.

Morris, T., Srivastava, A., Reaves, B., Gao, W., Pavurapu, K., Reddi, R. A Control System Testbed to Validate Critical Infrastructure Protection Concepts. International Journal of Critical Infrastructure Protection (2011). Elsevier. doi:10.1016/j.ijcip.2011.06.005 link

Note: These datasets have been found to contain some unintended patterns that allow machine learning algorithms to separate attacks from non-attacks in unrealistic ways. For example, some attacks were performed with the gas pressure set to one value (x) and normal operation with another value (y). Machine learning algorithms see these patterns and take pressure = x as an indication of an attack. This is not correct. Please see this document describing flaws in the data sets. We are creating new data sets with these problems fixed. Please check back soon (by end of 2014).

Report on data set flaws.
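The kind of flaw described in the note above can be screened for before training. The following is a minimal sketch, not the procedure used in the flaw report: it flags any feature whose observed values never overlap between classes, which is the signature of "pressure = x means attack". The function name, the `max_distinct` threshold, and the column names `setpoint`, `pump`, and `label` are hypothetical and chosen only for illustration. The check is meaningful only for features with few distinct values (such as setpoints), since continuous measurements rarely repeat exactly.

```python
from collections import defaultdict


def leaky_features(rows, label_key, max_distinct=10):
    """Return names of features whose value sets are disjoint across classes.

    rows is a list of dicts; label_key names the class column.
    Features with more than max_distinct distinct values are skipped,
    because near-unique continuous values are trivially disjoint.
    """
    # feature name -> class label -> set of values seen for that class
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        label = row[label_key]
        for name, value in row.items():
            if name != label_key:
                seen[name][label].add(value)

    leaky = []
    for name, by_label in seen.items():
        value_sets = list(by_label.values())
        if len(value_sets) < 2:
            continue  # only one class observed, nothing to compare
        union = set().union(*value_sets)
        if len(union) > max_distinct:
            continue  # too many distinct values for this check to mean much
        # Disjoint sets have a union exactly as large as the sum of their sizes.
        if len(union) == sum(len(s) for s in value_sets):
            leaky.append(name)
    return sorted(leaky)


# Hypothetical rows: the setpoint differs by class, the pump state does not.
example_rows = [
    {"setpoint": 20, "pump": 1, "label": "normal"},
    {"setpoint": 20, "pump": 0, "label": "normal"},
    {"setpoint": 35, "pump": 1, "label": "attack"},
    {"setpoint": 35, "pump": 0, "label": "attack"},
]
flagged = leaky_features(example_rows, "label")
print(flagged)
```

A flagged feature is not proof of a flaw, only a prompt to check whether the separation reflects the attack itself or merely how the data was collected.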

The following datasets were created for use in WEKA. These datasets are described in detail in chapter 4 of the dissertation cited below. The attacks used to create the datasets are described in chapter 3 of the same dissertation.

Morris, T., Gao, W., "Industrial Control System Network Traffic Data sets to Facilitate Intrusion Detection System Research," in Critical Infrastructure Protection VIII, Sujeet Shenoi and Jonathan Butts, Eds. ISBN: 978-3-662-45354-4. Due November 14, 2014. link

Raw Data Water Storage Tank

Raw Data Gas Pipeline

10% Random Sample Water Storage Tank

10% Random Sample Gas Pipeline

These datasets were used by Patric Nader, Paul Honeine, and Pierre Beauseroy to examine lp-norms in One-Class Classification for Intrusion Detection in SCADA Systems. link
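How the 10% random samples listed above were drawn is not stated on this page. As an illustration only, a comparable sample can be taken from any list of raw records with a seeded random generator so that the result is reproducible. The function name, the default seed, and the choice to preserve the original record order are assumptions of this sketch.

```python
import random


def random_sample(records, fraction=0.10, seed=0):
    """Return a reproducible random subset of records, kept in original order."""
    rng = random.Random(seed)  # private generator, does not touch global state
    k = round(len(records) * fraction)
    # Sample indices without replacement, then sort to preserve time order.
    picked = sorted(rng.sample(range(len(records)), k))
    return [records[i] for i in picked]


# Stand-in for raw records; real rows would come from the raw data files.
records = list(range(1000))
subset = random_sample(records)
print(len(subset))
```

Preserving order matters for network traffic logs, where later analysis may depend on the sequence of packets. Note that a uniform sample like this does not guarantee the same class proportions as the full data.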

Dataset 4: New Gas Pipeline

Ian Turnipseed developed a new set of datasets with more randomness. These datasets are only from the gas pipeline control system.

Raw Dataset

Dataset in ARFF Format.

PowerPoint presentation describing the datasets

Master's Thesis describing the datasets
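ARFF is the plain-text format read by WEKA: a header of `@attribute` declarations followed by an `@data` section of comma-separated rows. For readers working outside WEKA, the sketch below shows a minimal reader using only the Python standard library. It assumes a simple dense file with attribute names that contain no spaces, and it keeps every value as a string. The sample text and its attribute names are hypothetical and are not taken from the published files. For full format support, a dedicated loader such as `scipy.io.arff.loadarff` or WEKA itself is the safer choice.

```python
import csv


def read_arff(text):
    """Parse a simple dense ARFF document into (attribute_names, rows).

    Values stay as strings; the ARFF missing-value marker "?" becomes None.
    """
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                # "@attribute <name> <type>": the name is the second token
                names.append(line.split(None, 2)[1])
            elif keyword == "@data":
                in_data = True
            continue  # other header lines such as @relation are ignored
        values = next(csv.reader([line], skipinitialspace=True))
        rows.append({n: (None if v == "?" else v) for n, v in zip(names, values)})
    return names, rows


# Hypothetical example content, not taken from the published datasets.
sample = """\
% demo file
@relation gas_pipeline_demo
@attribute pressure numeric
@attribute label {0,1}
@data
4.2,0
?,1
"""
names, rows = read_arff(sample)
print(names, len(rows))
```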

Please cite the following paper when using these datasets.

Morris, T., Thornton, Z., Turnipseed, I., Industrial Control System Simulation and Data Logging for Intrusion Detection System Research. 7th Annual Southeastern Cyber Security Summit. Huntsville, AL. June 3 - 4, 2015. pdf

Dataset 5: Energy Management System Data

An investor-owned utility has shared a large anonymized log file from an Energy Management System (EMS). The log file includes 30 days of EMS logs.

Raw Dataset

Description of the Dataset