Secure Water Treatment (SWaT)
Characteristics of dataset (SWaT.A1_Dec 2015)
- 11 days of continuous operation: 7 days under normal operation and 4 days with attack scenarios
- Network traffic and the values from all 51 sensors and actuators were collected
- Data are labelled as normal or abnormal behaviour
- Attack scenarios: derived from the attack model developed by our research team, which considers the intent space of a CPS. 41 attacks were launched during the 4 days and are described in the PDF.
The Jupyter notebook in the GitHub repository by ngoclesydney implements deep learning-based anomaly detection on the iTrust Secure Water Treatment (SWaT) dataset. The repository develops security metrics for detecting anomalous attacks in critical infrastructure systems, using 1D (time) Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) to identify attacks in Industrial Control Systems (ICS). The models are implemented with Google's TensorFlow framework, version 2.x, and the Keras library.
The SWaT testbed, from which the dataset is collected, is a scaled-down but high-fidelity replica of a modern six-stage water treatment facility that processes approximately 19 litres of water per minute. The dataset is particularly valuable for cybersecurity research because it contains both normal operational data and records of various cyber-physical attacks.
Researchers interested in anomaly detection techniques for critical infrastructure protection will find this implementation helpful as it demonstrates practical applications of machine learning for securing industrial control systems.
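As a rough illustration of how multivariate sensor readings are commonly prepared for a 1D CNN or an RNN, the sketch below cuts a time series into fixed-length, overlapping windows. It is not taken from the notebook: the function name `make_windows`, the window length, the stride, and the toy values are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def make_windows(rows: Sequence[Sequence[float]], window: int,
                 stride: int = 1) -> List[List[List[float]]]:
    """Cut a (time, features) series into overlapping windows.

    Each returned window has shape (window, features), the per-sample
    layout that 1D convolutional and recurrent layers usually expect.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    # The last valid start index still leaves room for a full window.
    for start in range(0, len(rows) - window + 1, stride):
        windows.append([list(r) for r in rows[start:start + window]])
    return windows


# Toy series (made-up values): 6 time steps, 2 hypothetical sensor channels.
series = [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2],
          [0.4, 1.3], [0.5, 1.4], [0.6, 1.5]]
batches = make_windows(series, window=4, stride=2)
print(len(batches), len(batches[0]), len(batches[0][0]))
```

A larger stride reduces the overlap between neighbouring windows and therefore the number of training samples; which setting suits the SWaT data is an empirical choice.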
Updates on dataset
24 Sep 18 (SWaT.A2_Dec 2015)
Two sets of “SWaT_Dataset_Normal”, versions 0 and 1, are provided. The datasets capture the normal state of the SWaT testbed running for seven days. In version 0, recording started while the plant was emptying the water storage tank, which took 30 minutes. In an ICS environment, this is generally part of maintenance outside normal operations. As a result of this drainage, the first 30 minutes of LIT101 data exhibit changes even though there was no water in/outflow. Version 1 is derived from version 0 by removing the first 30 minutes of data.
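Deriving version 1 from version 0 amounts to dropping every row recorded during the first 30 minutes. The following is a minimal sketch of that step, assuming the data are available as CSV with a timestamp column. The column name `Timestamp`, the timestamp format string, the function name `drop_initial_period`, and the sample values are assumptions for illustration and should be adapted to the actual file.

```python
import csv
import io
from datetime import datetime, timedelta


def drop_initial_period(rows, minutes=30, ts_column="Timestamp",
                        ts_format="%d/%m/%Y %I:%M:%S %p"):
    """Yield rows whose timestamp is at least `minutes` after the first row."""
    cutoff = None
    for row in rows:
        ts = datetime.strptime(row[ts_column].strip(), ts_format)
        if cutoff is None:
            # The first row defines the start of the recording.
            cutoff = ts + timedelta(minutes=minutes)
        if ts >= cutoff:
            yield row


# Made-up sample rows standing in for a version 0 file.
sample = io.StringIO(
    "Timestamp,LIT101\n"
    "22/12/2015 4:00:00 PM,124.3\n"
    "22/12/2015 4:15:00 PM,180.9\n"
    "22/12/2015 4:30:00 PM,261.5\n"
    "22/12/2015 4:45:00 PM,262.0\n"
)
kept = list(drop_initial_period(csv.DictReader(sample)))
print([row["Timestamp"] for row in kept])
```

The generator reads one row at a time, so it can be applied to a large file without loading it into memory.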
SWaT.A3_Jun 2017
136 hours of network traffic and historian data from a continuously running SWaT (no attacks) were collected over 6 days.
14 Aug 19 (SWaT.A4_Jul 2019)
A new SWaT dataset, collected during Jul 2019, is available for download. It includes 3 hours of SWaT running under normal operating conditions and 1 hour during which 6 attacks were carried out. Those who have previously received a SWaT dataset download link can use the same link to access this new dataset.
23 Oct 19 (SWaT.A5_Jul 2019)
We received queries on the SWaT dataset collected during Jul 2019; for example, under LS 201 the fields were recorded as "{u'IsSystem': False, u'Name': u'Inactive', u'Value': 0}". These fields have been updated to “Active” or “Inactive” and the dataset saved as version 2. The field definitions are provided in the “readme.docx” document that was shared along with the dataset. Those who were previously given the download link can use the same link to download the new files.
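For anyone still working with the earlier files, the stringified records can be reduced to their status name with a few lines of code. This is only a sketch of one possible conversion, not the procedure used to produce version 2. The function name `normalise_status` is hypothetical, and the code assumes the raw cells contain a Python-style dict literal with straight quotes.

```python
import ast


def normalise_status(raw: str) -> str:
    """Return the 'Name' entry of a stringified dict, or the input unchanged."""
    text = raw.strip()
    if not text.startswith("{"):
        return text  # already a plain value such as "Active"
    try:
        # literal_eval only parses literals; it does not execute code.
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # leave unparseable cells untouched
    if isinstance(parsed, dict) and "Name" in parsed:
        return str(parsed["Name"])
    return text


raw = "{u'IsSystem': False, u'Name': u'Inactive', u'Value': 0}"
print(normalise_status(raw))
```

Cells that are already plain strings, or that cannot be parsed, pass through unchanged, so the function can be applied to a whole column without first checking which file version it came from.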
19 Dec 19 (SWaT.A6_Dec 2019)
A new SWaT dataset, collected during Dec 2019, is available for download. The dataset consists of pcap and Historian Data (.csv) files and records a series of malware infection attacks on the SWaT Engineering Workstation. The malware attacks include a Historian Data Exfiltration attack and Process Disruption attacks.
This set includes 3 hours of SWaT running under normal operating conditions and 1 hour in which 6 attacks were carried out. Those who have previously received a SWaT dataset download link can use the same link to access this new dataset.
9 Jul 2020 (SWaT.A7_June 2020)
SWaT was run on 4 occasions (no attack). Each run lasted either 2 or 4 hours. Network traffic data was captured for the 4 runs.
SWaT Dec 2023 (100-hour run)
Characteristics of the datasets (18 Dec – 22 Dec 2023): SWaT.A9_18 Dec 2023 and WaDi.A3_18 Dec 2023, i.e., the 100 Hour Run Dataset
- 105 hours of continuous operations of both SWaT and WaDi testbeds under normal conditions.
- For the first 5 hours of the continuous run, the testbeds were not in CrossOver mode, i.e., each testbed was run in a closed loop.
- The testbeds were run for the remaining 100 hours in CrossOver mode.
- No attacks were performed, i.e., this is a clean dataset.
- The flow rate of the WaDi Consumer Tank 2 was deliberately set to 0.00 m^3/h to avoid triggering its faulty sensor.
- The following files are available:
  - The PCAPs for the 100 Hour Run Dataset
    - Total number of files and respective data size: 4,106 x 190 MB files totalling 764 GB
    - WaDi: 1,804 x 190 MB files totalling 335 GB
    - SWaT: 2,302 x 190 MB files totalling 429 GB
  - The Historian Datasets for both SWaT and WaDi
  - The NetFlow logs (Incomplete Set – The Net_Flow VM was not able to capture the full run due to its unavailability)
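When verifying a download of the PCAPs, the published per-testbed figures can be cross-checked against the stated totals. The sketch below uses only the numbers listed above; the dictionary layout and variable names are illustrative choices.

```python
# Published PCAP figures for the 100 Hour Run Dataset.
published = {
    "WaDi": {"files": 1804, "size_gb": 335},
    "SWaT": {"files": 2302, "size_gb": 429},
}

# Sum the per-testbed values to compare with the stated totals.
total_files = sum(entry["files"] for entry in published.values())
total_gb = sum(entry["size_gb"] for entry in published.values())

print(total_files, total_gb)
```

The sums agree with the stated totals of 4,106 files and 764 GB, so a local file count per testbed can be compared against the `files` entries directly.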