1. INTRODUCTION
This dataset was used in the paper, "Expanding the Attack Scenarios of SAE J1939: A Comprehensive Analysis of Established and Novel Vulnerabilities in Transport Protocol," presented at ESCAR USA in June 2024.
For more information about this dataset, please refer to our description paper below.
2. DATASET DESCRIPTION
We used a simulator that operates identically to a real vehicle to demonstrate the validity of our attack scenarios. We used a Kvaser USBcan Pro 2xHS v2 to collect the CAN messages.
Figure 1. J1939 attack simulation testbed setup. Laptop 1 monitors the CAN bus and Laptop 2 acts as the attacker within the SAE J1939 network, which comprises the Au SAE J1939 Simulators (Gen II) and the Au SAE J1939 MCS.
The products used were the Au SAE J1939 Simulators (Gen II) and the Au SAE J1939 Message Center System (MCS). You can find detailed descriptions of each product at the links below.
Au SAE J1939 Simulators (Gen II): https://www.auelectronics.com/System-J1939Simulator.htm
Au SAE J1939 MCS: https://www.auelectronics.com/System-MCSJ1939-001.htm
The communication consists of 37 unique CAN IDs and 28 unique PGNs.
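To analyze the specific SPN values, we used the J1939DA JAN23.xlsx file, the J1939 Digital Annex spreadsheet that is included with the purchase of the SAE J1939 standard. It serves the same purpose as a standard DBC file: it defines how each SPN is laid out within its PGN, which allows for a detailed analysis of SPN values.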
Using the Kvaser interface, we extracted the Timestamp, Arbitration ID, DLC, and Data of each message, as shown in Figure 2.
Figure 2. Raw data extracted from Kvaser
To facilitate analysis, we added several columns. The descriptions for all columns, including the original ones, are listed below. The benign data file appears as shown in Figure 3. The data was collected for approximately 71.5 minutes.
Figure 3. Parsed data
Timestamp: The time (in seconds) the message was collected.
Arbitration_ID: The Arbitration ID value expressed in hexadecimal.
DLC: The length of the Data Field.
Data: The value contained in the Data Field.
Arbitration_ID(int): The Arbitration ID value expressed in decimal.
PGN: The Parameter Group Number value expressed in decimal.
PF: The PDU Format value expressed in decimal.
PS: The PDU Specific value expressed in decimal.
SA: The Source Address value expressed in decimal.
Data(int): A list containing the decimal data values.
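The derived columns above can be reproduced from the raw Arbitration_ID and Data fields. The following is a minimal sketch, not the script used to build the dataset. It assumes the standard 29-bit J1939 identifier layout (3-bit priority, EDP, DP, PF, PS, SA) and the standard PGN rule, where PS is part of the PGN only when PF is 240 or greater. It also assumes the Data column is a hexadecimal string, with or without spaces between bytes. The function names decode_j1939_id and data_to_ints are hypothetical, and the PGN column in the files may have been computed with a different convention.

```python
from typing import Dict, List


def decode_j1939_id(arbitration_id: int) -> Dict[str, int]:
    """Split a 29-bit J1939 identifier into its fields (assumed standard layout)."""
    priority = (arbitration_id >> 26) & 0x7   # bits 26-28
    edp = (arbitration_id >> 25) & 0x1        # extended data page, bit 25
    dp = (arbitration_id >> 24) & 0x1         # data page, bit 24
    pf = (arbitration_id >> 16) & 0xFF        # PDU Format, bits 16-23
    ps = (arbitration_id >> 8) & 0xFF         # PDU Specific, bits 8-15
    sa = arbitration_id & 0xFF                # Source Address, bits 0-7

    # PDU1 (PF < 240): PS is a destination address and is not part of the PGN.
    # PDU2 (PF >= 240): PS is a group extension and is part of the PGN.
    pgn_low = ps if pf >= 240 else 0
    pgn = (edp << 17) | (dp << 16) | (pf << 8) | pgn_low

    return {"priority": priority, "PGN": pgn, "PF": pf, "PS": ps, "SA": sa}


def data_to_ints(data_hex: str) -> List[int]:
    """Convert a hex payload string (assumed format) into a list of byte values."""
    return list(bytes.fromhex(data_hex))


if __name__ == "__main__":
    # Illustrative identifier, not taken from the dataset.
    fields = decode_j1939_id(int("18FEF100", 16))
    print(fields)  # {'priority': 6, 'PGN': 65265, 'PF': 254, 'PS': 241, 'SA': 0}
    print(data_to_ints("FF 00 1A"))  # [255, 0, 26]
```

For a PDU1 identifier such as 1CECFF00 (PF = 236), the same function returns PGN 60416 with PS = 255 treated as the destination address.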
Detailed explanations of each attack can be found in the paper. The intrusion datasets have the same columns as the benign dataset, with the addition of one extra column: label.
label: normal (0) / abnormal (1)
The dataset consists of 15 files, covering 11 effective attack scenarios. If a single scenario includes multiple cases, the case number is listed after the scenario number.
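As a quick sanity check on any of the files, the number of unique CAN IDs and PGNs and the label distribution can be counted directly. The sketch below assumes the files are CSV with a header row that uses the column names listed above; the file format is not stated in this description, so adjust the reader if it differs. The function name summarize and the sample rows are hypothetical and only illustrate the expected shape of the data.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable

# Hypothetical sample rows; real files contain more columns and many more rows.
SAMPLE = """Timestamp,Arbitration_ID,PGN,SA,label
0.001,18FEF100,65265,0,0
0.002,0CF00400,61444,0,0
0.003,1CECFF00,60416,0,1
0.004,18FEF100,65265,0,0
"""


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, object]:
    """Count unique CAN IDs, unique PGNs, and rows per label value."""
    ids = set()
    pgns = set()
    labels = Counter()
    for row in rows:
        ids.add(row["Arbitration_ID"])
        pgns.add(row["PGN"])
        # The benign file has no label column; such rows are counted as "0".
        labels[row.get("label", "0")] += 1
    return {"unique_ids": len(ids), "unique_pgns": len(pgns), "labels": dict(labels)}


if __name__ == "__main__":
    # For a real file, replace io.StringIO(SAMPLE) with open(path, newline="").
    reader = csv.DictReader(io.StringIO(SAMPLE))
    print(summarize(reader))
    # {'unique_ids': 3, 'unique_pgns': 3, 'labels': {'0': 3, '1': 1}}
```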
3. CITATION
If you use this dataset, please cite the following paper:
Paper Link: https://arxiv.org/abs/2406.00810
4. DATASET DOWNLOAD
Download Link: Download
5. CONTACT
Hwejae Lee (hwejae94@korea.ac.kr) or Huy Kang Kim (cenda@korea.ac.kr)