Attack Reports Ground Truth (Reports 1-16)
This dataset comprises 16 cyber threat intelligence (CTI) reports from the AttacKG project (available at https://github.com/li-zhenyuan/Knowledge-enhanced-Attack-Graph/tree/main/Dataset). Each report includes its full content along with the annotated techniques (TTPs) identified within it.
IntelEX Extracted Techniques from Attack Reports (Reports 1-8)
This dataset contains the techniques extracted by the IntelEX system from the first 8 CTI reports. Each entry lists the techniques (TTPs) that IntelEX identified from the report content.
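Because Reports 1-8 appear in both the ground truth and the IntelEX output, the two can be compared report by report. The snippet below is a minimal sketch of such a comparison, assuming each report's techniques have already been loaded as a list of ATT&CK technique IDs. The function name `score_techniques` and the example IDs are hypothetical; they are not taken from the dataset, and the actual file layout may require a different loading step.

```python
def score_techniques(predicted, ground_truth):
    """Return (precision, recall, f1) for two collections of technique IDs."""
    pred, truth = set(predicted), set(ground_truth)
    true_positives = len(pred & truth)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical values for illustration only (not from the dataset).
truth = ["T1566", "T1059", "T1547", "T1071"]
extracted = ["T1566", "T1059", "T1105"]

precision, recall, f1 = score_techniques(extracted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Comparing sets rather than lists means a technique mentioned several times in one report is counted once, which may or may not match the scoring used in the original evaluation.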
Simulated Atomic Test Logs with Ground Truth
This dataset includes system logs generated by Atomic Tests, along with ground truth annotations that mark the malicious logs. It covers 14 distinct techniques, and each entry contains both benign and malicious activity, supporting the analysis and validation of detection mechanisms across multiple technique categories.
CTI Reports for Atomic Tests
This dataset contains a Cyber Threat Intelligence (CTI) report corresponding to each Atomic Test. Each report provides detailed intelligence on the specific techniques and tactics exercised in the test, offering insights into the threat patterns simulated by the Atomic Tests.
Field Study
Complete Honeypot Web Logs
This dataset comprises all web logs collected from a honeypot over three months, including both benign and malicious entries. It serves as a comprehensive source for analysis, model training, and understanding attack patterns in a real-world environment.
Labeled Malicious Honeypot Web Logs
This dataset contains the malicious web logs collected over three months from a honeypot environment. Every entry has been annotated as malicious, providing a reliable foundation for security research and the development of threat detection algorithms.
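The complete logs and the labeled malicious subset can be combined to obtain a benign/malicious label for every entry. The following is a rough sketch of that step, assuming both datasets are plain text with one log entry per line and that a malicious entry appears verbatim in the complete logs. Both assumptions, the helper name `label_logs`, and the sample lines are hypothetical and should be checked against the actual files.

```python
def label_logs(all_lines, malicious_lines):
    """Pair each non-empty log line with True if it is in the malicious set."""
    malicious = {line.strip() for line in malicious_lines}
    labeled = []
    for line in all_lines:
        entry = line.strip()
        if entry:  # skip blank lines
            labeled.append((entry, entry in malicious))
    return labeled


# Hypothetical sample lines for illustration only.
complete = [
    "GET /index.html HTTP/1.1\n",
    "GET /cgi-bin/.%2e/.%2e/bin/sh HTTP/1.1\n",
    "\n",
]
malicious_only = ["GET /cgi-bin/.%2e/.%2e/bin/sh HTTP/1.1\n"]

for entry, is_malicious in label_logs(complete, malicious_only):
    print("MALICIOUS" if is_malicious else "benign   ", entry)
```

Exact string matching is the simplest possible join. If the files carry a shared identifier such as a timestamp or request ID, joining on that field would be more robust.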
IntelEX-Generated Rules
These Splunk detection rules are automatically generated by our IntelEX system after processing Cyber Threat Intelligence (CTI) reports to extract Tactics, Techniques, and Procedures (TTPs). The generated rules are tailored to identify malicious logs within a honeypot environment, enabling effective detection and monitoring of potentially harmful activities based on observed threat intelligence patterns.
Sigma-Based Splunk Detection Rules
These Splunk detection rules are derived from the open-source Sigma rule dataset (available at https://github.com/SigmaHQ/sigma/tree/master/rules-emerging-threats). Using our project’s conversion process, we adapted the Sigma rules into Splunk format to detect malicious logs. These rules serve as a benchmark for comparison against the Splunk rules generated by the IntelEX system, providing a valuable reference in the evaluation of detection accuracy and effectiveness.
Open-Source Splunk Detection Rules
This dataset consists of Splunk detection rules sourced from the open repository (https://github.com/splunk/security_content/tree/develop/detections). For the experiments, we selected rules from the "web" and "application" folders. These pre-existing Splunk rules are used to capture malicious logs in the honeypot environment and serve as a comparison benchmark against the detection rules generated by the IntelEX system.
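The three rule sets are meant to be compared on the same labeled honeypot logs. The sketch below illustrates the shape of that comparison only: the real rules are Splunk searches that run inside Splunk, whereas here plain Python regular expressions stand in for them. The function `evaluate_rule_set`, the rule-set names, the patterns, and the log lines are all hypothetical and do not come from the dataset.

```python
import re


def evaluate_rule_set(patterns, labeled_logs):
    """Count true positives, false positives, and false negatives.

    A log line counts as detected if any pattern in the rule set matches it.
    """
    compiled = [re.compile(p) for p in patterns]
    counts = {"tp": 0, "fp": 0, "fn": 0}
    for line, is_malicious in labeled_logs:
        detected = any(rx.search(line) for rx in compiled)
        if detected and is_malicious:
            counts["tp"] += 1
        elif detected:
            counts["fp"] += 1
        elif is_malicious:
            counts["fn"] += 1
    return counts


# Hypothetical labeled logs: (log line, is_malicious).
logs = [
    ("GET /index.html HTTP/1.1", False),
    ("GET /cgi-bin/.%2e/.%2e/bin/sh HTTP/1.1", True),
    ("POST /login.php?id=1%27%20OR%201=1 HTTP/1.1", True),
    ("GET /search?q=shell+scripts HTTP/1.1", False),
]

# Hypothetical regex stand-ins for two rule sets.
rule_sets = {
    "generated": [r"\.%2e/", r"%27%20OR%20"],
    "baseline": [r"(?i)shell|/bin/sh"],
}

for name, patterns in rule_sets.items():
    print(name, evaluate_rule_set(patterns, logs))
```

Reporting false positives alongside true positives matters here because the complete honeypot logs include benign traffic, so a broad rule can look effective if only its hits on malicious entries are counted.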