Background:
To combat evolving threats, organizations typically transform unstructured, natural-language cyber threat intelligence (CTI) reports into structured intelligence, which streamlines security tasks such as threat detection and response.
Detailed attack-level intelligence is a promising basis for this, because it provides fine-grained details such as techniques and implementation procedures. These insights can automate the generation of detection rules in formats like Sigma, which is widely used in industry Security Information and Event Management (SIEM) platforms because of its speed.
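As an illustration of how attack-level details could feed rule generation, the following minimal Python sketch turns one extracted procedure into a Sigma-style rule dictionary. The `Procedure` class and `to_sigma_rule` function are hypothetical names introduced here, not part of the tool, and the choice of the `process_creation` log source is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Procedure:
    """Hypothetical container for one procedure extracted from a CTI report."""
    technique_id: str            # MITRE ATT&CK technique ID, e.g. "T1059.001"
    description: str             # short natural-language summary
    image_suffix: str            # process image associated with the procedure
    cmdline_keywords: List[str]  # command-line fragments seen in the report


def to_sigma_rule(proc: Procedure) -> Dict:
    """Build a Sigma-style rule as a plain dict (assumed mapping, for illustration)."""
    return {
        "title": f"Possible {proc.technique_id} activity",
        "status": "experimental",
        "description": proc.description,
        # Sigma tags reference ATT&CK techniques in lowercase, e.g. attack.t1059.001
        "tags": [f"attack.{proc.technique_id.lower()}"],
        # Assumption: the procedure is observable in Windows process creation logs
        "logsource": {"category": "process_creation", "product": "windows"},
        "detection": {
            "selection": {
                "Image|endswith": proc.image_suffix,
                "CommandLine|contains": proc.cmdline_keywords,
            },
            "condition": "selection",
        },
        "level": "medium",
    }


example = Procedure(
    technique_id="T1059.001",
    description="Encoded PowerShell command launched by a dropper",
    image_suffix="\\powershell.exe",
    cmdline_keywords=["-enc", "-EncodedCommand"],
)
rule = to_sigma_rule(example)
```

Sigma rules are normally stored as YAML, so a real pipeline would serialize this dictionary with a YAML library before loading it into a SIEM.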
Overview:
Input: CTI report
Step-1 (Chunking and Identification): Segment the raw report into chunks likely to contain malicious behaviors by locating Indicators of Compromise (IoCs).
Step-2 (Intention Interpreter): Leverage the LLM's in-context learning to map report chunks to techniques, guided by task-specific prompts based on MITRE ATT&CK descriptions.
Step-3 (Knowledge Enhancer): Build a vector database embedding MITRE ATT&CK descriptions for 14 tactics and 253 techniques. Use this database to retrieve the tactics and techniques relevant to each report chunk and augment its analysis with them.
Step-4 (Validator): Use an additional LLM, isolated from prior context, to evaluate the entire report and filter irrelevant techniques using grounded reasoning, reducing hallucinations.
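The four steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the IoC patterns are a small illustrative subset, a bag-of-words cosine similarity stands in for the embedding-based vector database, the two LLM stages are passed in as hypothetical callables (`interpret`, `validate`), and the way interpreter output and retrieved candidates are merged is assumed.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Illustrative IoC patterns only; a real system would need a far richer set.
IOC_PATTERNS = [
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                      # IPv4 address
    re.compile(r"\b[a-fA-F0-9]{32}\b|\b[a-fA-F0-9]{64}\b"),          # MD5 / SHA-256
    re.compile(r"\b[\w-]+\.(?:exe|dll|ps1|bat)\b", re.IGNORECASE),   # file names
]


def chunk_report(report: str) -> List[str]:
    """Step-1: split on blank lines and keep paragraphs that contain an IoC."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", report) if p.strip()]
    return [p for p in paragraphs if any(rx.search(p) for rx in IOC_PATTERNS)]


def _vec(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(chunk: str, knowledge_base: Dict[str, str], top_k: int = 2) -> List[str]:
    """Step-3: rank technique descriptions by similarity to the chunk."""
    query = _vec(chunk)
    ranked = sorted(knowledge_base.items(),
                    key=lambda item: _cosine(query, _vec(item[1])),
                    reverse=True)
    return [technique_id for technique_id, _ in ranked[:top_k]]


def analyze(report: str,
            knowledge_base: Dict[str, str],
            interpret: Callable[[str], List[str]],
            validate: Callable[[str, List[str]], List[str]]) -> List[str]:
    """Run the pipeline. `interpret` (Step-2) and `validate` (Step-4) are
    hypothetical wrappers around LLM calls supplied by the caller."""
    proposed: List[str] = []
    for chunk in chunk_report(report):
        proposed.extend(interpret(chunk))                 # Step-2: LLM mapping
        proposed.extend(retrieve(chunk, knowledge_base))  # Step-3: retrieved candidates
    unique = list(dict.fromkeys(proposed))                # de-duplicate, keep order
    return validate(report, unique)                       # Step-4: filter on the whole report
```

Passing the validator only the full report and the candidate list, rather than the per-chunk prompts, mirrors the idea of keeping it isolated from prior context.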
Practical Value:
To address the industry challenge of automating rule-based detection, we deployed an industry-grade tool that analyzes detailed attack-level intelligence and generates precise detection rules. Security analysts can also use the tool to assist their own analysis and work more efficiently. We validated its effectiveness in a real-world honeypot scenario, which showed a significant improvement in detection capabilities.