APT Attack Case Study
Setup
c1 Initial compromise: the attacker first sends a crafted e-mail to the victim (Windows client), which is opened with Outlook.
c2 Malware infection: the e-mail contains an attachment (for this attack, an Excel document) that drops and executes the malware. The malware opens a reverse connection back to the attacker host, giving the attacker access to the intranet.
c3 Privilege escalation: the attacker penetrates further, into the domain controller host, using PsExec, and deploys a database cracking tool (gsecdump.exe). By executing the cracking tool, the attacker obtains the database server IP address and the database administrator credentials, and uses them to escalate privileges.
c4 Penetration into database server: using the credentials, the attacker penetrates into the database server and delivers a VBScript that drops another piece of malware, which creates another reverse connection to the attacker host.
c5 Data exfiltration: with access to the database server, the attacker uses a command-line tool (osql.exe) to send IPC signals to the MS SQL Server process and dump the database content. Finally, the attacker sends the database dump back to the attacker host.
We performed a case study by asking white-hat hackers to launch an APT attack in the deployed environment, and we collected the resulting system monitoring data (119 GB). We then used the AIQL system to investigate the complete attack sequence. Our investigation assumes no prior knowledge of the detailed attack steps; it relies only on the alerts reported by two simple anomaly detectors that we deployed. Starting from these alerts, we compose AIQL queries. This document records the detailed investigation steps that we performed, including the AIQL queries, their execution results, and the information about the attack sequence deduced from them.
End-to-end performance comparison
The above figure shows the log10-transformed execution time of queries in the AIQL system, PostgreSQL, and Neo4j. We observe that: (1) Neo4j runs slower than PostgreSQL, due to the lack of support for efficient joins in graph databases; (2) both PostgreSQL and Neo4j become very slow as the queries become more complex (some queries cannot finish within 1 hour); (3) every AIQL query finishes execution within 15 seconds; (4) the total investigation time is ~5.9 hours for PostgreSQL and ~7.5 hours for Neo4j. In contrast, the total investigation time for the AIQL system is within 3 minutes (a 124x speed-up over PostgreSQL and a 157x speed-up over Neo4j).
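As a rough consistency check on these totals, the sketch below derives the AIQL investigation time implied by each baseline total and its reported speed-up. The function name `implied_total_minutes` is introduced here for illustration only, and the inputs are the rounded figures quoted above, so the results are approximate.

```python
def implied_total_minutes(baseline_hours: float, speedup: float) -> float:
    """Total time (in minutes) implied by a baseline total and a speed-up factor."""
    return baseline_hours * 60.0 / speedup


# Rounded figures quoted in the text above (hours, speed-up factor).
baselines = {
    "PostgreSQL": (5.9, 124.0),
    "Neo4j": (7.5, 157.0),
}

for name, (hours, speedup) in baselines.items():
    minutes = implied_total_minutes(hours, speedup)
    print(f"{name}: implied AIQL total of about {minutes:.2f} minutes")
```

Both baselines imply an AIQL total a little under 3 minutes, which agrees with the "within 3 minutes" statement.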
The largest AIQL query is c4-8, with 7 event patterns, 25 query constraints, 109 words, and 463 characters (excluding spaces). In contrast, the corresponding SQL query contains 77 constraints (3.1x larger), 432 words (4.0x larger), and 2792 characters (6.0x larger). The corresponding Neo4j Cypher query contains 63 constraints (2.5x larger), 361 words (3.3x larger), and 2570 characters (5.6x larger). As the attack behaviors become more complex, SQL and Cypher queries become very verbose, with many joins and constraints, which makes it hard to compose them correctly and quickly enough to support attack investigation.
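The size ratios for query c4-8 can be reproduced from the raw counts. The following is a small sketch; the dictionary layout and the helper `size_ratios` are illustrative choices, not part of the AIQL system.

```python
# Raw counts for query c4-8, taken from the paragraph above.
AIQL = {"constraints": 25, "words": 109, "characters": 463}
SQL = {"constraints": 77, "words": 432, "characters": 2792}
CYPHER = {"constraints": 63, "words": 361, "characters": 2570}


def size_ratios(other: dict, reference: dict) -> dict:
    """Ratio of each size metric to the reference query, rounded to one decimal."""
    return {metric: round(other[metric] / reference[metric], 1) for metric in reference}


print("SQL vs AIQL:", size_ratios(SQL, AIQL))
print("Cypher vs AIQL:", size_ratios(CYPHER, AIQL))
```

The rounded ratios match the factors quoted in the text (3.1x, 4.0x, 6.0x for SQL and 2.5x, 3.3x, 5.6x for Cypher).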