The AIQL System Architecture

The AIQL system consists of three major components: data collection and storage, the AIQL language parser, and the AIQL query execution engine.

Data collection and storage: To collect enterprise-wide activity data, we deploy data collection agents on Windows and Linux hosts, built on ETW (Event Tracing for Windows) and the Linux Audit Framework, respectively. The agents collect the system audit events that are crucial to security analysis, which fall into three major types: (1) process creation and termination, (2) file access, and (3) network access. In particular, our system assigns unique identifiers to distinguish system entities and corrects clock drift in the data collected from different hosts, so that happens-before relationships can be constructed across hosts.
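As a rough illustration of why identifiers and clock correction matter for cross-host ordering, the following Python sketch normalizes event timestamps before sorting. It is a minimal sketch under stated assumptions, not the agents' actual implementation: the `Event` fields, the `entity_id` format, and the per-host `offsets` table (host clock minus reference clock, in seconds) are all hypothetical, and how the offsets are estimated is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str          # host that reported the event
    subject_id: str    # unique ID of the acting process (hypothetical format)
    object_id: str     # unique ID of the file, process, or connection acted on
    op: str            # e.g. "start", "read", "send", "recv"
    local_ts: float    # timestamp from the reporting host's own clock


def entity_id(host, kind, local_key):
    """Build an ID that stays unique across hosts (assumed scheme)."""
    return f"{host}/{kind}/{local_key}"


def normalize(events, offsets):
    """Map local timestamps onto the reference clock and sort by them.

    offsets[host] is the assumed drift of that host's clock relative to the
    reference clock; hosts without an entry are treated as already in sync.
    Returns a list of (corrected_timestamp, event) pairs.
    """
    adjusted = [(e.local_ts - offsets.get(e.host, 0.0), e) for e in events]
    adjusted.sort(key=lambda pair: pair[0])
    return adjusted
```

With host B running five seconds behind, a `send` on host A at local time 100.0 and the matching `recv` on host B at local time 96.0 sort in the wrong order on raw timestamps; `normalize(events, {"hostB": -5.0})` places the `send` first.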

The collected data is then sent to a central server, where it is modeled and stored in databases for security analysis. We store the data in relational databases powered by PostgreSQL and in MPP (massively parallel processing) databases powered by Greenplum. When storing the data, we partition it by its spatial and temporal properties: groups of agents are separated into table partitions, and the data collected on each day is stored in its own database. We deduplicate the data by storing entities in object tables and events in relationship tables, so that each entity is stored once and referenced by the events that involve it.
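The partitioning and deduplication scheme can be sketched in a few lines of Python. This is an in-memory illustration only, assuming a fixed agent group size and hypothetical naming conventions (`events_YYYYMMDD` for the per-day database, `agents_N` for the table partition); the actual PostgreSQL and Greenplum schemas are not shown in this section.

```python
from datetime import datetime, timezone


def partition_for(agent_id, ts, group_size=10):
    """Pick the per-day database and the agent-group partition for an event.

    The naming scheme and group_size are assumptions made for illustration.
    """
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")
    return f"events_{day}", f"agents_{agent_id // group_size}"


class Store:
    """Object table for entities, relationship table for events."""

    def __init__(self):
        self.entities = {}  # entity attributes -> entity ID (object table)
        self.events = []    # (subject ID, op, object ID, ts) (relationship table)

    def entity(self, *attrs):
        # Reuse the existing ID if this entity has been seen before.
        return self.entities.setdefault(attrs, len(self.entities))

    def add_event(self, subject, op, obj, ts):
        self.events.append((self.entity(*subject), op, self.entity(*obj), ts))
```

Recording two reads by the same process adds two rows to `events` but only one row per distinct process and file to `entities`, which is the space saving the object/relationship split is meant to provide.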

AIQL language parser: The AIQL language parser performs syntactic and semantic analysis of input queries and generates query contexts. A query context is an object abstraction of the input query that contains all the information required for query execution. The parser supports three query syntaxes: multievent, dependency, and anomaly.
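To make the notion of a query context more concrete, the sketch below shows one plausible shape for such an object in Python. The class names, fields, and the `validate` check are hypothetical and chosen for illustration; they are not the parser's actual data structures, and no AIQL query text is parsed here.

```python
from dataclasses import dataclass, field

QUERY_KINDS = {"multievent", "dependency", "anomaly"}


@dataclass
class EventPattern:
    name: str        # label used to refer to this pattern, e.g. "evt1"
    subject: dict    # attribute constraints on the acting process
    operation: str   # e.g. "start", "read", "write"
    obj: dict        # attribute constraints on the target entity


@dataclass
class QueryContext:
    kind: str                                    # one of QUERY_KINDS
    patterns: list                               # list of EventPattern
    before: list = field(default_factory=list)   # (earlier, later) pattern names
    returns: list = field(default_factory=list)  # attributes to report

    def validate(self):
        """A small semantic check: every referenced pattern must exist."""
        if self.kind not in QUERY_KINDS:
            raise ValueError(f"unknown query kind: {self.kind}")
        names = {p.name for p in self.patterns}
        for earlier, later in self.before:
            if earlier not in names or later not in names:
                raise ValueError(f"undefined pattern in: {earlier} before {later}")
        return self
```

Keeping the temporal relationships separate from the event patterns lets an execution engine decide the order in which to evaluate patterns without re-reading the query text.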

AIQL query execution engine: The query execution engine executes the generated query contexts to search for the desired attack behaviors. Based on the data storage and the query semantics, the engine adopts domain-specific optimizations, such as relationship-based scheduling and temporal and spatial parallelization, to speed up query execution.
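The two optimizations named above can be illustrated with a simplified Python sketch. It rests on assumptions that go beyond this section: the number of constraints on an event pattern is used as a stand-in for how strongly that pattern prunes the search, and temporal parallelization is shown as one sub-query per day, matching the one-database-per-day storage layout. The function names and the `query_one_day` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import timedelta


def schedule(patterns):
    """Order event patterns so the most constrained one runs first.

    Each pattern is a dict with a "constraints" list; the constraint count
    is an assumed proxy for pruning power, not the engine's actual metric.
    """
    return sorted(patterns, key=lambda p: len(p["constraints"]), reverse=True)


def split_by_day(start, end):
    """List every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def run_parallel(start, end, query_one_day, workers=4):
    """Run one sub-query per day concurrently and merge results in day order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(query_one_day, split_by_day(start, end))
        return [row for part in parts for row in part]
```

Running the most selective pattern first means its results can narrow the search for the patterns related to it, and splitting by day lets each sub-query touch only a single per-day database.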