Log Aggregation, Processing and Analysis for Security:
Logs and events are a foundation of modern security monitoring, investigation and forensics. In this chapter you’ll learn in depth how logs are aggregated, processed and stored, and how they are used in the security operations center (SOC).
What is Log Aggregation?
Log aggregation is the process of collecting logs from multiple computing systems, parsing them and extracting structured data, and putting them together in a format that is easily searchable and explorable by modern data tools.
There are four common ways to aggregate logs; many log aggregation systems combine several of these methods:
1: Syslog:
A standard logging protocol. Network administrators can set up a Syslog server that receives logs from multiple systems, storing them in an efficient, condensed format which is easily queryable.
Log aggregators can directly read and process Syslog data.
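As a minimal sketch of what a Syslog receiver does, the example below listens for UDP Syslog messages and extracts the priority field, which encodes facility and severity. The port and parsing are illustrative; a production Syslog server also handles TCP/TLS transport and the full RFC 3164/5424 message formats.

import re
import socketserver

# Minimal sketch of a UDP Syslog receiver. Production servers also support
# TCP/TLS transport and fully parse RFC 3164 / RFC 5424 message formats.
PRI_RE = re.compile(r"^<(\d{1,3})>")  # leading "<PRI>" field, e.g. "<34>"

class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data = self.request[0].decode("utf-8", errors="replace").strip()
        match = PRI_RE.match(data)
        if match:
            pri = int(match.group(1))
            facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
            print(f"facility={facility} severity={severity} msg={data}")

if __name__ == "__main__":
    # Port 514 requires elevated privileges; 5140 is a common unprivileged choice.
    with socketserver.UDPServer(("0.0.0.0", 5140), SyslogHandler) as server:
        server.serve_forever()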
2: Event Streaming:
Protocols like SNMP, NetFlow and IPFIX allow network devices to provide standard information about their operations, which can be intercepted by the log aggregator, parsed and added to central log storage.
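To illustrate the kind of decoding involved, here is a sketch that reads raw NetFlow v5 datagrams and unpacks the 24-byte packet header. The field layout follows the published NetFlow v5 format; the listening port is an assumption (2055 is a conventional choice).

import socket
import struct

# NetFlow v5 packet header: version, record count, system uptime, unix secs,
# unix nsecs, flow sequence, engine type, engine id, sampling interval.
HEADER_FMT = "!HHIIIIBBH"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 24 bytes

def parse_header(datagram: bytes) -> dict:
    fields = struct.unpack(HEADER_FMT, datagram[:HEADER_LEN])
    names = ("version", "count", "sys_uptime_ms", "unix_secs", "unix_nsecs",
             "flow_sequence", "engine_type", "engine_id", "sampling_interval")
    return dict(zip(names, fields))

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 2055))  # conventional NetFlow collector port
    while True:
        datagram, addr = sock.recvfrom(65535)
        print(addr, parse_header(datagram))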
3: Log Collectors:
Software agents that run on network devices, capturing log information, parsing it and sending it to a centralized aggregator component for storage and analysis.
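The sketch below shows the core loop of a hypothetical collector agent: it tails a log file and ships new lines to an aggregator's ingest endpoint. The URL, source name and line-at-a-time shipping are assumptions; real agents such as Fluentd or Filebeat add batching, checkpointing and retry logic.

import json
import time
import urllib.request

AGGREGATOR_URL = "http://aggregator.example.com:8080/ingest"  # hypothetical endpoint

def ship(lines):
    """Send a batch of raw log lines to the central aggregator."""
    body = json.dumps({"source": "webserver-01", "lines": lines}).encode("utf-8")
    req = urllib.request.Request(
        AGGREGATOR_URL, data=body, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=5)

def tail(path):
    """Follow a log file and forward new lines as they appear."""
    with open(path) as f:
        f.seek(0, 2)  # start at end of file, like `tail -f`
        while True:
            line = f.readline()
            if line:
                ship([line.rstrip("\n")])
            else:
                time.sleep(0.5)

if __name__ == "__main__":
    tail("/var/log/nginx/access.log")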
4: Direct Access:
Log aggregators can directly access network devices or computing systems, using an API or network protocol to directly receive logs. This approach requires custom integration for each data source.
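As an illustration, the sketch below polls a device's REST API for new logs. The endpoint, token, query parameter and response shape are all hypothetical; this is exactly the kind of per-source custom integration the direct access approach requires.

import json
import time
import urllib.parse
import urllib.request

DEVICE_API = "https://firewall.example.com/api/v1/logs"  # hypothetical API
API_TOKEN = "REPLACE_ME"

def fetch_logs(since):
    """Pull events newer than `since` (an ISO 8601 timestamp) from the device."""
    url = DEVICE_API + "?" + urllib.parse.urlencode({"since": since})
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {API_TOKEN}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    cursor = "1970-01-01T00:00:00Z"
    while True:
        for event in fetch_logs(cursor):
            print(event)
            cursor = event.get("timestamp", cursor)  # advance the polling cursor
        time.sleep(30)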
What is Log Processing?
Log processing is the art of taking raw system logs from multiple sources, identifying their structure or schema, and turning them into a consistent, standardized data source.
The Log Processing Flow:
01: Log Parsing
Each log has a repeating data format made up of fields and values. However, the format varies between systems, and even between different logs on the same system.
A log parser is a software component that takes a specific log format and converts it to structured data. Log aggregation software includes dozens or hundreds of parsers written to process logs from common systems.
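For example, a parser for the standard Apache/nginx "combined" access log format can be written as a single regular expression. The sketch below converts one raw line into a structured record; the sample line uses a documentation IP address.

import re

# Parser for the Apache/nginx "combined" access log format.
COMBINED_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?')

def parse_access_log(line):
    """Convert one raw access-log line into structured data, or None."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

line = ('203.0.113.7 - alice [10/Oct/2024:13:55:36 +0000] '
        '"GET /login HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
print(parse_access_log(line))
# {'client_ip': '203.0.113.7', 'user': 'alice', 'status': '200', ...}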
02: Log Normalization and Categorization
Normalization merges events with different structures into a reduced format that contains common event attributes. Most logs capture the same basic information: time, network address, operation performed, etc.
Categorization involves adding meaning to events – identifying log data related to system events, authentication, local/remote operations, etc.
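A minimal sketch of both steps, assuming an illustrative common schema (timestamp, source IP, user, action, category): two differently shaped records, one from sshd and one from a web server, are mapped onto the same normalized form and assigned a category.

def normalize_ssh_event(raw):
    """Map a parsed sshd record onto the illustrative common schema."""
    return {
        "timestamp": raw["time"],
        "source_ip": raw["rhost"],
        "user": raw["user"],
        "action": "login_failure" if raw["failed"] else "login_success",
        "category": "authentication",
    }

def normalize_web_event(raw):
    """Map a parsed web access-log record onto the same schema."""
    return {
        "timestamp": raw["timestamp"],
        "source_ip": raw["client_ip"],
        "user": raw.get("user"),
        "action": f'{raw["method"]} {raw["path"]}',
        "category": "web_access",
    }

ssh = {"time": "2024-10-10T13:55:36Z", "rhost": "203.0.113.7",
       "user": "alice", "failed": True}
web = {"timestamp": "2024-10-10T13:55:40Z", "client_ip": "203.0.113.7",
       "user": "alice", "method": "GET", "path": "/login"}
print(normalize_ssh_event(ssh))
print(normalize_web_event(web))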
03: Log Enrichment
Log enrichment adds contextual information that makes the data more useful for analysis.
For example, if the original log contained IP addresses, but not actual physical locations of the users accessing a system, a log aggregator can use a geolocation data service to find out locations and add them to the data.
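Continuing the geolocation example, the sketch below attaches location attributes to each event. The lookup function is a stub standing in for a real service; a production enricher would query, for example, a MaxMind GeoIP database.

def geolocate(ip):
    """Stub standing in for a real geolocation service or GeoIP database."""
    demo_data = {"203.0.113.7": {"country": "DE", "city": "Berlin"}}
    return demo_data.get(ip, {"country": "unknown", "city": "unknown"})

def enrich(event):
    """Attach geolocation attributes derived from the event's source IP."""
    event = dict(event)  # copy so the original record stays unchanged
    event["geo"] = geolocate(event["source_ip"])
    return event

event = {"timestamp": "2024-10-10T13:55:36Z", "source_ip": "203.0.113.7",
         "user": "alice", "category": "authentication"}
print(enrich(event))
# adds: 'geo': {'country': 'DE', 'city': 'Berlin'}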
04: Log Indexing
Modern networks generate huge volumes of log data. To search and explore log data effectively, it is necessary to create an index of common attributes across all log data.
Searches or data queries that use the index keys can be an order of magnitude faster than a full scan of all log data.
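The toy inverted index below shows the principle: it maps (attribute, value) pairs to the positions of matching events, so a query on an indexed attribute touches only the matching records instead of scanning every event.

from collections import defaultdict

# Toy inverted index: (attribute, value) -> positions of matching events.
index = defaultdict(list)
events = [
    {"source_ip": "203.0.113.7", "user": "alice", "category": "authentication"},
    {"source_ip": "198.51.100.2", "user": "bob", "category": "web_access"},
    {"source_ip": "203.0.113.7", "user": "alice", "category": "web_access"},
]

# Build the index over the attributes we expect to query most often.
for pos, event in enumerate(events):
    for attr in ("source_ip", "user", "category"):
        index[(attr, event[attr])].append(pos)

# An indexed lookup retrieves only matching events, with no full scan.
for pos in index[("source_ip", "203.0.113.7")]:
    print(events[pos])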
05: Log Storage
Because of the massive volume of logs and their exponential growth, log storage is rapidly evolving. Historically, log aggregators stored logs in a centralized repository. Today, logs are increasingly stored in data lakes built on technology such as Amazon S3 or Hadoop.
Data lakes can support unlimited storage volumes at low incremental storage cost, and can provide access to the data via distributed processing engines like MapReduce, or via modern high-performance analytics tools.
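As a sketch of data lake storage, the example below writes a batch of normalized events to Amazon S3 as gzipped newline-delimited JSON using boto3 (the AWS SDK for Python). The bucket name and date-partitioned key layout are assumptions; partitioning by date is a common convention that makes the data easy to query with engines such as Athena, Presto or Spark.

import gzip
import json
from datetime import datetime, timezone

import boto3  # AWS SDK for Python

def store_batch(events, bucket):
    """Write a batch of events to S3 as gzipped newline-delimited JSON."""
    body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
    now = datetime.now(timezone.utc)
    # Date-partitioned keys keep the lake queryable by analytics engines.
    key = f"logs/year={now:%Y}/month={now:%m}/day={now:%d}/{now:%H%M%S}.json.gz"
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)
    return key

events = [{"timestamp": "2024-10-10T13:55:36Z", "source_ip": "203.0.113.7",
           "category": "authentication"}]
print(store_batch(events, bucket="example-security-logs"))  # placeholder bucket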
LOG TYPES:
Endpoint Logs
Router Logs
Application Event Logs
IoT Logs
Proxy Logs