Ready to take anomaly detection for a test drive? In this lab you will get to:
Create anomaly detection jobs for the Kibana sample data
Use the results to identify possible anomalies in the data
The Kibana sample data sets include some pre-configured anomaly detection jobs for you to play with. Let's explore some of the ways to add the jobs:
1. Navigate to Machine Learning under the Analytics section.
2. Under Data Visualizer, click Data View.
OR
1. Search for Data Visualizer in the top search bar in Kibana.
2. Click Select data view.
3. Select the Kibana Sample Data Logs.
To get the best results from machine learning analytics, you must understand your data. You must know its data types and the range and distribution of values. The Data Visualizer enables you to explore the fields in your data.
4. In the Data Visualizer, click Use full data.
5. Explore the fields in the Data Visualizer. You can filter the list by field names or field types. The Data Visualizer indicates how many of the documents in the sample for the selected time period contain each field.
6. When you are ready, under Explore your data, click on Kibana sample data web logs to create a job using recommended configurations. If you are curious, you can see the configuration files in GitHub.
7. Accept the default values and click Create Jobs.
The wizard then creates three jobs and three datafeeds.
Here’s a quick overview of the goal of each job:
low_request_rate uses the low_count function to find unusually low request rates
response_code_rates uses the count function and partitions the analysis by response.keyword values to find unusual event rates by HTTP response code
url_scanning uses the high_distinct_count function and performs population analysis on the clientip field to find client IPs accessing an unusually high distinct count of URLs
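The three job descriptions above map directly onto three detector configurations. Below is a hedged sketch, as Python dicts, of what those detectors look like; the detector functions and the `response.keyword` and `clientip` fields come from the list above, while `bucket_span` and the `url.keyword` field name are illustrative assumptions rather than the exact values the wizard uses (check the configuration files in GitHub for those).

```python
# Sketch of the three detector configurations described above.
# bucket_span and url.keyword are assumptions; the detector functions
# and the partition/population fields are taken from the job list.
jobs = {
    "low_request_rate": {
        "analysis_config": {
            "bucket_span": "1h",  # assumed; see the actual job config
            "detectors": [
                # low_count: flags unusually LOW event rates only
                {"function": "low_count"},
            ],
        }
    },
    "response_code_rates": {
        "analysis_config": {
            "bucket_span": "1h",  # assumed
            "detectors": [
                # count, partitioned so each response code gets its own baseline
                {"function": "count",
                 "partition_field_name": "response.keyword"},
            ],
        }
    },
    "url_scanning": {
        "analysis_config": {
            "bucket_span": "1h",  # assumed
            "detectors": [
                # population analysis: compare each client IP's distinct URL
                # count against the population of all client IPs
                {"function": "high_distinct_count",
                 "field_name": "url.keyword",  # assumed field name
                 "over_field_name": "clientip"},
            ],
        }
    },
}

for name, cfg in jobs.items():
    print(name, "->", cfg["analysis_config"]["detectors"][0]["function"])
```

The key structural difference to notice: `partition_field_name` splits the analysis into independent baselines per value, while `over_field_name` compares each entity against the whole population.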
The next step is to view the results and see what types of insights these jobs have generated!
After the datafeeds are started and the anomaly detection jobs have processed some data, you can view the results in Kibana.
8. Click on View results.
There are two tools for examining the results from anomaly detection jobs in Kibana: the Anomaly Explorer and the Single Metric Viewer. You can switch between these tools by clicking the icons in the top left corner. You can also edit the job selection to examine a different subset of anomaly detection jobs.
One of the sample jobs (low_request_rate) is a single metric anomaly detection job. It has a single detector that uses the low_count function and limited job properties. You might use a job like this if you want to determine when the request rate on your web site drops significantly.
Let’s start by looking at this simple job in the Single Metric Viewer:
Select the Anomaly Detection tab in Machine Learning to see the list of your anomaly detection jobs.
Click the chart icon in the Actions column for your low_request_rate job to view its results in the Single Metric Viewer.
Use the relative mode of the date picker to select a start date one week in the past and an end date one month in the future to cover the majority of the analyzed data points.
Slide the time selector to a section of the time series that contains a red anomaly data point. If you hover over the point, you can see more information.
For each anomaly, you can see key details such as the time, the actual and expected ("typical") values, and their probability in the Anomalies section of the viewer.
The Anomaly explanation section gives you further insights about each anomaly, such as its type and impact, to make it easier to interpret the job results.
Next, let’s look at the response_code_rates job in the Anomaly Explorer:
Select the Anomaly Detection tab in Machine Learning to see the list of your anomaly detection jobs.
Open the response_code_rates job in the Anomaly Explorer to view its results by clicking the corresponding icon in the row of the job.
For this particular job, you can choose to see separate swim lanes for each client IP or response code.
Since the job uses response.keyword as its partition field, the analysis is segmented such that you have completely different baselines for each distinct value of that field. By looking at temporal patterns on a per-entity basis, you might spot things that would otherwise have been hidden in the lumped view.
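To see why per-partition baselines matter, here is a small self-contained Python sketch with made-up hourly counts (not the sample data): a spike in 404 responses stands out sharply against the 404-only baseline, but is nearly lost in the noise of the lumped totals.

```python
from statistics import mean, pstdev

# Hourly event counts per response code, invented for illustration.
counts = {
    "200": [100, 120, 85, 110, 95, 102],  # noisy but normal traffic
    "404": [2, 3, 2, 2, 3, 30],           # last bucket: a 404 spike
}

# Lumped view: totals across all response codes per bucket.
totals = [sum(vals) for vals in zip(*counts.values())]
print("totals:", totals)  # -> [102, 123, 87, 112, 98, 132]

def zscore(series):
    """z-score of the last bucket against the earlier buckets."""
    baseline = series[:-1]
    return (series[-1] - mean(baseline)) / pstdev(baseline)

print("lumped z-score:", round(zscore(totals), 1))        # small: spike hidden
print("404-only z-score:", round(zscore(counts["404"]), 1))  # huge: spike obvious
```

This is only a toy z-score, not the actual anomaly scoring the ML jobs use, but it illustrates the same effect: partitioning by response.keyword gives each code its own baseline, so a spike that barely moves the total becomes unmissable.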
Click on a section in the swim lanes to obtain more information about the anomalies in that time period. For example, click on the red section in the swim lane for the response.keyword value of 404.
After you have identified anomalies, often the next step is to try to determine the context of those situations. For example, are there other factors that are contributing to the problem? Are the anomalies confined to particular applications or servers? You can begin to troubleshoot these situations by layering additional jobs or creating multi-metric jobs.
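As a sketch of what a multi-metric job might look like, here is a hypothetical configuration in the same dict style as before: two detectors sharing one partition field, plus influencers to help answer the "which client or server?" questions above. The field names, bucket span, and metric choices here are assumptions for illustration, not part of the sample jobs.

```python
# Hypothetical multi-metric job (illustrative field names and settings):
# analyze mean response bytes AND event count, both partitioned by
# response code, so each metric/partition pair gets its own baseline.
multi_metric_job = {
    "analysis_config": {
        "bucket_span": "1h",  # assumed
        "detectors": [
            {"function": "mean", "field_name": "bytes",
             "partition_field_name": "response.keyword"},
            {"function": "count",
             "partition_field_name": "response.keyword"},
        ],
        # Influencers suggest which field values likely contributed to an
        # anomaly, which helps narrow down context when troubleshooting.
        "influencers": ["clientip", "response.keyword"],
    },
    "data_description": {"time_field": "timestamp"},
}

for d in multi_metric_job["analysis_config"]["detectors"]:
    print(d["function"], "partitioned by", d["partition_field_name"])
```

The multi-metric wizard in Kibana builds configurations of roughly this shape for you, so you rarely need to write them by hand.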
Now let's try creating some jobs of our own using the Machine Learning wizard!