Dataset and Source Code
On this website, we provide the full replication package for our paper, including:
Our collected dataset on OneDrive
Dataset: SMU OneDrive
Our code for dataset collection and benchmark evaluation in our GitHub repository
Source Code: https://github.com/fish98/CAShift
Abstract
With the rapid advancement of cloud-native computing, securing cloud environments has become a critical task. Log-based Anomaly Detection (LAD) is the most representative technique used across systems for attack detection and safety assurance, and numerous LAD methods and related datasets have been proposed. However, even though some of these datasets are specifically built for cloud systems, they cover only a limited range of cloud behaviors and lack information from a whole-system perspective. Another critical issue is normality shift, where the test distribution differs from the training distribution, which strongly degrades LAD performance. Unfortunately, existing work focuses only on simple shift types such as chronological changes, while other important, cloud-specific shift types are ignored, e.g., the distribution shift introduced by different deployed cloud architectures. A new dataset covering diverse cloud-system behaviors and normality shift types is therefore needed.
To fill this gap, we construct CAShift, the first normality-shift-aware dataset for evaluating LAD performance in the cloud. It considers different software roles in cloud systems, supports three real-world normality shift types (application shift, version shift, and cloud architecture shift), and features 20 diverse attack scenarios across various cloud system components. Based on CAShift, we conduct a comprehensive empirical study of the effectiveness of existing LAD methods under normality shift. Additionally, to explore the feasibility of shift adaptation, we investigate three continuous learning approaches, the most common means of mitigating the impact of distribution shift. Results demonstrate that 1) all LAD methods suffer from normality shift, with performance drops of up to 34%, and 2) existing continuous learning methods are promising for addressing shift, but the proportion of data used for model retraining and the choice of algorithm strongly affect shift adaptation, improving F1-Score by up to 27%. Based on our findings, we offer implications for future research on designing more robust LAD models and methods for LAD shift adaptation.
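To make the shift-evaluation and adaptation settings described above concrete, the sketch below trains an anomaly detector on normal logs only, compares its F1-Score on an in-distribution test set against a shifted one, and then naively retrains on a fraction of shifted normal logs. This is an illustration, not our actual pipeline: the detector choice (an IsolationForest over TF-IDF features), the file names, and the simple retraining strategy are assumptions made only to keep the example small and runnable; the real LAD methods, data format, and continuous learning algorithms are documented in the paper and the repository.

```python
# Illustrative sketch of a normality-shift experiment (NOT the CAShift pipeline).
# File names and the detector are placeholders chosen for a self-contained example.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score

def load_logs(path):
    # Hypothetical loader: one log sequence per line, "label<TAB>text",
    # where label 1 = anomalous and 0 = normal.
    texts, labels = [], []
    with open(path) as f:
        for line in f:
            label, text = line.rstrip("\n").split("\t", 1)
            texts.append(text)
            labels.append(int(label))
    return texts, labels

# Train on normal (in-distribution) logs only, as is standard for LAD.
train_texts, _ = load_logs("train_normal.tsv")          # hypothetical file
vec = TfidfVectorizer(max_features=5000)
X_train = vec.fit_transform(train_texts)
detector = IsolationForest(random_state=0).fit(X_train)

def evaluate(path):
    texts, labels = load_logs(path)
    # IsolationForest predicts -1 for anomalies; map to 1 = anomalous.
    preds = (detector.predict(vec.transform(texts)) == -1).astype(int)
    return f1_score(labels, preds)

# Compare an in-distribution test set with a shifted one (e.g., logs from a
# different application, version, or cloud architecture).
print("in-distribution F1:", evaluate("test_id.tsv"))    # hypothetical files
print("shifted F1:        ", evaluate("test_shift.tsv"))

# A naive adaptation step: retrain on a fraction of shifted normal logs, as a
# simple stand-in for the continuous learning strategies studied in the paper.
shift_norm_texts, _ = load_logs("shift_normal.tsv")      # hypothetical file
frac = 0.2                                               # ratio of shifted data reused
adapted_texts = train_texts + shift_norm_texts[: int(frac * len(shift_norm_texts))]
X_adapt = vec.fit_transform(adapted_texts)
detector = IsolationForest(random_state=0).fit(X_adapt)
print("shifted F1 after adaptation:", evaluate("test_shift.tsv"))
```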
Overview of Benchmark
Overview of our benchmarking framework