Research Question: How effective are LAD methods under normality shift?
After training LAD models on the same normality distribution, we evaluate the detection capability of the trained models on the six shift scenarios under the three shift types of our collected CAShift.
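As a rough illustration of this evaluation protocol, the sketch below computes a per-scenario F1-Score for a trained detector. The scenario names, the `detector` object, and the `load_scenario` helper are hypothetical placeholders for illustration and are not part of the CAShift artifact.

```python
from sklearn.metrics import f1_score

# Illustrative scenario names only; the actual six CAShift scenarios
# span the three shift types described in the paper.
SCENARIOS = ["app_shift_1", "app_shift_2", "version_shift_1",
             "version_shift_2", "arch_shift_1", "arch_shift_2"]

def evaluate_under_shift(detector, load_scenario):
    """Compute the F1-Score of a trained LAD model on each shifted test set.

    Assumes `detector.predict(logs)` returns 1 for anomalous log sequences
    and 0 for normal ones, and `load_scenario(name)` returns (logs, labels).
    """
    results = {}
    for name in SCENARIOS:
        logs, labels = load_scenario(name)       # shifted normal + attack logs
        preds = detector.predict(logs)           # anomaly predictions
        results[name] = f1_score(labels, preds)  # per-scenario F1-Score
    return results
```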
LAD performance under normality shift scenarios. The best and worst results are highlighted with green and red backgrounds, respectively.
Answer to Research Question:
All LAD methods are impacted by normality shifts, which degrade detection performance by up to 34% in F1-Score. Specifically, prediction-based LAD methods are more sensitive to distribution shifts than reconstruction-based methods. We observe that the semantically enhanced traditional method, SemPCA, also performs well on attack detection tasks, with an average F1-Score of 0.93. Its capability to learn the statistical significance of the log distribution allows it to exhibit promising robustness in scenarios with cloud architecture shifts, and its average performance across the shift scenarios (F1-Score of 0.78) outperforms most of the evaluated baselines. On the other hand, even though the semi-supervised PLELog model and the advanced LogAD model demonstrate superior performance on in-distribution data, they suffer more significant performance degradation under shift scenarios than traditional deep learning models, with average declines of 20.9% and 21.3%, respectively.
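The degradation percentages above refer to drops in F1-Score under shift; a minimal sketch of how an average relative decline could be computed from per-scenario scores is given below, assuming the decline is measured relative to the in-distribution F1-Score. All numbers in the example are placeholders, not results from the evaluation.

```python
def relative_f1_decline(f1_in_dist, f1_shifted):
    """Average relative F1-Score drop of one method under shift scenarios.

    `f1_in_dist` is the F1-Score on in-distribution data; `f1_shifted`
    maps each shift scenario to the F1-Score obtained under that shift.
    """
    drops = [(f1_in_dist - f1) / f1_in_dist for f1 in f1_shifted.values()]
    return sum(drops) / len(drops)

# Placeholder numbers for illustration only, not results from the paper.
decline = relative_f1_decline(
    f1_in_dist=0.90,
    f1_shifted={"app_shift": 0.81, "version_shift": 0.75, "arch_shift": 0.78},
)
print(f"average relative decline: {decline:.1%}")
```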
F1-Scores achieved by each method under normality shifts.