Dataset Construction

CAShift Dataset

We collect system logs from three widely used open-source applications (WordPress, Joomla and Jinja2). In addition to normal logs, CAShift contains attack logs based on existing CVE vulnerabilities found in these three applications. To cover shift scenarios, we collect WordPress logs under three different versions and three different cloud container runtime environments. We also include 20 types of CVE vulnerabilities from various components of cloud-native systems, and we replay each vulnerability in its affected component and version to collect the corresponding logs. We then use the collected datasets to evaluate how well LAD methods handle normality shifts. Furthermore, we employ continuous learning methods to select important data for adapting LAD models to the new data distribution.

Dataset information

Scripts to capture system call logs in the cloud (normal and attacks)
Proof of Concept (PoC) and CVE information for 20 attack scenarios

Vulnerabilities included in CAShift Dataset

Dataset Constitution

We use logs from a Kubernetes system running containerd and runc, deployed with WordPress 6.2, as the [Base logs]. The shift logs are labeled as follows. For application shifts, Jinja2 is [App-1] and Joomla is [App-2]. For version shifts, WordPress 4.8 is [Version-1] and WordPress 5.6 is [Version-2]. For cloud runtime shifts, containerd with gVisor is [Arch-1] and cri-o with runc is [Arch-2].

All Application Shift and Version Shift logs are collected on the Kubernetes system using containerd and runc container runtimes.
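The scenario labels above can be written down as a small lookup table, which makes it easy to check which factor changes relative to the Base logs. This is only an illustrative sketch: the `Scenario` class, its field names, and `shifted_fields` are hypothetical and not part of the CAShift scripts. Application versions for Jinja2 and Joomla are not stated on this page and are left as `None`, and the assumption that the Arch scenarios run WordPress 6.2 is ours.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class Scenario:
    """One log-collection setting (hypothetical structure for illustration)."""
    application: str
    version: Optional[str]      # None where this page does not state a version
    container_runtime: str      # e.g. containerd or cri-o
    low_level_runtime: str      # e.g. runc or gVisor


SCENARIOS = {
    "Base": Scenario("WordPress", "6.2", "containerd", "runc"),
    # Application shifts: collected on containerd + runc (stated above).
    "App-1": Scenario("Jinja2", None, "containerd", "runc"),
    "App-2": Scenario("Joomla", None, "containerd", "runc"),
    # Version shifts: collected on containerd + runc (stated above).
    "Version-1": Scenario("WordPress", "4.8", "containerd", "runc"),
    "Version-2": Scenario("WordPress", "5.6", "containerd", "runc"),
    # Runtime shifts: WordPress 6.2 is an assumption, not stated on this page.
    "Arch-1": Scenario("WordPress", "6.2", "containerd", "gVisor"),
    "Arch-2": Scenario("WordPress", "6.2", "cri-o", "runc"),
}


def shifted_fields(name: str) -> List[str]:
    """Return the names of the fields in which a scenario differs from Base."""
    base = SCENARIOS["Base"]
    other = SCENARIOS[name]
    return [
        f.name
        for f in fields(Scenario)
        if getattr(base, f.name) != getattr(other, f.name)
    ]


if __name__ == "__main__":
    for label in SCENARIOS:
        print(label, shifted_fields(label))
```

Because the unknown versions are stored as `None`, the application-shift scenarios report both `application` and `version` as differing from Base.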

Common attack surfaces in cloud systems

Quantitative Analysis of CAShift

We employ t-SNE, a commonly used unsupervised method, to illustrate the embedding distribution of the log information. Specifically, we first randomly sample 100 logs from each shift scenario and all attack logs, and then employ the pre-trained BERT-base model to produce an embedding for each log sample.
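The sampling step can be sketched as follows. This is a minimal illustration, not the script used for CAShift: the function name `sample_logs`, the fixed seed, and the choice to treat the attack logs as one more group capped at 100 samples are all assumptions. The embedding and t-SNE steps are not shown here, since they depend on external model and plotting libraries.

```python
import random
from typing import Dict, List


def sample_logs(
    logs_by_group: Dict[str, List[str]],
    k: int = 100,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Randomly draw up to k logs from each group, without replacement.

    logs_by_group maps a group label (for example "Base", "App-1", or
    "Attack") to its list of log samples. Groups with fewer than k logs
    are returned in full, in shuffled order.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = {}
    # Iterate in sorted order so results do not depend on dict ordering.
    for group in sorted(logs_by_group):
        logs = logs_by_group[group]
        sampled[group] = rng.sample(logs, min(k, len(logs)))
    return sampled


if __name__ == "__main__":
    # Synthetic placeholder logs, used only to exercise the function.
    demo = {
        "Base": [f"base-log-{i}" for i in range(250)],
        "Attack": [f"attack-log-{i}" for i in range(40)],
    }
    picked = sample_logs(demo)
    print({group: len(logs) for group, logs in picked.items()})
```

The sampled logs would then be passed to the embedding model, and the resulting vectors projected to two dimensions for plotting.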

t-SNE visualization of shift logs compared with attack logs and normal logs
