Bugs in Pods: Understanding Bugs in Container Runtime Systems
Data
On our website, we provide the following data:
1) Architecture of container runtime systems
2) Taxonomy of symptoms and root causes of container runtime system bugs
3) Security vulnerabilities in container runtime systems (CVE, GitHub Security Advisory)
4) Testing approaches for container runtime systems and their effectiveness in detecting bugs
5) Manual analysis data of bugs in container runtime projects (runc, gvisor, containerd, cri-o)
Source Code: https://github.com/fish98/CRS_Bugs
Abstract
Container Runtime Systems (CRSs), which form the foundational infrastructure of container clouds, are critically important due to their impact on the quality of container cloud implementations. However, a comprehensive understanding of the quality issues present in CRS implementations remains lacking. To bridge this gap, we conducted the first comprehensive empirical study of CRS bugs. Specifically, we gathered 429 bugs from 8,271 commits across dominant CRS projects, including runc, gvisor, containerd, and cri-o. Through manual analysis, we developed taxonomies of bug symptoms and root causes, comprising 16 and 13 categories, respectively. Furthermore, we evaluated the capability of popular testing approaches, including unit testing, integration testing, and fuzz testing, in detecting these bugs. The results show that 78.79% of the bugs cannot be detected due to the lack of test drivers, oracles, and effective test cases. Based on the findings of our study, we present implications and future research directions for various stakeholders in the domain of CRSs. We hope that our work can lay the groundwork for future research on CRS bug detection.
Methodology
To understand CRS bugs, we first gathered the commits of runc, gvisor, cri-o, and containerd, and then filtered them with keywords to identify bug-fixing commits, which we kept as candidates. These commits were used for manual analysis to answer the first two research questions: we manually read, triaged, and labeled each bug-fixing commit, and assigned each bug a symptom category and a root-cause category in our taxonomies. To answer RQ3, we collected the existing tests of each project and executed them on the corresponding software versions; we then manually analyzed the testing results and identified the specific reasons why undetected bugs were missed.
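To make the keyword-filtering step concrete, below is a minimal Go sketch of how bug-fixing commit candidates can be mined from local clones. The keyword list and repository paths here are illustrative assumptions, not the exact configuration used in the study.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// Hypothetical keyword list; the study's actual list may differ.
var bugKeywords = []string{"fix", "bug", "error", "crash", "leak", "panic"}

// bugFixCommits runs `git log` in repoPath and returns the hashes of
// commits whose subject line contains any of the keywords.
func bugFixCommits(repoPath string) ([]string, error) {
	out, err := exec.Command("git", "-C", repoPath,
		"log", "--pretty=format:%H %s").Output()
	if err != nil {
		return nil, err
	}
	var hits []string
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		// Each line is "<hash> <subject>".
		hash, subject, ok := strings.Cut(scanner.Text(), " ")
		if !ok {
			continue
		}
		subject = strings.ToLower(subject)
		for _, kw := range bugKeywords {
			if strings.Contains(subject, kw) {
				hits = append(hits, hash)
				break
			}
		}
	}
	return hits, scanner.Err()
}

func main() {
	// Assumes local clones of each project in the working directory.
	for _, repo := range []string{"runc", "gvisor", "containerd", "cri-o"} {
		commits, err := bugFixCommits(repo)
		if err != nil {
			fmt.Println(repo, "error:", err)
			continue
		}
		fmt.Printf("%s: %d candidate bug-fixing commits\n", repo, len(commits))
	}
}
```

Keyword matching over-approximates, which is why the resulting commits are treated only as candidates and are confirmed through manual reading and triage.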
You can find our manually analyzed data in the Google Sheet: Manual Analysis Data