Expanding the Landscape of Docker-specific Refactorings: A Large-Scale Empirical Study
Emna Ksontini, Thiago Ferreira, Rania Khalsi and Wael Kessentini
Abstract
Docker-based software containerization has emerged as the de facto standard for delivering reusable software artifacts. With a plethora of publicly available Docker images, developers can easily build and deploy their applications, resulting in an industry-wide shift toward containerized solutions. Container-based projects, however, comprise several components, such as the Docker and Docker-compose files, as well as several dependencies in the source code that combine different containers and simplify interactions with them. Like any other complex system, container-based projects are prone to multiple quality and technical debt issues affecting several artifacts, notably the Docker and Docker-compose files. In previous work, we conducted the first foundational study of refactorings, i.e., structural changes that preserve behavior, applied in open-source Docker projects, and of the technical debt issues they alleviate. The findings suggest that developers refactor these Docker projects for a variety of reasons specific to the configuration, combination, and execution of containers. We defined different best practices and introduced 24 new Docker-specific refactorings and 7 technical debt categories. In this paper, we extend our earlier research. By expanding our dataset more than sixfold, from 68 to 443 projects, and refining our selection methodology, we uncover novel refactoring techniques and identify 17 additional Docker-specific refactorings, culminating in a catalog of 41 distinct refactorings. We further introduce 2 new technical debt categories, bringing the total to 9.
These extensions not only expand the known landscape of Docker-specific quality issues but also provide deeper insights into how practitioners manage and alleviate technical debt in container environments.
Approach Overview
BigQuery Results (before filtering): BigQuery_output.csv
Demographics code:
Artifact build and deployment verification code
Studied Projects
Commit List
Manual Refactoring Identification & Classification
Fleiss' Kappa Agreement for refactoring identification.ipynb
Fleiss' Kappa Agreement for refactoring classification.ipynb
Stats.ipynb
Refactoring Types & Description
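
For readers unfamiliar with the agreement measure named in the notebooks above, the following is a minimal sketch of how Fleiss' kappa is computed from a table of rating counts. It is not taken from the notebooks in this package: the function name `fleiss_kappa`, the input layout, and the sample data are assumptions made purely for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i (e.g. a commit) to category j.

    Illustrative sketch only; the input layout is an assumption.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])
    # Share of all assignments that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items        # mean observed agreement
    p_exp = sum(p * p for p in p_cat)    # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when one category gets all ratings")
    return (p_bar - p_exp) / (1.0 - p_exp)


if __name__ == "__main__":
    # Hypothetical data: 4 commits, 3 raters, 2 categories
    # (refactoring / not a refactoring).
    sample = [[3, 0], [0, 3], [3, 0], [2, 1]]
    print(round(fleiss_kappa(sample), 3))
```

A value of 1 indicates perfect agreement among raters, while values near 0 indicate agreement no better than chance.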
