- The Python Jupyter Notebook scripts used to automate the data collection process* can be found here. More specifically, the link includes the following scripts:
- Fetch GitHub projects.ipynb - fetch multiple consecutive versions of a selected application from its GitHub repository
- Perform Gradle/Maven Analysis.ipynb - compile the fetched versions of a selected application using Gradle or Maven and then analyze each version using the SonarQube tool
- Fetch SonarQube measures.ipynb - fetch the analysis results (TD-related measurements) of a selected application from the SonarQube API and store them in .csv format (a minimal sketch of this step is provided at the end of this section)
- The Python Jupyter Notebook script that was used during the data exploration and feature selection process can be found here. More specifically, the link includes the following script:
- Data Exploration and Feature Selection.ipynb - generate descriptive statistics and perform correlation, univariate, and multivariate analysis on the extended dataset (a minimal sketch of this step is provided at the end of this section)
- The boxplots of TD indicators generated during the data exploration process can be found here.
- The "backward elimination" intermediate results (comprising a descriptive table and the Python logs) can be found here.
- The dataset files that were used during the TD forecasting process can be found here. More specifically, the link includes the following .csv files:
- _benchmark_repository_measures.csv - the extended dataset that was used for the feature selection process (i.e., descriptive statistics, correlation, univariate, and multivariate analysis); it contains TD-related metrics extracted from SonarQube and CKJM Extended
- 15 .csv files (one per open-source application) that contain the TD metrics and measurements used as input for the TD forecasting models
- 2 anonymised .csv files that contain TD metrics and measurements of the 2 anonymised industrial applications (Project A and Project B), used within the context of the case study
- The Python Jupyter Notebook scripts that were used during the TD forecasting process can be found here. More specifically, the link includes the following scripts:
- 15 .ipynb scripts - the scripts that were used to train, test, benchmark, and execute the TD forecasting models for each application (a sketch of the Direct forecasting setup with Random Forest is provided at the end of this section)
- Indicative visualizations of the forecasting results generated during the model execution process can be found here. More specifically, the link includes the following figures:
- 15 figures illustrating TD Principal forecasting results for 20 versions ahead using Random Forest and the Direct approach, for each application under investigation
- 2 anonymised figures illustrating TD Principal forecasting results for 10 versions ahead for the 2 anonymised industrial applications (Project A and Project B), used within the context of the case study
- The figures of the various TD Principal trend cases identified for the 2 anonymised industrial applications (Project A and Project B) within the context of the case study can be found here. More specifically, the link includes the following figures:
- 4 figures illustrating abrupt TD Principal trends of Project A
- 3 figures illustrating abrupt TD Principal trends of Project B
* The analysis of selected applications using CKJM Extended could not be automated due to tool limitations and therefore was performed manually
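As a complement to the script descriptions above, the following is a minimal sketch of how TD-related measures for an analysed version could be retrieved from the SonarQube Web API and appended to a .csv file. The server URL, project key, and metric keys are illustrative assumptions and do not necessarily match the exact values used in Fetch SonarQube measures.ipynb.

```python
# Sketch: fetch TD-related measures for one analysed version from the
# SonarQube Web API and append them to a .csv file.
# The server URL, project key and metric keys are assumptions for illustration.
import csv
import requests

SONARQUBE_URL = "http://localhost:9000"          # assumed local SonarQube server
PROJECT_KEY = "example-application"              # hypothetical project key
METRIC_KEYS = ["sqale_index", "code_smells", "bugs", "ncloc", "complexity"]

def fetch_measures(project_key):
    """Return a dict {metric_key: value} for the given SonarQube project."""
    response = requests.get(
        f"{SONARQUBE_URL}/api/measures/component",
        params={"component": project_key, "metricKeys": ",".join(METRIC_KEYS)},
    )
    response.raise_for_status()
    measures = response.json()["component"]["measures"]
    return {m["metric"]: m["value"] for m in measures}

if __name__ == "__main__":
    row = fetch_measures(PROJECT_KEY)
    with open(f"{PROJECT_KEY}_measures.csv", "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=METRIC_KEYS)
        if f.tell() == 0:          # write the header only for a new file
            writer.writeheader()
        writer.writerow(row)
```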
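The data exploration step (descriptive statistics, correlation analysis, and boxplots of TD indicators) can be sketched along the following lines, assuming the extended dataset is available as _benchmark_repository_measures.csv with one TD indicator per column. The concrete column names and the choice of Spearman correlation are assumptions for illustration, not necessarily the configuration of the original notebook.

```python
# Sketch: descriptive statistics, correlation analysis and boxplots of
# TD indicators, assuming _benchmark_repository_measures.csv as input.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("_benchmark_repository_measures.csv")

# Descriptive statistics for every numeric TD indicator
print(df.describe())

# Rank-based (Spearman) correlation between indicators
corr = df.select_dtypes("number").corr(method="spearman")
print(corr)

# Boxplots of selected TD indicators; the column names are assumptions
df.boxplot(column=["sqale_index", "code_smells", "complexity"])
plt.tight_layout()
plt.savefig("td_indicator_boxplots.png")
```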
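The backward elimination whose intermediate results are linked above can be approximated by the standard p-value-driven procedure sketched below; the target column, the set of candidate predictors, and the significance threshold are assumptions rather than the exact configuration used to produce the linked table and logs.

```python
# Sketch: p-value-based backward elimination over candidate TD metrics.
# Target column and threshold are assumptions for illustration.
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X, y, threshold=0.05):
    """Iteratively drop the feature with the highest p-value above the threshold."""
    features = list(X.columns)
    while features:
        model = sm.OLS(y, sm.add_constant(X[features])).fit()
        pvalues = model.pvalues.drop("const")
        worst = pvalues.idxmax()
        if pvalues[worst] <= threshold:
            break                       # all remaining features are significant
        features.remove(worst)          # eliminate the least significant feature
    return features

df = pd.read_csv("_benchmark_repository_measures.csv")
y = df["sqale_index"]                               # assumed target: TD Principal
X = df.drop(columns=["sqale_index"]).select_dtypes("number")
print(backward_elimination(X, y))
```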
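Finally, a minimal sketch of the Direct multi-step forecasting strategy with Random Forest referenced in the forecasting scripts and figures above: one regressor is trained per horizon, each predicting TD Principal h versions ahead from a window of lagged values. The file name, column name, number of lags, and hyperparameters are assumptions; the original notebooks may differ.

```python
# Sketch: Direct multi-step forecasting of TD Principal with Random Forest.
# One model is trained per horizon (1..20 versions ahead) on lag features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

N_LAGS = 4          # number of past versions used as predictors (assumed)
HORIZONS = 20       # forecast up to 20 versions ahead

# Assumed input: one .csv per application with a TD Principal column
series = pd.read_csv("application_measures.csv")["sqale_index"].to_numpy()

def make_supervised(series, n_lags, horizon):
    """Build (X, y) pairs: X holds n_lags past values, y the value horizon steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

# Direct strategy: a separate Random Forest per forecasting horizon
models = {}
for h in range(1, HORIZONS + 1):
    X, y = make_supervised(series, N_LAGS, h)
    models[h] = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Forecast the next 20 versions from the most recent window of observations
last_window = series[-N_LAGS:].reshape(1, -1)
forecast = [models[h].predict(last_window)[0] for h in range(1, HORIZONS + 1)]
print(forecast)
```

Note that this setup requires at least N_LAGS + HORIZONS observed versions per application; shorter series would need fewer lags or a smaller maximum horizon.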