Croissant is a framework for injecting flakiness into test suites in order to assess how effectively test-flakiness detection tools find flaky tests. Croissant implements 18 flakiness-inducing mutation operators that allow controlling the non-determinism involved in flakiness. Our extensive empirical evaluation of Croissant on the test suites of 15 real-world projects confirms both the ability of the designed mutation operators to generate high-quality mutants and their effectiveness in challenging test-flakiness detection tools to reveal flaky tests.
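To make "controlling the non-determinism" concrete, here is a minimal Python sketch of a check whose failure probability is set by a threshold. It is not one of Croissant's 18 operators, which mutate Java test suites. The names `flaky_assertion` and `failure_rate` are hypothetical, and treating the threshold as the probability that the injected check fails is an assumption made only for illustration.

```python
import random


def flaky_assertion(threshold: float, rng: random.Random) -> bool:
    """Hypothetical injected check that fails with probability `threshold`.

    Assumption for illustration only: a higher threshold means a flakier test.
    Returns True if the check passes on this run.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    # random() returns a float in [0.0, 1.0), so threshold=1.0 always fails
    # and threshold=0.0 never fails.
    return rng.random() >= threshold


def failure_rate(threshold: float, runs: int = 1000, seed: int = 0) -> float:
    """Run the injected check `runs` times and report the observed failure rate."""
    rng = random.Random(seed)
    failures = sum(not flaky_assertion(threshold, rng) for _ in range(runs))
    return failures / runs


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0):
        print(f"threshold={t}: observed failure rate {failure_rate(t):.3f}")
```

Seeding the generator keeps each sweep reproducible, which makes it easier to compare how often a detector catches the same injected flakiness at different thresholds.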
GitHub: github.com/dserfe/Croissant
Bytecode of mutants:
NOD (non-order-dependent) and ID (implementation-dependent) mutants: zenodo.org/record/7690506#.ZAA_mS-B35g
OD (order-dependent) mutants: zenodo.org/record/7699492#.ZAUYjy-B2CA
Details of mutation operators:
Please see the mutation operators documentation for more details on their implementation.
Details of evaluating Surefire/NonDex with NOD/ID mutants
All Surefire results: drive.google.com/drive/folders/1u62UO2tTxI6lc7LZs7GdkL7a3eVQfx-P?usp=sharing
Example results: below are the results of running Surefire on the NOD and ID mutants of commons-cli. The columns are:
Project: project name
Sha: commit ID of the project
Test Class: the test class being run
Timestamp: a timestamp used to match the row with the corresponding logs
Threshold: the threshold used to control flakiness, ranging from 0.1 to 1
Build Result: the result of running Surefire on the current test class
Total Time: the total time Surefire took to run the current test class
Runs: the total number of tests run in the current test class
Failures: the number of tests in the current test class that ended in a failure
Errors: the number of tests in the current test class that ended in an error
Flakes: the number of tests in the current test class detected as flaky
Flaky Tests: the flaky tests detected
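The Flakes column comes from Surefire's rerun-based detection: when rerunning failing tests is enabled (the `surefire.rerunFailingTestsCount` property), a test that fails and then passes on a rerun is reported as a flake. The Python sketch below mimics that classification on plain callables. It illustrates the rerun strategy only and is not Surefire's implementation; `classify` and `make_random_test` are hypothetical helpers.

```python
import random
from typing import Callable


def classify(test: Callable[[], bool], rerun_failing_count: int) -> str:
    """Classify a test by rerunning it after a failure.

    Returns "pass" if the first run passes, "flaky" if the test fails first
    but passes on one of up to `rerun_failing_count` reruns, and "fail" if
    every run fails.
    """
    if test():
        return "pass"
    for _ in range(rerun_failing_count):
        if test():
            return "flaky"
    return "fail"


def make_random_test(failure_probability: float, seed: int) -> Callable[[], bool]:
    """Hypothetical test that fails independently with the given probability on each run."""
    rng = random.Random(seed)
    return lambda: rng.random() >= failure_probability


if __name__ == "__main__":
    # Classify 100 hypothetical tests that each fail 90% of the time.
    outcomes = [classify(make_random_test(0.9, seed), rerun_failing_count=2) for seed in range(100)]
    print({label: outcomes.count(label) for label in ("pass", "flaky", "fail")})
```

With a small rerun count, a test that fails most of the time can fail on every attempt and be reported as a plain failure rather than a flake, which is one way injected flakiness can slip past rerun-based detection.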
All NonDex results: drive.google.com/drive/folders/1-WQNJyen4nmE6XMhUNAhvnTD2bMukWL_?usp=sharing
Example results: below are the results of running NonDex on the NOD and ID mutants of commons-cli. The columns are:
Project: project name
Sha: commit ID of the project
Test Class: the test class being run
Timestamp: a timestamp used to match the row with the corresponding logs
Threshold: the threshold used to control flakiness, ranging from 0.1 to 1
Build Result: the result of running NonDex on the current test class
Total Time: the total time NonDex took to run the current test class
Num of Test detected: the number of tests in the current test class detected as flaky
Tests detected: the flaky tests detected
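NonDex targets implementation-dependent (ID) tests: tests that pass only because an API whose specification leaves some behavior open (for example, the iteration order of a Java `HashSet`) happens to behave one particular way. It reruns tests while exploring different permitted behaviors under different seeds. The Python sketch below imitates that idea, with a seeded shuffle standing in for an unordered collection. `explore`, `shuffled_iteration`, and the two example tests are hypothetical and do not reflect NonDex's actual Java instrumentation.

```python
import random
from functools import partial
from typing import Callable, Iterable, List

Iterate = Callable[[Iterable[str]], List[str]]


def shuffled_iteration(items: Iterable[str], seed: int) -> List[str]:
    """Return the items in a seed-dependent order, standing in for an
    unordered collection whose iteration order is unspecified."""
    ordered = sorted(items)
    random.Random(seed).shuffle(ordered)
    return ordered


def explore(test: Callable[[Iterate], bool], seeds: Iterable[int]) -> List[int]:
    """Run `test` once per seed and return the seeds under which it failed.

    A test that passes for some seeds and fails for others depends on an
    unspecified iteration order, i.e. it behaves like an ID flaky test.
    """
    failing = []
    for seed in seeds:
        iterate = partial(shuffled_iteration, seed=seed)
        if not test(iterate):
            failing.append(seed)
    return failing


def order_assuming_test(iterate: Iterate) -> bool:
    # Wrongly assumes that iteration yields "a" first.
    return iterate({"a", "b", "c"})[0] == "a"


def order_independent_test(iterate: Iterate) -> bool:
    # Compares contents rather than order, so any iteration order passes.
    return sorted(iterate({"a", "b", "c"})) == ["a", "b", "c"]


if __name__ == "__main__":
    print("order-assuming test fails for seeds:", explore(order_assuming_test, range(10)))
    print("order-independent test fails for seeds:", explore(order_independent_test, range(10)))
```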
Details of evaluating iDFlakies with OD mutants
All iDFlakies results: drive.google.com/drive/folders/1m_gAhl4M6PdeQe5AAhTcH_CviRDfYwCb?usp=sharing
Example results: below are the results of running iDFlakies on the DSD mutants of commons-cli. The columns are:
Project: project name
Sha: commit ID of the project
Timestamp: a timestamp used to match the row with the corresponding logs
Build Result: the result of running iDFlakies on the current project
Total Time: the total time iDFlakies took to run the current project
Num of Test detected: the number of tests in the current project detected as flaky
Tests detected: the flaky tests detected
cleanerCounts: the number of cleaners, ranging from 0 to 50
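For order-dependent (OD) tests, a polluter modifies shared state so that a victim fails when it runs afterwards, and a cleaner resets that state so the victim passes again even after the polluter. The Python sketch below illustrates these roles with a module-level dictionary as the shared state. The test names and the `run_order` helper are hypothetical. The sketch shows neither how Croissant generates OD mutants nor how iDFlakies reorders tests.

```python
from typing import Callable, Dict, List

# Shared mutable state through which the hypothetical tests below interact.
SHARED: Dict[str, int] = {"counter": 0}


def polluter() -> bool:
    # Leaves behind state that the victim does not expect.
    SHARED["counter"] = 42
    return True


def cleaner() -> bool:
    # Restores the state the victim relies on.
    SHARED["counter"] = 0
    return True


def victim() -> bool:
    # Passes only if the shared state still has its initial value.
    return SHARED["counter"] == 0


def run_order(order: List[Callable[[], bool]]) -> Dict[str, bool]:
    """Run tests in the given order from a fresh state and record each outcome."""
    SHARED["counter"] = 0
    return {test.__name__: test() for test in order}


if __name__ == "__main__":
    print(run_order([victim, polluter]))           # victim passes
    print(run_order([polluter, victim]))           # victim fails
    print(run_order([polluter, cleaner, victim]))  # cleaner masks the pollution
```

An OD detector such as iDFlakies looks for orders like the second one, where a test's outcome changes with its position. Any cleaner that runs between a polluter and its victim hides the failure, so more cleaners can make failing orders harder to hit.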