Transforming Test Suites into Croissants

Croissant is a framework that injects flakiness into test suites to assess how effectively test-flakiness detection tools find the resulting flaky tests. Croissant implements 18 flakiness-inducing mutation operators that allow control over the non-determinism involved in flakiness. Our extensive empirical evaluation of Croissant on the test suites of 15 real-world projects confirms that the designed mutation operators generate high-quality mutants and that these mutants effectively challenge test-flakiness detection tools in revealing flaky tests.

GitHub: github.com/dserfe/Croissant

Bytecode of mutants:

NOD/ID mutants: zenodo.org/record/7690506#.ZAA_mS-B35g
OD mutants: zenodo.org/record/7699492#.ZAUYjy-B2CA

Details of mutation operators:
Please see mutation operators for implementation details.

Dataset of projects:
docs.google.com/spreadsheets/d/1lmg3l0WqDzRq2F9ouz5sVm4dvab0jxERbgJIsOBfgj4/edit?usp=sharing

Details of evaluating Surefire/NonDex with NOD/ID mutants

All Surefire results: drive.google.com/drive/folders/1u62UO2tTxI6lc7LZs7GdkL7a3eVQfx-P?usp=sharing
Example results: below are the Surefire results from running on the NOD and ID mutants of commons-cli. The columns are:

Project: project name

Sha: commit ID of the project

Test Class: test class that is running

Timestamp: a timestamp to map with logs

Threshold: the threshold that controls flakiness, ranging from 0.1 to 1

Build Result: the result of running Surefire on the current test class

Total Time: the total time Surefire took to run the current test class

Runs: the total number of tests run in the current test class

Failures: the number of tests in the current test class that ended in failure

Errors: the number of tests in the current test class that ended in error

Flakes: the number of tests in the current test class detected as flaky

Flaky Tests: the flaky tests detected

Example Surefire results on commons-cli
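The columns above lend themselves to a simple per-threshold summary. The following is a minimal sketch, assuming the results are exported as a CSV file whose header row uses the column names listed above. The sample rows, the project name, and the function name `flakes_by_threshold` are hypothetical placeholders for illustration, not actual Croissant results or tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export using the column names described above.
# These rows are made-up placeholders, not real Croissant results.
SUREFIRE_SAMPLE = """Project,Sha,Test Class,Timestamp,Threshold,Build Result,Total Time,Runs,Failures,Errors,Flakes,Flaky Tests
example-project,abc123,ExampleTest,t0,0.1,SUCCESS,1.0,10,0,0,2,testA;testB
example-project,abc123,OtherTest,t1,0.1,SUCCESS,0.8,5,0,0,1,testC
example-project,abc123,ExampleTest,t2,1,FAILURE,1.2,10,3,0,0,
"""


def flakes_by_threshold(csv_text):
    """Sum the Flakes column over all test classes for each Threshold value."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[float(row["Threshold"])] += int(row["Flakes"])
    return dict(totals)


if __name__ == "__main__":
    for threshold, flakes in sorted(flakes_by_threshold(SUREFIRE_SAMPLE).items()):
        print(f"threshold={threshold}: {flakes} flaky tests detected")
```

Grouping by Threshold makes it easy to see how the number of tests Surefire reports as flaky changes as the threshold varies.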

All NonDex results: drive.google.com/drive/folders/1-WQNJyen4nmE6XMhUNAhvnTD2bMukWL_?usp=sharing
Example results: below are the NonDex results from running on the NOD and ID mutants of commons-cli. The columns are:

Project: project name

Sha: commit ID of the project

Test Class: test class that is running

Timestamp: a timestamp to map with logs

Threshold: the threshold that controls flakiness, ranging from 0.1 to 1

Build Result: the result of running NonDex on the current test class

Total Time: the total time NonDex took to run the current test class

Num of Test detected: the number of tests in the current test class detected as flaky

Tests detected: the flaky tests detected

Example NonDex results on commons-cli
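To see which mutants NonDex caught at any threshold, the "Tests detected" column can be collected into a set of distinct tests. This is a rough sketch under two assumptions: the results are available as a CSV file with the column names listed above, and multiple test names in one cell are separated by semicolons. The actual separator in the shared sheets may differ. The sample rows and the function name `distinct_detected` are hypothetical.

```python
import csv
import io

# Hypothetical CSV export using the NonDex column names described above.
# These rows are made-up placeholders, not real Croissant results.
NONDEX_SAMPLE = """Project,Sha,Test Class,Timestamp,Threshold,Build Result,Total Time,Num of Test detected,Tests detected
example-project,abc123,ExampleTest,t0,0.1,FAILURE,1.0,2,testA;testB
example-project,abc123,ExampleTest,t1,0.5,FAILURE,1.1,1,testB
example-project,abc123,OtherTest,t2,0.5,SUCCESS,0.9,0,
"""


def distinct_detected(csv_text, sep=";"):
    """Return the set of (test class, test name) pairs detected in any row.

    The separator between test names is an assumption; adjust `sep` to match
    the real data.
    """
    detected = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row["Tests detected"] or ""
        for name in cell.split(sep):
            name = name.strip()
            if name:  # skip empty cells and trailing separators
                detected.add((row["Test Class"], name))
    return detected
```

Using a set means a test detected at several thresholds is counted once, which gives the number of distinct flaky tests found across the whole run.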

Details of evaluating iDFlakies with OD mutants

All iDFlakies results: drive.google.com/drive/folders/1m_gAhl4M6PdeQe5AAhTcH_CviRDfYwCb?usp=sharing
Example results: below are the iDFlakies results from running on the DSD mutants of commons-cli. The columns are:

Project: project name

Sha: commit ID of the project

Timestamp: a timestamp to map with logs

Build Result: the result of running iDFlakies on the current project

Total Time: the total time iDFlakies took to run the current project

Num of Test detected: the number of tests in the current project detected as flaky

Tests detected: the flaky tests detected

cleanerCounts: the number of cleaners, ranging from 0 to 50

Example iDFlakies results on commons-cli
