The Czech National Group of the International Society for Clinical Biostatistics (ISCB Czechia)
Location: ZOOM
Meeting ID: 990 6218 4007
Passcode: 183786
Date: Friday 21 November 2025
Time: 14:00 CET
Abstract:
Simulation studies are widely used to evaluate the performance of statistical methods using synthetic data sets generated from a known ground truth. However, the current methodological research paradigm requires researchers to develop and evaluate new methods at the same time. This creates misaligned incentives, such as the need to demonstrate the superiority of new methods, potentially compromising the neutrality of simulation studies. Furthermore, results of simulation studies are often difficult to compare due to differences in data-generating mechanisms, included methods, and performance measures. This fragmentation can lead to conflicting conclusions, hinder cumulative methodological progress, and delay the adoption of effective methods. To address these challenges, we introduce the concept of living synthetic benchmarks.
The key idea is to disentangle method development from data-generating mechanism development and to continuously update the benchmark whenever a new data-generating mechanism, method, or performance measure becomes available. This separation improves the neutrality of method evaluation, places greater focus on the development of both methods and data-generating mechanisms, and makes it possible to compare all methods across all data-generating mechanisms using all performance measures.
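The cross-evaluation idea above can be sketched in code. The following is a minimal illustrative sketch in Python, not the authors' implementation (their prototype is an R package); all names (`LivingBenchmark`, `register_dgm`, and so on) are hypothetical. It shows the core design: separate registries for data-generating mechanisms, methods, and performance measures, with a run step that re-evaluates every method on every mechanism with every measure whenever a new component is registered.

```python
"""Minimal sketch of a living synthetic benchmark (illustrative only)."""
import random
import statistics


class LivingBenchmark:
    """Holds independent registries of DGMs, methods, and measures."""

    def __init__(self):
        self.dgms = {}      # name -> fn(rng) returning (data, ground_truth)
        self.methods = {}   # name -> fn(data) returning an estimate
        self.measures = {}  # name -> fn(estimates, truth) returning a score

    def register_dgm(self, name, fn):
        self.dgms[name] = fn

    def register_method(self, name, fn):
        self.methods[name] = fn

    def register_measure(self, name, fn):
        self.measures[name] = fn

    def run(self, n_sim=200, seed=1):
        """Evaluate all methods across all DGMs using all measures."""
        rng = random.Random(seed)
        results = {}
        for dgm_name, dgm in self.dgms.items():
            for method_name, method in self.methods.items():
                estimates, truth = [], None
                for _ in range(n_sim):
                    data, truth = dgm(rng)
                    estimates.append(method(data))
                for measure_name, measure in self.measures.items():
                    results[(dgm_name, method_name, measure_name)] = (
                        measure(estimates, truth)
                    )
        return results


# Example components: one normal-mean DGM, two estimators, one measure.
bench = LivingBenchmark()
bench.register_dgm(
    "normal_mean",
    lambda rng: ([rng.gauss(1.0, 1.0) for _ in range(30)], 1.0),
)
bench.register_method("sample_mean", statistics.mean)
bench.register_method("sample_median", statistics.median)
bench.register_measure(
    "bias", lambda est, truth: statistics.mean(est) - truth
)

results = bench.run()
```

Because each registry is independent, contributing a new method (or a new data-generating mechanism) requires no changes to the evaluation logic; the next `run` automatically includes it in the full cross-comparison.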
In this paper, we (i) outline a blueprint for building and maintaining such benchmarks, (ii) discuss technical and organizational challenges of implementation, and (iii) demonstrate feasibility with a prototype benchmark for publication bias adjustment methods, including an open-source R package. We conclude that living synthetic benchmarks have the potential to foster neutral, reproducible, and cumulative evaluation of methods, benefiting both method developers and users.
Keywords:
Evidence Synthesis, Method Benchmarking, Neutral Method Comparison, Systematic Review
References:
Pawel, S., Kook, L., & Reeve, K. (2024). Pitfalls and potentials in simulation studies: Questionable research practices in comparative simulation studies allow for spurious claims of superiority of any method. Biometrical Journal, 66(1), 2200091.
Nießl, C., Herrmann, M., Wiedemann, C., Casalicchio, G., & Boulesteix, A. L. (2022). Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(2), e1441.
Heinze, G., Boulesteix, A. L., Kammer, M., Morris, T. P., White, I. R., & Simulation Panel of the STRATOS Initiative. (2024). Phases of methodological research in biostatistics—building the evidence base for new methods. Biometrical Journal, 66(1), 2200222.