Selected alternatives to statistical inference in survival analysis: principles and some properties of methods comparing survival curves

Lubomír Štěpánek

Institute of Biophysics and Informatics, First Faculty of Medicine, Charles University

Faculty of Informatics and Statistics, Prague University of Economics and Business

Dept. of Statistical Modelling, Institute of Computer Science, Czech Academy of Sciences

Location: ZOOM

Meeting ID: 926 8545 1604

Passcode: 358199

Date: Wednesday 29 September 2021

Time: 13:30 CET

Background: Comparing two or more time-to-event survival curves is a very common task in applied biostatistics, and several well-established methods are available for it. Depending on how many groups are to be compared, a log-rank test, a score-rank test, a Cox proportional hazards model, or even a Wilcoxon rank-sum test might be used. However, each of these methods has its limitations, and its applicability depends on meeting relatively rigorous statistical assumptions [1].


Objectives: In this work, we propose more robust alternatives to two of these methods. We refine the log-rank test in a more robust way using combinatorial geometry [2], and we introduce a new alternative for comparing two or more time-to-event survival curves that uses a random forest algorithm and is practically assumption-free [3]. We also test some of the properties of both methods using preliminary simulations.


Methods: For the comparison of two survival curves, we propose a slightly different, low-assumption framework. Individual time-to-event survival curves are modelled in a discrete combinatorial way, as orthogonal paths in the grid of a survival plot. Among other things, counting these paths enables a direct estimation of the p-value from its original definition, that is, as the probability of obtaining data at least as extreme as those observed. We also introduce an approach to comparing more than two survival curves using a random forest algorithm, which requires practically no statistical assumptions. Repeatedly generating the many decision trees that make up one random forest model enables us to calculate the proportion of trees that are sufficiently complex to classify into all groups (each depicted by its survival curve), which is close to a p-value estimate. The higher the proportion, the more likely we are to reject the null hypothesis that there is no statistical difference between the curves. The level of pruning of the decision trees from which the random forest model is built can modify both the robustness and the statistical power of the random forest alternative to Cox regression for comparing survival curves.
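The path-counting idea for two curves can be illustrated with a minimal Python sketch. It is not the construction from [2]; it rests on simplifying assumptions made only for this illustration: no censoring, two groups of equal size, survivor counts recorded at a fixed number of time points, every monotone path treated as equally likely under the null hypothesis, and the area between the two step curves used as the measure of extremeness. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations_with_replacement, product


def survival_paths(n_subjects, n_times):
    """Enumerate all non-increasing sequences of survivor counts.

    Each sequence is one orthogonal path in an n_subjects x n_times grid.
    """
    counts = range(n_subjects + 1)
    # combinations_with_replacement yields non-decreasing tuples,
    # so reversing each one gives a non-increasing survivor path.
    return [tuple(reversed(c))
            for c in combinations_with_replacement(counts, n_times)]


def area_between(path_a, path_b):
    """Area between two step curves, in grid cells (assumed distance)."""
    return sum(abs(a - b) for a, b in zip(path_a, path_b))


def exact_p_value(observed_a, observed_b, n_subjects):
    """Share of path pairs at least as far apart as the observed pair.

    Assumes (for this sketch only) that all pairs of paths are
    equally likely under the null hypothesis.
    """
    paths = survival_paths(n_subjects, len(observed_a))
    observed = area_between(observed_a, observed_b)
    extreme = sum(1 for a, b in product(paths, repeat=2)
                  if area_between(a, b) >= observed)
    return Fraction(extreme, len(paths) ** 2)


if __name__ == "__main__":
    # Three subjects per group, survivor counts at three time points.
    print(exact_p_value((3, 3, 2), (2, 1, 0), n_subjects=3))
```

Full enumeration is feasible only for small grids, because the number of paths grows combinatorially with the number of subjects and time points.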


Results: We further present simulations that support our expectation that both proposed methods are more robust than traditional approaches, in the sense of lower type I error rates. For the random forest algorithm, we demonstrate that as the tree pruning level increases, the type I error rate of the method decreases and its robustness increases.
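How a pruning level can affect the proportion of sufficiently complex trees may be illustrated with a toy Python sketch. It is a deliberately simplified stand-in, not the random forest method of [3] and not a reproduction of the simulations: each "tree" is reduced to a single split on the event time fitted to a bootstrap resample, censoring is ignored, and a tree counts as sufficiently complex only if its best split lowers the Gini impurity by at least `min_gain`. The names `min_gain` and `proportion_of_splitting_trees` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of group labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(sample):
    """Largest Gini impurity decrease over all thresholds on time."""
    sample = sorted(sample, key=lambda row: row[0])
    labels = [group for _, group in sample]
    n = len(sample)
    parent = gini(labels)
    best = 0.0
    for i in range(1, n):
        if sample[i - 1][0] == sample[i][0]:
            continue  # no threshold fits between two equal times
        left, right = labels[:i], labels[i:]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def proportion_of_splitting_trees(data, n_trees=200, min_gain=0.05, seed=0):
    """Share of bootstrap one-split trees that survive pruning.

    data: list of (event_time, group_label) pairs.
    min_gain: assumed pruning level; splits with a smaller impurity
    decrease are pruned away, leaving a root-only tree.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]  # bootstrap resample
        if best_split_gain(boot) >= min_gain:
            kept += 1
    return kept / n_trees


if __name__ == "__main__":
    rng = random.Random(1)
    # Both groups drawn from the same distribution (null hypothesis holds).
    null_data = [(rng.expovariate(1.0), g) for g in "AB" for _ in range(20)]
    for level in (0.0, 0.1, 0.2, 0.3):
        print(level, proportion_of_splitting_trees(null_data, min_gain=level))
```

With a fixed seed the same resamples are reused for every level, so in this toy setting the proportion cannot increase as `min_gain` grows; stricter pruning therefore yields fewer rejections when the null hypothesis holds.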


Conclusions: Based on the simulations and preliminary analytical derivations, the methods seem to be promising alternatives for comparing two or more time-to-event curves. Finally, we discuss applications of our methods in biostatistics and other fields.

Acknowledgment: This work is supported by the grant OP VVV IGA/A, CZ.02.2.69/0.0/0.0/19_073/0016936 with no. 18/2021, which has been provided by the Internal Grant Agency of the Prague University of Economics and Business.

References:


[1] Armitage, P. (1959). The Comparison of Survival Curves. Journal of the Royal Statistical Society. Series A (General), 122(3), 279–300. https://doi.org/10.2307/2342794


[2] Štěpánek, L., Habarta, F., Malá, I., & Marek, L. (2020). Analysis of asymptotic time complexity of an assumption-free alternative to the log-rank test. Proceedings of the 2020 Federated Conference on Computer Science and Information Systems. https://doi.org/10.15439/2020f198

[3] Štěpánek, L., Habarta, F., Malá, I., & Marek, L. (2021). A random forest-based approach for survival curves comparison: principles, computational aspects, and asymptotic time complexity analysis. Proceedings of the 2021 Federated Conference on Computer Science and Information Systems. In press.