2025 IMS International Conference on Statistics and Data Science (ICSDS)
December 15-18, 2025, Seville, Spain
Veridical Data Science towards Trustworthy AI
Bin Yu, University of California, Berkeley, USA
Monday, December 15th
Abstract:
In this talk, I will introduce the Predictability–Computability–Stability (PCS) framework for veridical (truthful) data science, emphasizing its central role in generating reliable and actionable insights. I will present success stories from cancer detection and cardiology, where PCS principles have guided cost-effective study designs and improved outcomes. Because trustworthy uncertainty quantification (UQ) is essential for trustworthy AI, I will then focus on PCS-based UQ for prediction in regression and multi-class classification. PCS-UQ follows three steps: prediction check, bootstrap, and multiplicative calibration. Across 26 benchmark datasets, PCS-UQ outperforms common conformal prediction methods in interval width, subgroup coverage, and subgroup interval width. Notably, the multiplicative calibration step in PCS-UQ can be viewed as a new form of conformal prediction. I will conclude with a discussion of PCS-guided constructive approaches for building more trustworthy statistical models, along with available PCS resources.
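The bootstrap-plus-multiplicative-calibration idea behind the three PCS-UQ steps can be illustrated with a minimal sketch. This is a hypothetical toy illustration of the general idea, not the authors' implementation: we fit many bootstrap models, form intervals from the spread of their predictions, then scale the half-widths by the smallest factor gamma that achieves the target coverage on a held-out calibration set.

```python
# Hypothetical sketch of bootstrap intervals with multiplicative
# calibration, in the spirit of (but NOT identical to) PCS-UQ.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + noise, split into train / calibration / test.
n = 400
x = rng.uniform(-1, 1, n)
y = 2 * x + rng.normal(0, 0.3, n)
x_tr, y_tr = x[:200], y[:200]
x_cal, y_cal = x[200:300], y[200:300]
x_te, y_te = x[300:], y[300:]

def fit_ls(xs, ys):
    # Least-squares fit of slope and intercept.
    A = np.column_stack([xs, np.ones_like(xs)])
    coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
    return coef

# Step 2: bootstrap the training data and refit the model B times.
B = 50
coefs = []
for _ in range(B):
    idx = rng.integers(0, len(x_tr), len(x_tr))
    coefs.append(fit_ls(x_tr[idx], y_tr[idx]))

def intervals(xs, gamma):
    # Center = mean bootstrap prediction; half-width = gamma * spread.
    A = np.column_stack([xs, np.ones_like(xs)])
    preds = np.array([A @ c for c in coefs])   # shape (B, len(xs))
    center, half = preds.mean(axis=0), gamma * preds.std(axis=0)
    return center - half, center + half

# Step 3: multiplicative calibration -- smallest gamma reaching the
# target coverage on the held-out calibration set.
target, gamma = 0.90, 1.0
for g in np.linspace(0.5, 50, 200):
    lo, hi = intervals(x_cal, g)
    if np.mean((y_cal >= lo) & (y_cal <= hi)) >= target:
        gamma = g
        break

lo, hi = intervals(x_te, gamma)
coverage = np.mean((y_te >= lo) & (y_te <= hi))
print(round(coverage, 2))
```

Note that raw bootstrap spread badly undercovers here (it reflects only estimation variance, not noise), which is exactly why a calibration step on held-out data is needed.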
Short Bio:
Bin Yu is CDSS Chancellor's Distinguished Professor in Statistics, EECS, Center for Computational Biology, and Senior Advisor at the Simons Institute for the Theory of Computing, all at UC Berkeley. Her research focuses on the practice and theory of statistical machine learning, veridical data science, responsible and safe AI, and solving interdisciplinary data problems in neuroscience, genomics, and precision medicine. She and her team have developed algorithms such as iterative random forests (iRF), stability-driven NMF, adaptive wavelet distillation (AWD), Contextual Decomposition for Transformers (CD-T), SPEX and ProxySPEX for interpreting deep learning models, especially for compositional interpretability.
She is a member of the National Academy of Sciences and of the American Academy of Arts and Sciences. She was a Guggenheim Fellow and President of the Institute of Mathematical Statistics (IMS), and has delivered the Tukey Lecture of the Bernoulli Society, the Breiman Lecture at NeurIPS, the IMS Rietz and Wald Lectures, and the Distinguished Achievement Award and Lecture (formerly the Fisher Lecture) of COPSS (the Committee of Presidents of Statistical Societies). She holds an Honorary Doctorate from the University of Lausanne. She serves on the Editorial Board of the Proceedings of the National Academy of Sciences (PNAS) and is a co-editor of the Harvard Data Science Review (HDSR).
Recent advances in uncertainty quantification: anytime guarantees and multivariate predictions
Francis Bach, Ecole Normale Supérieure, France
Tuesday, December 16th
Abstract:
Quantifying uncertainty in statistics and machine learning is crucial, but challenging in high-dimensional prediction problems. Probabilistic calibration and conformal prediction have emerged as key practical, theoretically well-motivated frameworks. In this talk, I will present recent advances that allow greater flexibility in their application: anytime guarantees, and extensions to multivariate prediction problems beyond univariate regression and binary classification.
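For orientation, the standard split-conformal procedure that the talk's contributions build beyond can be sketched in a few lines: fit any point predictor on a training set, compute absolute residuals on a calibration set, and use their finite-sample-adjusted quantile as a symmetric interval half-width.

```python
# Minimal split-conformal regression sketch (the standard textbook
# method; the talk extends well beyond this basic setting).
import numpy as np

rng = np.random.default_rng(1)
n = 600
x = rng.uniform(-2, 2, n)
y = x**2 + rng.normal(0, 0.5, n)
x_tr, y_tr = x[:300], y[:300]
x_cal, y_cal = x[300:500], y[300:500]
x_te, y_te = x[500:], y[500:]

# Any point predictor works; here, a quadratic least-squares fit.
design = lambda xs: np.column_stack([xs**2, xs, np.ones_like(xs)])
coef, *_ = np.linalg.lstsq(design(x_tr), y_tr, rcond=None)
pred = lambda xs: design(xs) @ coef

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - pred(x_cal))
alpha = 0.1
# Finite-sample-valid quantile level: ceil((n_cal+1)(1-alpha)) / n_cal.
level = min(1.0, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))
q = np.quantile(scores, level)

# Prediction intervals with guaranteed >= 1 - alpha marginal coverage.
lo, hi = pred(x_te) - q, pred(x_te) + q
coverage = np.mean((y_te >= lo) & (y_te <= hi))
print(round(coverage, 2))
```

The guarantee here is only marginal (averaged over test points), which is one motivation for the richer guarantees discussed in the talk.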
Short Bio:
Francis Bach is a researcher at Inria, where he has led, since 2011, the machine learning team that is part of the Computer Science department at Ecole Normale Supérieure. He graduated from Ecole Polytechnique in 1997 and completed his Ph.D. in Computer Science at U.C. Berkeley in 2005, working with Professor Michael Jordan. He spent two years in the Mathematical Morphology group at Ecole des Mines de Paris, then joined the computer vision project-team at INRIA/Ecole Normale Supérieure from 2007 to 2010.
Francis Bach is primarily interested in machine learning, and especially in sparse methods, kernel-based learning, neural networks, and large-scale optimization. He published the book "Learning Theory from First Principles" through MIT Press in 2024.
He obtained a Starting Grant in 2009 and a Consolidator Grant in 2016 from the European Research Council, and received the INRIA young researcher prize in 2012, the ICML test-of-time award in 2014 and 2019, the NeurIPS test-of-time award in 2021, the Lagrange prize in continuous optimization in 2018, and the Jean-Jacques Moreau prize in 2019. He was elected to the French Academy of Sciences in 2020. He was program co-chair of the International Conference on Machine Learning (ICML) in 2015, its general chair in 2018, and president of its board between 2021 and 2023; he was co-editor-in-chief of the Journal of Machine Learning Research between 2018 and 2023.
Data thinning and beyond
Daniela Witten, University of Washington, USA
Wednesday, December 17th
Abstract:
Contemporary data analysis pipelines often involve the use and reuse of data. For instance, a scientist may explore a dataset to select an interesting hypothesis, and then wish to test this hypothesis with the same data. From a statistical perspective, this double use of data is highly problematic: it induces dependence between the hypothesis generation and testing stages, which complicates inference. Failure to account for this dependence renders classical inference techniques invalid.
I will present "data thinning", a set of strategies for obtaining independent training and test sets so that the former can be used to select a hypothesis, and the latter to test it. Data thinning enables valid selective inference in settings for which no solutions were previously available. However, it is also restrictive, in the sense that it requires strong distributional assumptions. Therefore, I will also present two strategies inspired by data thinning that enable valid post-selection inference without such assumptions. One strategy considers thinning summary statistics of the data, rather than the data itself, in order to take advantage of asymptotic properties of the summary statistics. The second strategy involves generating training and test sets that are not independent, and then orthogonalizing the latter with respect to the former in order to conduct valid inference.
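A well-known special case conveys the flavor of data thinning (the talk covers a broader family and the extensions above). For a Poisson count X with rate lam, drawing X_train ~ Binomial(X, eps) and setting X_test = X - X_train yields two *independent* Poisson pieces with rates eps * lam and (1 - eps) * lam, so one piece can select a hypothesis and the other can test it. A minimal sketch:

```python
# Poisson data thinning: split each count into two independent pieces.
import numpy as np

rng = np.random.default_rng(2)
lam, eps, n = 10.0, 0.5, 200_000

x = rng.poisson(lam, n)
x_train = rng.binomial(x, eps)   # thin each observed count
x_test = x - x_train             # the remainder

# The two pieces have the thinned rates eps*lam and (1-eps)*lam,
# and are independent -- unlike x and x_train themselves.
print(round(x_train.mean(), 1), round(x_test.mean(), 1))
corr = np.corrcoef(x_train, x_test)[0, 1]
print(round(corr, 3))
```

The empirical correlation between the two pieces is near zero, reflecting their independence; this is what licenses using one piece for selection and the other for inference.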
This is joint work with Ethan Ancell, Jacob Bien, Ameer Dharamshi, Lucy Gao, Dan Kessler, Anna Neufeld, Snigdha Panigrahi, and Ronan Perry.
Short Bio:
Daniela Witten is a professor of Statistics and Biostatistics at the University of Washington, where she holds the Dorothy Gilford Endowed Chair in Mathematical Statistics. She develops statistical machine learning methods for high-dimensional data, with a focus on unsupervised learning.
She has received a number of awards for her research in statistical machine learning: most notably the Spiegelman Award from the American Public Health Association for a (bio)statistician under age 40, and the Presidents’ Award from the Committee of Presidents of Statistical Societies for a statistician under age 41.
Daniela is a co-author of the textbook "An Introduction to Statistical Learning", and since 2023 has served as Joint Editor of the Journal of the Royal Statistical Society, Series B.
Title To Be Announced
Richard Samworth, University of Cambridge, UK
Thursday, December 18th
Abstract:
TBA
Short Bio:
Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a full professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in high-dimensional and nonparametric statistics; he has developed methods and theory for shape-constrained inference, missing data, subgroup selection, data perturbation techniques, changepoint estimation and independence testing, among others. Richard served as co-editor of the Annals of Statistics (2019-2021), was elected an IMS Fellow (2014), gave an IMS Medallion lecture (2018) and received the IMS Grace Wahba Award and lecture (2025). He currently holds a European Research Council Advanced Grant, received the COPSS Presidents’ Award in 2018, was elected a Fellow of the Royal Society in 2021 and was awarded the David Cox Medal for Statistics in 2025.