Research

My main goal is to create interpretable and computationally efficient models for large, complex data that improve our understanding of real-world problems, particularly in biomedicine, and support fast, accurate decisions. I develop statistical methods for large heterogeneous data, mainly leveraging Bayesian and probabilistic machine learning algorithms, with a focus on data integration.

Factor analysis for data integration

I develop Bayesian methods for integrating high-dimensional, heterogeneous data using latent factor models.

  • I developed sparse Bayesian factor regression models. My key contributions are:

      1. I created model-based methods that jointly (i) adjust for systematic biases, such as batch effects, and (ii) provide a low-dimensional representation of the data.

      2. I also proposed a novel class of non-local sparsity-inducing priors to facilitate interpretation and improve accuracy.

      3. I created computationally efficient algorithms that can be used in real-world settings; the software is publicly available.

  • I am building sparse Bayesian multi-study factor regression models that, in a single analysis, integrate large datasets, learn covariate effects, and provide a sparse low-rank covariance for each study (a schematic formulation follows this list).

  • I am developing, and will continue to develop, computationally efficient methods for multi-study factor analysis that build on recent advances in machine learning, such as variational Bayes, and on MCMC methods.
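In schematic form, and glossing over priors and identifiability constraints, the factor regression models above combine covariate adjustment with a sparse low-rank decomposition (illustrative notation, not the papers' exact specification):

```latex
y_{is} = B x_{is} + \Lambda_s f_{is} + e_{is},
\qquad f_{is} \sim N(0, I_k), \quad e_{is} \sim N(0, \Psi_s)
```

Here s indexes studies, x_{is} collects covariates (including batch or study indicators), B holds their effects, and the sparse loading matrices \Lambda_s induce the study-specific low-rank covariances \Sigma_s = \Lambda_s \Lambda_s^\top + \Psi_s; sparsity-inducing (e.g., non-local) priors on \Lambda_s aid interpretation.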

Graphical models for heterogeneous data

I am developing, and plan to keep developing, novel Bayesian graphical models that integrate complex data from different groups or populations and jointly infer a network structure for each group.

  • I am designing Bayesian graphical models that learn a common sparsity structure while allowing for group heterogeneity. I build these tools by leveraging sparsity-inducing priors and Gaussian hierarchical models (a schematic prior follows this list).

  • I am developing Bayesian models that jointly estimate multiple graphical models for non-Gaussian data, such as binary data, from distinct subpopulations.
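To make the shared-sparsity idea concrete, here is one standard spike-and-slab construction in sketch form (illustrative notation, not the exact specification of the models above). Each group g has its own precision matrix \Omega_g = (\omega_{ij,g}), with mixture priors on the off-diagonal entries:

```latex
\omega_{ij,g} \mid \gamma_{ij,g} \sim
\gamma_{ij,g}\, N(0, v_1^2) + (1 - \gamma_{ij,g})\, N(0, v_0^2),
\qquad v_0 \ll v_1
```

The edge-inclusion indicators \gamma_{ij,g} are then tied together across groups through a Gaussian hierarchical layer on their inclusion probabilities, so the groups share a common sparsity pattern while group-specific edges remain possible.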

External data integration in Bayesian adaptive designs

My research activities include designing, developing and implementing novel methodologies for regulatory science applications.

  • I proposed tools to evaluate predictive models in clinical trials, using an approach inspired by the statistical machine learning literature.

  • I am designing ethical studies that leverage Bayesian model selection tools (a toy illustration of the interim decision rule follows this list) and that

      1. require fewer patients to be assigned to the standard of care (compared with traditional randomised controlled trials),

      2. randomise more patients to the novel treatments,

      3. stop non-promising trials early, and

      4. incorporate new experimental therapies at any time during the study.
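As a toy illustration of the interim mechanism behind points 1-3 (all numbers, priors, and thresholds here are hypothetical placeholders, not taken from any of my designs), Bayesian monitoring of a binary endpoint can compare the posterior response rates of a novel arm and the standard of care:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_beats_control(succ_t, n_t, succ_c, n_c, draws=100_000):
    """Monte Carlo estimate of P(p_treatment > p_control | data)
    under independent Beta(1, 1) priors on the two response rates."""
    p_t = rng.beta(1 + succ_t, 1 + n_t - succ_t, draws)
    p_c = rng.beta(1 + succ_c, 1 + n_c - succ_c, draws)
    return float(np.mean(p_t > p_c))

# Hypothetical interim data: 14/30 responders on the novel arm, 9/30 on control.
prob = prob_beats_control(14, 30, 9, 30)

# Placeholder decision thresholds for graduation / early stopping.
if prob > 0.975:
    print(f"P = {prob:.3f}: strong evidence of benefit -- graduate the arm")
elif prob < 0.10:
    print(f"P = {prob:.3f}: futility -- drop the arm early")
else:
    print(f"P = {prob:.3f}: continue enrolment")
```

Rules of this kind let a design drop non-promising arms early and shift randomisation towards arms with a higher posterior probability of benefit; my designs replace this simple Beta-Binomial comparison with the Bayesian model selection tools noted above.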

Regulatory and policy-making research

I am part of two multi-disciplinary advisory committees:

  • SANEST (convened by the U.S. FDA), where I established guidelines on data integration, misclassification, and missingness methods for patients with blocked or narrowed coronary and peripheral arteries treated with paclitaxel-coated devices. I also showed that methods that do not properly adjust for differences between populations yield biased estimates of the probability of surviving five years after treatment.

  • PANFAB, where I provided recommendations on the selection of disinfectants against SARS-CoV-2.