Our CCG research group designs and develops computational methods to analyse tumour DNA sequencing data and understand the processes that drive cancer evolution. In particular, one of our key goals is to understand the evolutionary histories and migration patterns of metastatic cancer cells. To achieve this goal, we focus on cutting-edge single-cell data that we generate using single-cell whole-genome sequencing technologies, which enable the whole-genome sequencing of thousands of individual cells from the same tumour in parallel. The distinctive features of these data have the potential to reveal cancer evolutionary and metastatic insights at unprecedented resolution, but they also make the data particularly challenging to analyse. To address the challenges of this new era of cancer genomics, we develop computational methods based on rigorous mathematical models and algorithms. Combining innovative biological technologies with formal, well-established methodologies is key to addressing these important open questions and identifying related opportunities for clinical translation.
Recent sequencing technologies provide an effective way to investigate the cancer evolutionary process, with critical impact on diagnosis, prognosis, and treatment. Cutting-edge technologies, including single-cell, bulk, and spatial sequencing, as well as liquid biopsies and sequencing of metastases, now routinely produce large amounts of multi-dimensional, multi-omics data. While these data offer an unprecedented view of tumour evolution, combining these technologies with formal, well-established methodologies is key to realising their full potential.
The goal of our lab is to design and develop computational methods that leverage the features of the most recent sequencing technologies to investigate the complexity of tumour evolution and heterogeneity. These methods are based on rigorous mathematical models and algorithms that we specifically design for analysing the data produced by different technologies. By framing biological questions as computational problems, we can integrate multiple sources of information into their solutions, revealing novel insights into the cancer evolutionary process.
Tumours are heterogeneous compositions of distinct subpopulations of cancer cells that represent different stages of tumour evolution and are characterised by the accumulation of different genetic alterations, including single-nucleotide variants (SNVs), copy-number alterations (CNAs), and structural variants (SVs)23. Using these alterations, tumour phylogenies can be reconstructed to investigate the role of these subpopulations and their alterations. However, standard phylogenetic methods cannot be applied to single-cell DNA sequencing (scDNA-seq) data, because the low per-cell sequencing coverage results in high error rates and large amounts of missing data24. Moreover, existing methods do not integrate different types of genetic alterations, limiting our understanding of tumour evolution19. To overcome these limitations, we are developing single-cell-specific methods that use statistical approaches to integrate signals across thousands of single cells and that leverage different genetic alterations, including key cancer events (e.g., whole-genome duplications and mutation losses) that cannot be accurately characterised without scDNA-seq. Additionally, we are developing probabilistic methods that provide multiple equally plausible solutions instead of a single most-likely solution. This probabilistic approach is crucial in the high-error context of scDNA-seq, as it substantially increases power when seeking recurrent patterns and evaluating alternative trajectories.
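As a minimal illustration of why pooling signal across cells helps at low coverage, the sketch below scores whether an SNV is present in a clone by summing per-cell binomial log-likelihood ratios. It assumes a fixed sequencing error rate, a variant allele frequency of 0.5 in cells that carry the mutation, and that cells have already been grouped into clones; the function names and parameters are hypothetical, and this is not the group's actual model.

```python
import math


def log_binom_lik(variant_reads: int, total_reads: int, p: float) -> float:
    """Binomial log-likelihood without the binomial coefficient (it cancels in ratios)."""
    return variant_reads * math.log(p) + (total_reads - variant_reads) * math.log(1 - p)


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def clone_mutation_posterior(cells, error_rate=0.01, vaf=0.5, prior=0.5):
    """Posterior probability that an SNV is present in the cells of one clone.

    cells: list of (variant_reads, total_reads) pairs, one per cell (hypothetical input).
    Under 'absent', variant reads arise only from sequencing errors (rate error_rate);
    under 'present', each read carries the variant with probability vaf.
    """
    log_odds = math.log(prior / (1 - prior))
    for v, d in cells:
        # Cells with zero coverage contribute nothing: missing data simply adds no evidence.
        log_odds += log_binom_lik(v, d, vaf) - log_binom_lik(v, d, error_rate)
    return logistic(log_odds)


# Twenty cells with 2x coverage each: no single cell is conclusive, but the clone is.
print(clone_mutation_posterior([(1, 2)] * 20))
print(clone_mutation_posterior([(0, 1)] * 20))
```

Summing log-likelihood ratios is what makes low per-cell coverage tolerable in this toy setting: each cell carries little evidence on its own, but many cells together give a decisive answer.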
Tumour phylogenies provide ancestral information about the evolutionary history of a tumour, but they do not provide phenotypic information about distinct subpopulations of cancer cells. Since recent scDNA-seq data provide a signal for identifying cell-cycle states19, we are developing novel algorithms (e.g., SPRINTER) to estimate subpopulation-specific proliferation rates by identifying the fraction of actively replicating cells in each subpopulation. We are using these methods to investigate the role of proliferation in metastatic progression and to characterise the most aggressive subpopulations. Beyond genetic alterations, non-genetic alterations can also play a key role in driving cancer progression25. Therefore, we are also developing algorithms to identify replication timing alterations (RTAs), which are strongly associated with changes in gene expression and chromatin structure26-28, and to reconstruct their evolutionary dynamics by mapping RTAs onto the reconstructed tumour phylogeny.
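To make the notion of a subpopulation-specific proliferation rate concrete, the sketch below computes, for each clone, the fraction of cells flagged as actively replicating together with a Wilson score confidence interval. It assumes per-cell clone assignments and replication calls are already available as input, so it only illustrates the final summary step; it is not SPRINTER, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def replicating_fraction(cells, z=1.96):
    """Per-clone fraction of replicating cells with a Wilson score interval.

    cells: iterable of (clone_id, is_replicating) pairs (hypothetical input format).
    Returns {clone_id: (fraction, lower_bound, upper_bound)}.
    """
    counts = defaultdict(lambda: [0, 0])  # clone -> [replicating cells, total cells]
    for clone, replicating in cells:
        counts[clone][0] += int(replicating)
        counts[clone][1] += 1

    result = {}
    for clone, (k, n) in counts.items():
        p = k / n
        denom = 1 + z**2 / n
        centre = (p + z**2 / (2 * n)) / denom
        half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
        result[clone] = (p, max(0.0, centre - half), min(1.0, centre + half))
    return result


cells = [("A", True)] * 30 + [("A", False)] * 70 + [("B", True)] * 10 + [("B", False)] * 90
for clone, (p, lo, hi) in sorted(replicating_fraction(cells).items()):
    print(clone, round(p, 2), round(lo, 3), round(hi, 3))
```

The Wilson interval is used here because it stays well behaved for small clones and for fractions near 0 or 1, where the simple normal approximation does not.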
The complex migration patterns that disseminating cancer cells follow during metastatic seeding are not observed directly, but they can be inferred from the reconstructed tumour phylogeny5. This is similar to inferring human migrations: if my parents are in Italy and I am currently in the UK, we can reasonably infer that I migrated from Italy to the UK. While preliminary algorithms implementing this concept have been developed for bulk sequencing data5,6,14, these methods are not applicable to scDNA-seq datasets comprising thousands of individual cancer cells from heterogeneous populations. We are therefore developing methods to reconstruct metastatic migration patterns from scDNA-seq datasets, leveraging multiple types of somatic alterations (SNVs, CNAs, and SVs) and using probabilistic approaches to evaluate alternative, equally plausible patterns of cancer cell migration.
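The Italy-to-UK reasoning corresponds to a classic small-parsimony problem: given a phylogeny whose leaves are labelled with the anatomical site where each cell was sampled, find ancestral site labels that minimise the number of site changes (migrations) along edges. The sketch below uses Sankoff's algorithm with unit migration cost and a root fixed to the primary tumour. It is an illustrative stand-in rather than the group's method, it returns only one optimal labelling (a probabilistic approach would instead weigh alternative labellings), and the tree, node names, and sites are hypothetical.

```python
INF = float("inf")


def infer_migrations(tree, root, leaf_site, sites, root_site):
    """Sankoff small parsimony over anatomical sites with unit migration cost.

    tree: dict mapping each internal node to its list of children.
    leaf_site: dict mapping each leaf to its observed site.
    Returns (minimum number of migrations, node labels, migration edges).
    """
    cost = {}

    def fill(node):  # post-order: cost[node][s] = min migrations below node if node is at s
        children = tree.get(node, [])
        if not children:
            cost[node] = {s: 0 if s == leaf_site[node] else INF for s in sites}
            return
        for child in children:
            fill(child)
        cost[node] = {
            s: sum(min(cost[c][t] + (t != s) for t in sites) for c in children)
            for s in sites
        }

    fill(root)
    labels = {root: root_site}
    migrations = []

    def assign(node):  # pre-order traceback, preferring to stay at the parent's site on ties
        s = labels[node]
        for c in tree.get(node, []):
            t = min(sites, key=lambda site: (cost[c][site] + (site != s), site != s))
            labels[c] = t
            if t != s:
                migrations.append((node, c, s, t))
            assign(c)

    assign(root)
    return cost[root][root_site], labels, migrations


tree = {"root": ["n1", "p1"], "n1": ["p2", "n2"], "n2": ["m1", "m2"]}
leaf_site = {"p1": "primary", "p2": "primary", "m1": "liver", "m2": "liver"}
total, labels, migrations = infer_migrations(tree, "root", leaf_site, ["primary", "liver"], "primary")
print(total, migrations)  # 1 [('n1', 'n2', 'primary', 'liver')]
```

In this toy tree, a single primary-to-liver migration on the edge above the two liver cells explains all observed sites. For trees with thousands of cells, an iterative traversal would avoid Python's recursion limit.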
The timing of metastatic seeding (i.e., how many months before or after diagnosis different cells disseminated) is generally unknown, because this complex process can start years before metastases become detectable through clinical imaging14. However, certain types of mutations (e.g., clock-like mutations), identified through the analysis of mutational signatures, accumulate at an approximately constant rate and can therefore be used as molecular clocks29,30. We are thus developing novel methods to measure mutational signatures in individual cancer cells and, by integrating them with the reconstructed tumour phylogenies, to estimate the timing of metastatic evolutionary events (e.g., pre- vs post-metastatic seeding).
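The molecular-clock idea can be sketched in two steps: fit known signature profiles to a mutation catalogue with non-negative least squares to estimate how many mutations each process contributed, then compare the clock-like exposure accumulated before and after the point where the metastatic lineage branches off as a proxy for relative time. The sketch assumes a constant clock-like mutation rate, uses two hypothetical 4-channel signatures instead of real 96-channel SBS signatures, requires NumPy and SciPy, and is an illustration rather than the group's method.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy signatures (each column sums to 1); real analyses use 96 SBS channels.
SIGNATURES = np.array([
    [0.5, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [0.0, 0.5],
])
CLOCK_LIKE = 0  # column index of the (assumed) clock-like signature


def fit_exposures(catalogue):
    """Estimate non-negative per-signature mutation counts for one mutation catalogue."""
    exposures, _residual = nnls(SIGNATURES, np.asarray(catalogue, dtype=float))
    return exposures


def relative_seeding_time(before_catalogue, after_catalogue):
    """Fraction of clock-like 'molecular time' elapsed before the metastatic lineage branched off.

    before_catalogue: mutations on the phylogeny path from the root to the branching point.
    after_catalogue: mutations from the branching point to the sampled metastatic cells.
    """
    before = fit_exposures(before_catalogue)[CLOCK_LIKE]
    after = fit_exposures(after_catalogue)[CLOCK_LIKE]
    return before / (before + after)


print(fit_exposures([20, 20, 5, 5]))  # approximately [40, 10]
print(relative_seeding_time([20, 20, 5, 5], [10, 10, 15, 15]))  # approximately 0.667
```

Only the clock-like exposure enters the timing estimate, so bursts of mutations from other processes on a branch do not distort the relative clock in this simplified model.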
The new dataset of 20 patients does not provide sufficient statistical power to identify recurrent evolutionary patterns that might represent hallmarks of metastasis. However, our analyses provide a candidate list of metastatic features that can be tested in larger cohorts. We are therefore developing integrative statistical methods to perform such tests in the TRACERx cohort14,22, which includes 842 patients with non-small cell lung cancer (NSCLC) with available clinical annotations and multi-region bulk-sequencing data. Although most features inferred from the single-cell analysis cannot be identified de novo from bulk-sequencing data, testing for their presence in bulk-sequencing data is easier and can be feasible.
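As a simple example of the kind of test that could be run once a single-cell-derived feature has been scored in bulk data, the sketch below implements a two-sided Fisher's exact test on a 2x2 table (feature present/absent vs. an outcome such as metastatic relapse). The choice of Fisher's test, the outcome, and the function name are illustrative assumptions; the sketch does not model covariates or multi-region sampling.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: feature present / absent; columns: outcome yes / no (hypothetical layout).
    Sums the hypergeometric probabilities of all tables with the observed margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):  # probability that the top-left cell equals x given the margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    p = sum(q for q in map(table_prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


print(fisher_exact_two_sided(1, 9, 11, 3))  # about 0.00276
```

An exact test is a reasonable default for this illustration because some candidate features may be rare in the cohort, where large-sample approximations such as the chi-squared test become unreliable.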
None of the methods described above is specific to NSCLC or to the newly generated dataset; rather, they form a platform that can be leveraged to address related questions in other cancer types. We are therefore extending and applying these methods as part of other large collaborations. For example, we are applying them to investigate treatment resistance and the role of homologous recombination deficiency in an autopsy cohort of patients with prostate cancer31 (in collaboration with Prof. Gert Attard). Additionally, we are adapting these methods to investigate the timing of cancer cell invasion in patients with glioblastoma and related models32 (in collaboration with Prof. Simona Parrinello).