Tumor Heterogeneity

During tumor growth, tumor cells acquire somatic mutations that can give them a selective advantage over normal cells. As a result, a tumor cell population is typically heterogeneous, consisting of multiple subpopulations with distinct genomes. This is known as tumor heterogeneity. These homogeneous subpopulations, known as subclones, are an important target in precision medicine. We proposed a Bayesian feature allocation model to reconstruct tumor subclones from next-generation sequencing (NGS) data. The key innovation is the use of (phased) pairs of proximal single nucleotide variants (SNVs) for subclone reconstruction. Because the posterior distribution is highly multimodal, we used parallel tempering to improve the mixing of the Markov chain. We also developed a trans-dimensional MCMC algorithm whose transition probabilities are based on splitting the data into training and test sets, which makes sampling across models with different numbers of subclones efficient. Through simulation studies, we showed that our model recovers both the number of subclones and their structures more accurately than some earlier models. Applying our model to 30 pairs of head and neck cancer data, we inferred their subclone structures and found that tumor samples are generally more heterogeneous than their matched normal samples.
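
Parallel tempering (replica exchange) runs several chains at different temperatures and occasionally swaps their states, so a hot chain that moves freely between modes can hand those moves down to the target chain. The following is a minimal Python sketch on a hypothetical one-dimensional bimodal target, not the subclone posterior; the target `log_target`, the temperature ladder, and the step size are illustrative assumptions rather than the settings used in this work.

```python
import math
import random


def log_target(x):
    """Hypothetical bimodal target: unnormalized log-density of an equal mixture of N(-5, 1) and N(5, 1)."""
    a = -0.5 * (x + 5.0) ** 2
    b = -0.5 * (x - 5.0) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(rng, log_ratio):
    """Metropolis accept/reject step on the log scale."""
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)


def parallel_tempering(log_p, temps, n_iter, step=1.0, seed=0):
    """Replica-exchange MCMC with one random-walk Metropolis chain per temperature.

    Chain k targets p(x) ** (1 / temps[k]); temps[0] should be 1.0 (the target chain).
    Returns the draws of the temps[0] chain and the fraction of accepted swaps.
    """
    rng = random.Random(seed)
    states = [0.0] * len(temps)
    draws, n_swaps = [], 0
    for _ in range(n_iter):
        # Local move in every chain; hotter chains take larger steps.
        for k, t in enumerate(temps):
            prop = states[k] + rng.gauss(0.0, step * math.sqrt(t))
            if accept(rng, (log_p(prop) - log_p(states[k])) / t):
                states[k] = prop
        # Propose exchanging the states of one random adjacent pair of temperatures.
        k = rng.randrange(len(temps) - 1)
        log_swap = (1.0 / temps[k] - 1.0 / temps[k + 1]) * (log_p(states[k + 1]) - log_p(states[k]))
        if accept(rng, log_swap):
            states[k], states[k + 1] = states[k + 1], states[k]
            n_swaps += 1
        draws.append(states[0])
    return draws, n_swaps / n_iter


if __name__ == "__main__":
    draws, swap_rate = parallel_tempering(log_target, [1.0, 2.0, 4.0, 8.0, 16.0], n_iter=20000, seed=1)
    frac_right = sum(d > 0 for d in draws) / len(draws)
    print(f"fraction of draws in the right-hand mode: {frac_right:.2f}, swap acceptance: {swap_rate:.2f}")
```

With these settings, roughly half of the target-chain draws should fall in each mode, whereas a single random-walk chain with step size 1 tends to remain in whichever mode it reaches first. In practice the ladder is tuned so that swaps between adjacent temperatures are accepted reasonably often.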

Tumor evolution & concept of mutation pairs
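
The value of a mutation pair is that a short read covering both loci reveals which alleles sit on the same chromosome copy. The Python sketch below is a simplified illustration, not the authors' likelihood: it assumes both loci are diploid with no copy-number change or sequencing error, that each subclone's genotype is an unordered pair of haplotypes (`"00"`, `"01"`, `"10"`, `"11"`, with 1 marking the variant allele), and that a read is equally likely to come from either chromosome copy. The function names and example numbers are hypothetical.

```python
import math

# Haplotype at a pair of proximal SNV loci: allele at locus 1, then locus 2 (0 = reference, 1 = variant).
HAPLOTYPES = ("00", "01", "10", "11")


def read_type_probs(genotypes, weights):
    """Probability that a read spanning both loci shows each haplotype.

    genotypes[c] is the pair of haplotypes carried by subclone c (assumed diploid),
    and weights[c] is its cell fraction; the weights are assumed to sum to 1.
    """
    probs = dict.fromkeys(HAPLOTYPES, 0.0)
    for (h1, h2), w in zip(genotypes, weights):
        # A read comes from either chromosome copy with probability 1/2.
        probs[h1] += w / 2
        probs[h2] += w / 2
    return probs


def marginal_vaf(probs, locus):
    """Variant allele frequency at one locus (0 or 1), ignoring the pairing."""
    return sum(p for h, p in probs.items() if h[locus] == "1")


def pair_log_likelihood(read_counts, genotypes, weights):
    """Multinomial log-likelihood, up to a constant, of read-type counts for one pair."""
    probs = read_type_probs(genotypes, weights)
    ll = 0.0
    for h, n in read_counts.items():
        if n == 0:
            continue
        if probs[h] == 0.0:
            return -math.inf  # an observed haplotype that no subclone carries
        ll += n * math.log(probs[h])
    return ll


if __name__ == "__main__":
    weights = [0.6, 0.4]  # hypothetical cell fractions: background clone, mutated subclone
    cis = [("00", "00"), ("00", "11")]    # both variants on the same copy
    trans = [("00", "00"), ("01", "10")]  # one variant on each copy
    for name, g in (("cis", cis), ("trans", trans)):
        p = read_type_probs(g, weights)
        print(name, [round(marginal_vaf(p, i), 2) for i in (0, 1)], p)
    counts = {"00": 78, "11": 22}  # made-up read counts
    print(pair_log_likelihood(counts, cis, weights), pair_log_likelihood(counts, trans, weights))
```

In the example, the cis configuration (both variants on one copy) and the trans configuration (one variant on each copy) give the same variant allele frequency at each locus, so single-SNV data cannot tell them apart, while their pair-level read-type probabilities differ. Under this error-free sketch, any `11` read makes the trans configuration impossible.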

Selected results:

Inferred number of subclones in tumor (red) and matched normal (blue) samples for the 30 tumor/normal pairs of head and neck cancer data.

Inferred genotypes for six randomly selected pairs from the head and neck cancer data.

A second research direction addresses another important aspect of statistical inference for tumor heterogeneity: recovering the phylogenetic relationships among subclones. Such inference can substantially enrich our understanding of subclone evolution and cancer development. We developed a tree-based feature allocation model that explicitly models the dependence structure among subclones, and we adapted our MCMC sampling techniques to search the space of trees efficiently. We analyzed a lung cancer data set and inferred the underlying evolutionary process.
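
One common way to make a feature allocation respect a tree is to let each subclone inherit every mutation of its parent and add the mutations gained on its own branch. The Python sketch below builds such a mutation-by-subclone matrix under an infinite-sites assumption (each mutation arises once and is never lost) and checks the classic nested-or-disjoint condition for a rooted perfect phylogeny. This is an illustrative assumption and may differ from the dependence structure in the authors' model; the function names and the example tree are hypothetical.

```python
from itertools import combinations


def genotypes_from_tree(parent, gained, n_mutations):
    """Build a binary mutation-by-subclone matrix implied by a rooted tree.

    parent[c] is the parent of subclone c (None for the root) and must define a tree;
    gained[c] is the set of mutation indices acquired on the branch into c.
    Under infinite sites, each subclone carries all of its ancestors' mutations.
    """
    n_clones = len(parent)
    z = [[0] * n_clones for _ in range(n_mutations)]
    for c in range(n_clones):
        node = c
        while node is not None:  # walk from subclone c up to the root
            for m in gained[node]:
                z[m][c] = 1
            node = parent[node]
    return z


def is_nested_or_disjoint(z):
    """True if every pair of mutations is carried by nested or disjoint sets of subclones."""
    carriers = [frozenset(c for c, v in enumerate(row) if v) for row in z]
    for a, b in combinations(carriers, 2):
        if not (a <= b or b <= a or a.isdisjoint(b)):
            return False
    return True


if __name__ == "__main__":
    parent = [None, 0, 0, 1]  # hypothetical tree: 1 and 2 branch off root 0; 3 descends from 1
    gained = [{0}, {1}, {2}, {3}]
    z = genotypes_from_tree(parent, gained, n_mutations=4)
    for row in z:
        print(row)
    print(is_nested_or_disjoint(z))
```

A tree-based sampler could then propose moves on `parent` and `gained` directly, so every visited matrix passes this check; this is one way a tree constraint shrinks the space of feature allocations that MCMC must explore.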