R & C package for the paper
by Tianjian Zhou, Peter Mueller, Subhajit Sengupta and Yuan Ji.
[Download package] [Manuscript]
Tumour cell populations can be thought of as a composition of heterogeneous cell subpopulations, with each subpopulation being characterized by overlapping sets of single-nucleotide variants. Such subpopulations are known as subclones and are an important target for precision medicine. Reconstructing subclones from next-generation sequencing data is one of the major challenges in computational biology. We present PairClone as a new tool to implement this reconstruction. The main idea of PairClone is to model short reads mapped to pairs of proximal single-nucleotide variants, which we refer to as mutation pairs. In contrast, other existing methods use only marginal reads for unpaired single-nucleotide variants. Using Bayesian non-parametric models, we estimate posterior probabilities of the number, genotypes and population frequencies of subclones in one or more tumour samples. We use the categorical Indian buffet process as a prior probability model for subclones. Column vectors of categorical matrices record the corresponding sets of mutation pairs for subclones. The performance of PairClone is assessed using simulated and real data sets and compared with that of existing methods.
This package contains the source files for running the MCMC algorithm, the simulation datasets and the lung cancer dataset described in the paper. See below for a detailed description of each file.
To compile PairClone_MCMC_PT.cpp, run
$ R CMD SHLIB PairClone_MCMC_PT.cpp
in the terminal.
To run the simulation example: extract the package, change to the package directory, compile "PairClone_MCMC_PT.cpp" with the command given above, and then run "PairClone_main.R" line by line in the R console.
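The compile step can be wrapped in a small shell helper, shown here as a minimal sketch rather than as part of the package. It assumes R is on the PATH. `PKG_DIR`, `shlib_name` and `build_pairclone` are hypothetical names introduced for illustration, and `./PairClone` is only a placeholder for wherever the package was extracted. The `.so` extension applies to Linux and macOS; on Windows, R CMD SHLIB produces a `.dll` instead.

```shell
#!/bin/sh
# Sketch only. PKG_DIR is a hypothetical location for the extracted package;
# set it to wherever the archive was actually unpacked.
PKG_DIR="${PKG_DIR:-./PairClone}"
SRC="PairClone_MCMC_PT.cpp"

# By default R CMD SHLIB names its output after the first source file.
# This helper derives that name, assuming the .so extension (Linux/macOS).
shlib_name() {
    printf '%s.so\n' "${1%.*}"
}

# Compile the source file inside the package directory and check that the
# shared object was produced. The subshell keeps the caller's working
# directory unchanged.
build_pairclone() {
    (
        cd "$PKG_DIR" || exit 1
        R CMD SHLIB "$SRC" || exit 1
        [ -f "$(shlib_name "$SRC")" ]
    )
}
```

After sourcing the file, calling `build_pairclone` returns success only if the shared object exists. The remaining step is unchanged: start R in the same directory and run "PairClone_main.R" line by line.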