A method for local ancestry deconvolution

Copyright 2009 by the Cornell University and the Cornell Research Foundation, Inc.  
Copyright 2011 The Board of Trustees of the Leland Stanford Junior University.

Recent Updates

August 24, 2011 - Updated version of PCAdmix (1.0) now available. 

July 22, 2011 - Initial site for the method created, with an R/C++ version of the code provided.  Check back soon for an updated version of the method, written entirely in C, which contains more options.

Brief Description of Method

PCAdmix is a method that estimates local ancestry from phased haplotypes via principal components analysis (PCA).  The method processes the data one chromosome at a time.  First, it performs PCA on the 2 or 3 reference panels provided, building a PC space for the chromosome under consideration, say chromosome W.  Because the method uses phased data, each copy of chromosome W in the reference panels is treated as a separate data point in PC space.  The first 1 or 2 principal components (PCs) tend to represent the axes of ancestral divergence between the reference panels, as can be seen in the PCA plots.  If this is not the case in your analysis, it may suggest that there is cryptic structure in your reference panels.
Given this PC space, the query panel of admixed individuals is then projected into the space, and the PC loadings for each SNP are collected.  Next, the method steps through short windows of SNPs, assessing the probability that a given window of an admixed individual’s haplotype comes from each reference population.  Details of this process are described in Dr. Abra Brisbin's PhD thesis (see the reference page for a link).
Given the probability of each ancestry for each window, the final step of the method uses a hidden Markov model (HMM) to smooth the window-based ancestry calls.  This is important because the window-level information is often noisy and includes windows without confident ancestry calls.  The HMM relies on the PC scores, the proportion of each ancestry on a given chromosome, and a transition matrix that depends on the number of generations since admixture and the recombination distance between any two windows.  In the current version of the method, the recombination distances and the number of generations are fixed parameters.  In practice, the results appear to be fairly robust to the number of generations chosen.
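To make the PCA and projection steps concrete, here is a minimal sketch in plain Python. It is not the PCAdmix code. It assumes haplotypes coded as 0/1 alleles and uses power iteration to find only the first PC. It also assumes that a window-level score can be formed by summing each SNP's loading-weighted contribution within the window. The function names, the toy panels and the window size of 4 SNPs are all hypothetical.

```python
import math

def first_pc(haplotypes, iterations=200):
    """Return (per-SNP means, per-SNP loadings of the first PC).

    Uses power iteration on X^T X, where X is the mean-centered
    haplotype matrix of the pooled reference panels.
    """
    n, m = len(haplotypes), len(haplotypes[0])
    means = [sum(h[j] for h in haplotypes) / n for j in range(m)]
    centered = [[h[j] - means[j] for j in range(m)] for h in haplotypes]
    loadings = [1.0 / math.sqrt(m)] * m  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # One power-iteration step: multiply by X^T X, then renormalize.
        scores = [sum(c[j] * loadings[j] for j in range(m)) for c in centered]
        raw = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in raw))
        loadings = [x / norm for x in raw]
    return means, loadings

def window_scores(haplotype, means, loadings, window_size):
    """Project one haplotype onto PC1, split into per-window partial sums.

    Assumption for illustration: a window's score is the sum of
    (allele - mean) * loading over the SNPs in that window.
    """
    contrib = [(a - mu) * w for a, mu, w in zip(haplotype, means, loadings)]
    return [sum(contrib[s:s + window_size])
            for s in range(0, len(contrib), window_size)]

# Toy reference panels: each row is one phased haplotype (one data point).
panel_a = [[0, 0, 0, 0, 0, 0, 0, 1],
           [0, 0, 0, 0, 0, 0, 1, 0],
           [0, 0, 0, 0, 0, 1, 0, 0]]
panel_b = [[1, 1, 1, 1, 1, 1, 1, 0],
           [1, 1, 1, 1, 1, 1, 0, 1],
           [1, 1, 1, 1, 1, 0, 1, 1]]

# The PC space is built from the reference panels only.
means, loadings = first_pc(panel_a + panel_b)

# Whole-chromosome PC1 scores of the reference haplotypes.
a_scores = [sum(window_scores(h, means, loadings, 4)) for h in panel_a]
b_scores = [sum(window_scores(h, means, loadings, 4)) for h in panel_b]

# A toy admixed haplotype: first window A-like, second window B-like.
query = [0, 0, 0, 0, 1, 1, 1, 1]
query_windows = window_scores(query, means, loadings, 4)
a_windows = window_scores(panel_a[0], means, loadings, 4)
b_windows = window_scores(panel_b[0], means, loadings, 4)
```

In this toy data the two panels fall on opposite sides of PC1, and each window of the query haplotype scores closer to the panel it was copied from. The sign of a PC is arbitrary, so only relative positions along the axis are meaningful.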
The end result of running the HMM is a matrix containing the posterior probability that a given window is of a given ancestry, conditioned on the rest of the data for the chromosome.  This is the standard output of the forward-backward algorithm.  Plots are made from these results by assigning a window to ancestry A if the posterior probability for ancestry A is >= 0.8.  Any window for which the maximum posterior probability over all ancestries is < 0.8 is called “undecided”.
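The smoothing and calling steps can be sketched as follows, again in plain Python and only as an illustration under stated assumptions. The exact transition and emission models used by PCAdmix are not given on this page. The sketch assumes a commonly used form in which the ancestry stays the same with probability exp(-g * d), for g generations and a distance of d Morgans, and is otherwise redrawn according to the ancestry proportions. The per-window emission probabilities are taken as given inputs. All function names and toy numbers are hypothetical; only the 0.8 threshold and the "undecided" label come from the description above.

```python
import math

def transition_matrix(distance_morgans, generations, proportions):
    """Assumed form: stay with prob exp(-g*d), else redraw from proportions."""
    stay = math.exp(-generations * distance_morgans)
    k = len(proportions)
    return [[(stay if i == j else 0.0) + (1.0 - stay) * proportions[j]
             for j in range(k)] for i in range(k)]

def forward_backward(emissions, distances, generations, proportions):
    """Posterior P(ancestry of window t | all windows on the chromosome).

    emissions[t][s]: P(data in window t | ancestry s), assumed given.
    distances[t]:    distance in Morgans between windows t and t + 1.
    """
    n, k = len(emissions), len(proportions)
    trans = [transition_matrix(d, generations, proportions) for d in distances]

    # Forward pass, rescaled at each step to avoid numerical underflow.
    first = [proportions[s] * emissions[0][s] for s in range(k)]
    fwd = [[x / sum(first) for x in first]]
    for t in range(1, n):
        cur = [emissions[t][j] * sum(fwd[t - 1][i] * trans[t - 1][i][j]
                                     for i in range(k)) for j in range(k)]
        fwd.append([x / sum(cur) for x in cur])

    # Backward pass, rescaled in the same way.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[t][i][j] * emissions[t + 1][j] * bwd[t + 1][j]
                   for j in range(k)) for i in range(k)]
        bwd[t] = [x / sum(cur) for x in cur]

    # Combine and normalize each window's row to sum to 1.
    posteriors = []
    for t in range(n):
        raw = [fwd[t][s] * bwd[t][s] for s in range(k)]
        posteriors.append([x / sum(raw) for x in raw])
    return posteriors

def call_windows(posteriors, labels, threshold=0.8):
    """Assign the top ancestry if its posterior >= threshold, else 'undecided'."""
    calls = []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        calls.append(labels[best] if row[best] >= threshold else "undecided")
    return calls

# Toy chromosome: 4 windows favoring A, 1 uninformative window, 4 favoring B.
emissions = [[0.95, 0.05]] * 4 + [[0.5, 0.5]] + [[0.05, 0.95]] * 4
distances = [0.001] * 8  # Morgans between adjacent windows (toy values)
posteriors = forward_backward(emissions, distances,
                              generations=8, proportions=[0.5, 0.5])
calls = call_windows(posteriors, ["A", "B"])
```

In this symmetric toy example the uninformative middle window receives roughly equal posterior probability for both ancestries, so it falls below the 0.8 threshold and is called "undecided", while the windows on either side are assigned to A and B respectively.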