McDonald Analysis

This is a biogeographical analysis of the 23andMe autosomal genome data of Sereno Barr-Kumar, done by Doug McDonald. Doug McDonald is a former professor of physical chemistry at the University of Illinois at Urbana-Champaign. The first observation of vibrational-electronic quantum beats was made in his laboratory. As he says, this work on biogeographical analysis is neither supported nor endorsed by the University.

1) Quantitative Results

Most likely fit is:

Europe: 8.6% (± 0.3%) (various subcontinents)

S. Asia: 91.4% (± 0.3%) (all India)

The quantitative results also include a list of possible ancestry sets, one set per line. Some people don't get this list if they test as, within an error tolerance, 100% of a certain ancestry. Each line represents the computer algorithm’s guess as to the test person’s makeup. The most likely sets in a statistical sense are at the top. However, this ranking does not take into account what is called “prior knowledge”.

The following are possible population sets and their fractions, most likely at the top:

Basque = 0.089, S_India = 0.911, or

Sardinia = 0.083, S_India = 0.917

2) Chromosome Segment Painting

This feature gives a “painting” of the testee’s 22 autosomes and the X chromosome, which are color coded to show the “continental” ancestry found at places along them. This test is very important for people with small amounts of ancestry from a “continent” differing from their major ones. It can detect, for example, as little as 0.4% Native American ancestry in a pure European or a mixed Afro-(Euro)American person even if the test described in #1 above cannot. Such results will be mentioned in the text. In the specific case mentioned above, the American comes out of the European part.

3) Average Location of All Ancestry

Here the data (as computed from number one above) is plotted as a summary. The green spot represents the average location of all ancestry except Native American and Oceanic. Note that for people with significant sub-Saharan African ancestry this green spot is essentially meaningless. But for people with just European, Mideast, and sometimes South Asian ancestry it is usually quite accurate as far as their average ancestry goes.

The “chromosomes” plot just shows which geographic region each part of each chromosome is most likely to have come from. Since everyone has two copies of each chromosome (except males for the X), they may be split in half. But note that while distant regions such as Europe and Africa split cleanly, nearby ones like Europe and the Mideast don’t. As a result, southern Europeans will often show some Mideastern spots without having recent ancestry from the Mideast. Mideasterners, especially from North Africa, often have sub-Saharan African regions, as do Mediterranean Europeans. Small regions (< 2 Mb) are often just “noise”. I have special tests that can sometimes tell whether they are or not, and that will be reported in the text if significant. Small regions may not show on the “quantitative” lists and will also be discussed if significant. It is important to remember that this plot cannot tell father from mother in any way.

4) PCA scatter plots

These plots are labeled with comparison populations and a crosshair icon representing the testee. This is called the PCA (Principal Component Analysis) "scatter" plot. See below for how to read these. The computer generates several dozen of these, and I send only the ones of interest for the test person's makeup. There are a couple of special ones that test for specific ethnic ancestry. The most common of these is for Jewish ancestry, but others exist for such ethnicities as Sardinian, Kalash, Pygmy, and certain Native American tribes. None exist, unfortunately, for East Asian ethnic groups.

The PCA “scatter” plots are the result of a calculation across the whole genome (all the chromosomes except the X) and report the average ancestry. If a person’s ancestry is from one place, that will be reflected on them directly. There exist dozens of possible plots, but only the most informative ones are sent. In Europe and the Mideast the position on the main plot directly indicates geography. Otherwise, it is indirect. This is taken into account when the spot(s) on the map plot are calculated. The map plot shows each continent’s average location, as well as an overall average (green) excluding the American and Oceanic ancestry.

Mixed Ancestry and Related Details

Now if someone is of mixed ancestry, as said earlier, the spot(s) on the map, as well as the PCA plots, indicate the average. The quantitative lists (which sometimes are not sent if there is just one line, like the common “English 100%”) represent possible combinations which average to the correct spot on all the PCA plots. Each line in the list is one set of possible combinations. We fit to 9 of the PCA plots, including ones sent and unsent. This is a “least squares fit”. A person, to fit, has to have the same weighted average position on all 9. Thus, say somebody lies in Romania on a Euro and Mideast only “main” plot. They might actually be Romanian, or could be 60% Hungarian and 40% Jewish, or 75% French and 25% Iranian, or something more complicated. On others of the plots (not usually sent), however, Romania does not lie directly between Hungary and the Jews or between France and Iran, so the program (or an experienced person simply looking at a pile of plots) can tell what is correct. Well, most of the time it can, but not always. For these cases, especially Jewish ancestry, I sometimes have separate special tests, whose results are reported in words.

People may wish to know “how far back in time do my tests go?” The answer is “forever”, in the sense that my tests all discuss single or mixed modern populations. For example, modern populations such as North African ones can be mixtures of several older ones (such as Middle Eastern and sub-Saharan African). This ancient mixture will appear on the chromosome plots. A similar situation exists with the Jews, who are, on a long time scale, mixed Euro-Mideast. They will usually show spots of both Euro and Mideast on the chromosomes, even though we classify Jews as Mideastern. If your parents are a typical English person and a typical Jew, that mix likely will appear on the list. But a mix of Irish, British, Romanian, and Palestinian could also appear, since these two possibilities share the same ancient roots. The spot on the map would be similar.

Small or modest amounts of, for example, Mideast or Eastern European in a basically English person can appear in the list simply to indicate that the person is from a point on the map east of England, such as Normandy, Germany, or Poland. Similar things happen on and across all continents, even Euro-East Asian. For example, 97% European and 3% Han Chinese might simply be indicating a small eastward displacement on the map of the European component. If such a mix is truly recent, it almost always appears as such on the chromosomes. If ancient, it may or may not appear there. This fact is always very important to remember when interpreting the “quantitative” data listing.
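The Romania example above can be illustrated with a small sketch. This is not the actual program: the coordinates are made up, only 3 dimensions are used instead of 9, and the function name `fit_two_way_mix` is hypothetical. It shows how two candidate mixes that both land on the target in the first two dimensions can be told apart once another dimension is included in the least squares fit.

```python
import numpy as np

def fit_two_way_mix(target, pop_a, pop_b):
    """Least squares fraction of pop_a in a two-way mix that best matches target.

    Returns (fraction_of_pop_a, residual_distance).
    """
    target, pop_a, pop_b = (np.asarray(v, dtype=float) for v in (target, pop_a, pop_b))
    direction = pop_a - pop_b
    # Closed-form minimizer of |target - (f*pop_a + (1-f)*pop_b)|^2 over f
    frac_a = float(np.dot(target - pop_b, direction) / np.dot(direction, direction))
    frac_a = min(1.0, max(0.0, frac_a))  # fractions must stay between 0 and 1
    fitted = frac_a * pop_a + (1.0 - frac_a) * pop_b
    residual = float(np.linalg.norm(target - fitted))
    return frac_a, residual

# Made-up PCA coordinates (3 dimensions) for the test person and two candidate pairs.
person = [0.0, 0.0, 0.0]
pair_1 = ([-1.0, 0.0, 0.0], [1.5, 0.0, 0.0])   # candidate "60% / 40%" mix
pair_2 = ([-0.5, 0.0, 1.0], [1.5, 0.0, 1.0])   # candidate "75% / 25%" mix

# Using only the first two dimensions, both pairs fit perfectly.
frac_1_2d, res_1_2d = fit_two_way_mix(person[:2], pair_1[0][:2], pair_1[1][:2])
frac_2_2d, res_2_2d = fit_two_way_mix(person[:2], pair_2[0][:2], pair_2[1][:2])

# Adding the third dimension separates them: only pair_1 still fits.
frac_1, res_1 = fit_two_way_mix(person, *pair_1)
frac_2, res_2 = fit_two_way_mix(person, *pair_2)

print(f"2-D fit: pair 1 = {frac_1_2d:.2f} (residual {res_1_2d:.2f}), "
      f"pair 2 = {frac_2_2d:.2f} (residual {res_2_2d:.2f})")
print(f"3-D fit: pair 1 = {frac_1:.2f} (residual {res_1:.2f}), "
      f"pair 2 = {frac_2:.2f} (residual {res_2:.2f})")
```

In this toy case the second pair is ruled out by its large residual in the third dimension, which is the same idea as a plot on which Romania does not lie directly between the two candidate populations.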

Technical notes

The PCA plots use about 300,000 SNPs to create a (real symmetric) distance matrix which has dimension 200-1200 depending on how many populations are used. This is diagonalized, and the eigenvectors are what are plotted on the PCA graphs. The quantitative results are a fit to the first 9 "dimensions" of the full 1200x1200 matrix. The best few least squares fits to these are reported as the “quantitative” results, with the proviso that the smallest number of continental groups needed to "adequately" fit you is used. "Adequate" means that, averaged over the 9 dimensions, you fit within the area (actually 9-D volume) of a "typical" comparison population. The "continental" spots on the map are the average of the quantitative list, but more may be averaged than are sent in some cases. The green spot on the map is the average of the best 4-continental group fits and is usually (slightly) more accurate. This 4-population fit is seldom reported because it usually contains several populations with really tiny amounts.
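The diagonalization step can be sketched generically as follows. This is only an assumption-laden illustration, not the program used here: the genotypes are random toy data (40 samples, 500 SNPs), the symmetric matrix is an ordinary centered sample-by-sample similarity matrix rather than the distance matrix described above, and the function name `pca_coordinates` is hypothetical.

```python
import numpy as np

def pca_coordinates(genotypes, n_dims=9):
    """Diagonalize a symmetric sample-by-sample matrix and keep the top eigenvectors."""
    g = np.asarray(genotypes, dtype=float)
    g = g - g.mean(axis=0)                  # center each SNP column
    sym = g @ g.T / g.shape[1]              # real symmetric, samples x samples
    eigvals, eigvecs = np.linalg.eigh(sym)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_dims]
    return eigvals[order], eigvecs[:, order]

# Toy data: two groups of 20 samples whose allele frequencies differ.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, size=500), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, 500))  # genotypes coded 0, 1, 2
group_b = rng.binomial(2, freq_b, size=(20, 500))
genotypes = np.vstack([group_a, group_b])

eigvals, coords = pca_coordinates(genotypes, n_dims=9)
# Each row of coords is one sample's position in the first 9 "dimensions";
# plotting two columns against each other gives one scatter plot.
print(coords.shape)
```

Each pair of columns in `coords` corresponds to one possible scatter plot, which is why a single diagonalization can yield dozens of plots.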

The chromosome plots are an entirely separate calculation. They are a "maximum likelihood" estimate of the best two-continent fit over a sliding window using (for the autosomes) a Viterbi algorithm decision process. The "emission probabilities" of this are the probabilities of fitting 12-marker haplotypes. The block size for the "cut points" is 4 markers even though the average is over dozens of markers. The X is painted using fixed 12-marker blocks and a simple sliding window average, not Viterbi, and is less accurate. The computer program used was written by me and is not any of the “standard” ones such as “Structure”.
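For readers unfamiliar with the Viterbi algorithm, here is a minimal generic sketch of a two-state version. It is not the program described above: the per-block emission probabilities are invented numbers standing in for the 12-marker haplotype probabilities, and the function name `viterbi_paint` and the `switch_prob` parameter are assumptions made for illustration.

```python
import numpy as np

def viterbi_paint(emission_probs, switch_prob=0.05):
    """Most likely sequence of states (e.g. two 'continents') along a chromosome.

    emission_probs[t, s] is the probability of the data in block t under state s.
    """
    log_em = np.log(np.asarray(emission_probs, dtype=float))
    n_blocks, n_states = log_em.shape
    # Transition matrix: stay with high probability, switch with low probability.
    log_trans = np.full((n_states, n_states), np.log(switch_prob / (n_states - 1)))
    np.fill_diagonal(log_trans, np.log(1.0 - switch_prob))

    score = np.empty((n_blocks, n_states))
    back = np.zeros((n_blocks, n_states), dtype=int)
    score[0] = np.log(1.0 / n_states) + log_em[0]
    for t in range(1, n_blocks):
        cand = score[t - 1][:, None] + log_trans      # cand[i, j]: come from i, go to j
        back[t] = np.argmax(cand, axis=0)             # best previous state for each j
        score[t] = cand[back[t], np.arange(n_states)] + log_em[t]

    # Trace back from the best final state.
    path = np.empty(n_blocks, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(n_blocks - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Invented block probabilities: state 0 for seven blocks (one of them noisy),
# then state 1 for five blocks.
probs = [[0.9, 0.1]] * 3 + [[0.4, 0.6]] + [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 5
path = viterbi_paint(probs)
print(path.tolist())  # [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The single noisy block (index 3) is not painted as state 1, because two switches cost more than that one weak block gains. This is the sense in which small isolated regions are often just noise.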

It should be emphasized that the accuracy of the results depends significantly on how representative the reference samples are in relation to the testee’s ancestry.  We use some 90 groups of reference samples, all from academic studies. Not all available groups are used because for some of the analysis methods using too many groups can cause mathematical problems, and in a few cases the results appear simply silly when certain pairs are used.

Prior Knowledge

The ranking of the quantitative list does not take into account what is called “prior knowledge”. The classic example of this is with Afro-(Euro)Americans: the computer quite frequently places Hungary or Romania at the top of the list, and England and Ireland below these, sometimes even too low to appear on the list that is automatically sent to people. But we know that most such people have European ancestors from Western Europe, which is “prior knowledge”. This means that in reality, all things considered (and you really need to consider all the info you have), Hungarian is less likely than English unless you suspect Hungarian. There are other such cases, especially relating to Jewish versus Italian or Mediterranean ancestry. I provide written comments in these cases. Note especially that we lack specific comparisons for Native American ancestry from the USA and Canada east of the Rocky Mountains; people with such ancestry, as well as people from the Caribbean and central Mexico (Aztec, etc.), come out as “Maya”.
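One way to picture how prior knowledge changes the ranking is a simple Bayes-style reweighting. This is only an illustration: the written comments here rest on judgment, not on this calculation, and all numbers and the function name `reweight_with_prior` are made up.

```python
def reweight_with_prior(fit_scores, priors):
    """Combine relative fit scores with prior plausibility and renormalize."""
    weighted = {name: fit_scores[name] * priors[name] for name in fit_scores}
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}

# Made-up numbers: the computer's fit slightly prefers Hungary, but prior
# knowledge says Western European ancestry is far more common in this group.
fit_scores = {"Hungary": 0.6, "England": 0.4}
priors = {"Hungary": 0.05, "England": 0.95}

posterior = reweight_with_prior(fit_scores, priors)
for name, value in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

With these invented inputs England ends up far more plausible than Hungary even though Hungary fits slightly better, unless you have a specific reason to suspect Hungarian ancestry and raise its prior accordingly.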

 

The list of populations we use as comparisons is