LESSON 06: atDNA SNPs and Segments

In the last lesson I stated that an atDNA test looks at areas of our DNA where we tend to have differences. I want to underscore this because it is something that confuses people. Reporting the sequence of the whole human genome is costly. Since so much of the human genome is identical from person to person, it is more practical to do “spot checks” at various places where SNPs (mutations) are known to have occurred. Most of the time one tested SNP is separated from the next by a few thousand base pairs, but we often act as if they are adjacent. SNPs are where the variability is highest, both between individuals and between groups. The base pairs in between vary little, or not at all, even between vastly different people.

I am going to introduce an analogy that I will come back to again and again, so listen up!

Think of your DNA as a very long string of blocks winding around the house, out the door and down the street. If you look closely you'll see there are actually two strings, one resting right on top of the other. We travel along the blocks out into the driveway and finally into the street, where we hit the first SNP, a block where we are likely to find variation. If our neighbor across the street were looking at their string of blocks, it would be identical to ours up until that first SNP. Rather than test all these blocks that are the same, we leave them out and just look at the SNPs. Then we travel along to find the next SNP, and so forth.

Each chromosome has a different distribution of SNPs; some are very SNP rich and some not so much. So in our analogy one stretch of a chromosome may need to travel 1,000 blocks (base pairs) until it hits a SNP and another stretch may need to go 2,000 blocks. To compare the relevance of different segments of DNA we need a unit of measurement that reflects how likely a stretch of DNA is to be genealogically significant, not just how many blocks or SNPs it contains. That measurement is called a centimorgan (we use the abbreviation "cM" so as not to confuse it with centimeters, "cm"). Strictly speaking, a cM measures how likely a stretch of DNA is to be broken up by recombination when it is passed from parent to child: roughly a 1% chance per generation of a crossover within a 1 cM stretch. For our purposes, picture a segment measured in cM as a stretch of DNA where we have left out all the blocks that tend to be the same and only tested those where SNPs occur. What's left is a string of SNPs.


Okay, here's where things start to get more confusing. Remember I said the blocks were stacked one on top of the other? If you look at the blocks above, they represent 5 SNPs (all the base pairs in between have been removed). If you look closely, sometimes the red block is on top and sometimes the blue one. The value that is reported at any SNP comes half from your mom and half from your dad: the red blocks are from mom and the blue ones are from dad. However, when they are reported in your raw DNA data (the data output of your DNA test), we do not know which comes from which. Our example from the last lesson:

rs4988235= AG = likely lactose intolerant

We do not know which parent the "A" or the "G" comes from. Likewise, when we get a matching segment of DNA with another person, we do not know which side we match on. When FTDNA and 23andMe report our raw DNA data, they follow a protocol that lists the two values at each SNP in alphabetic order, so we have no idea which value comes from which parent.
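
If you like to see ideas as code, here is a tiny Python sketch of why the reported letters cannot tell us which parent gave what. The function name `report_genotype` is made up for this lesson, and sorting the two letters is just my stand-in for the alphabetic-order protocol, not the actual software either company runs.

```python
def report_genotype(from_mom: str, from_dad: str) -> str:
    """Return the two values at a SNP in alphabetic order.

    Hypothetical helper: once the letters are sorted, nothing in the
    result records which parent supplied which letter.
    """
    return "".join(sorted([from_mom, from_dad]))


# Two opposite inheritance stories end up with the same reported value.
mom_gave_a = report_genotype(from_mom="A", from_dad="G")
dad_gave_a = report_genotype(from_mom="G", from_dad="A")
print(mom_gave_a, dad_gave_a)
```

Both calls give the same two letters, which is exactly the situation we are in when we open our raw data file.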


Here is where "phasing" comes into our block example. Phasing simply separates the red and blue blocks into Mom's side and Dad's side. Ancestry uses a special program that attempts to "phase" or "pseudo phase" the SNPs into sides. Usually this works very well, but occasionally it has the unfortunate effect of making a "true match" look like a non-match by misattributing some of the values to the wrong parent. In a case of misattribution, a series of blocks may be on top (Dad's side) but actually be from Mom. Ancestry currently uses base pairs rather than cMs as its standard of measurement. There is some discussion that cM is still the better measurement. Each way of looking at our segments has advantages and disadvantages.

There are several other ways we can phase DNA. We can use a program that attempts to separate mom's and dad's contributions. Or, if we are lucky enough to have test results for mom, dad and a child, we can compare the values that each parent could have contributed to the child and actually phase what came from where.

Most people do not realize that when we have a matching DNA segment we only match half of our DNA with our match. Going back to our block analogy, not only do we have red and blue blocks, each block could be an "A," "C," "G," or "T." So a match represents a series of blocks where we are half identical: only one of the two values at each SNP needs to match. The only time we are likely to encounter fully identical segments is between siblings who inherited the same segment from each parent for a given portion of a chromosome. Since the parents each have DNA from their own two parents, the DNA they pass to a child in any given segment can come from either of the child's grandparents on that side, as long as one half comes from mom and one half comes from dad.
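
For the mom, dad and child case mentioned above, the reasoning can be sketched in a few lines of Python. This is a minimal illustration and not how any testing company does it: the function `phase_snp` and its two-letter genotype strings are my own invention, and it looks at one SNP at a time with no allowance for testing errors.

```python
def phase_snp(child: str, mom: str, dad: str):
    """Return (value_from_mom, value_from_dad) for one SNP, or None.

    Each argument is an unphased two-letter genotype such as "AG".
    None means the trio cannot settle it, or the data is inconsistent.
    """
    a, b = child[0], child[1]
    if a == b:
        # Child has the same letter twice: each parent must carry it.
        return (a, b) if a in mom and b in dad else None
    # Child has two different letters: try both possible assignments.
    options = []
    if a in mom and b in dad:
        options.append((a, b))
    if b in mom and a in dad:
        options.append((b, a))
    # Only a single workable assignment counts as phased.
    return options[0] if len(options) == 1 else None


print(phase_snp(child="AG", mom="AA", dad="GG"))  # mom can only give A
print(phase_snp(child="AG", mom="AG", dad="AG"))  # both stories fit
```

When all three people are "AG" the trio alone cannot tell us which letter came from which parent, so the sketch gives up on that SNP. Phasing programs generally lean on the neighboring SNPs to fill in gaps like that.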


Here is a screenshot from 23andMe comparing two of my children. The white areas on the chromosome are where they each inherited a different segment from each of their parents. For example: A.M. Wheaton may have gotten my mother's segment from me (maternal grandmother) and J. Wheaton my father's segment (maternal grandfather). Remember those two strings of blocks, each representing mom and dad. And from their dad, A.M. Wheaton may have gotten his father's segment (paternal grandfather) and J. Wheaton his mother's segment (paternal grandmother). So in these white stretches all 4 grandparents are represented, because each child's two sides do not match the other child's two sides! In the blue segments one half of the DNA is the same, so they each may have inherited my mother's segment (maternal grandmother) but received different segments from their dad's parents (one inherited the paternal grandmother's and the other the paternal grandfather's). And finally, the black segments show where they got exactly the same segments from both parents: perhaps my dad's segment from me (maternal grandfather) and their dad's dad's segment from him (paternal grandfather).
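
The three kinds of stretch (white, blue and black) can be mimicked with a short Python sketch. Treat it as a toy under stated assumptions: `shared_alleles` and `classify_stretch` are hypothetical names, the genotypes are made up, and real comparison tools need long runs of SNPs and tolerate the occasional testing error, which this does not.

```python
def shared_alleles(g1: str, g2: str) -> int:
    """Count how many values (0, 1 or 2) two unphased genotypes share."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each letter can be matched only once
            shared += 1
    return shared


def classify_stretch(sib1: list, sib2: list) -> str:
    """Label a run of SNPs by the weakest match found anywhere in it."""
    weakest = min(shared_alleles(a, b) for a, b in zip(sib1, sib2))
    labels = {2: "fully identical", 1: "half identical", 0: "not identical"}
    return labels[weakest]


# Made-up genotypes at three neighboring SNPs for two siblings.
print(classify_stretch(["AG", "CC", "TT"], ["AG", "CC", "TT"]))
print(classify_stretch(["AG", "CC", "TT"], ["AA", "CT", "TT"]))
print(classify_stretch(["AA", "CC", "TT"], ["GG", "CC", "TT"]))
```

Notice that a single SNP where the siblings share nothing (such as "AA" against "GG") is enough to break the match in this sketch, even though the SNPs around it agree by chance.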

Remember, each child receives half of their DNA from each parent, who in turn got half from each of their own parents. All children receive approximately 25% of their DNA from each of their four grandparents. The difference between siblings is that each gets a different scrambled mix of segments. Have a look at the Visual DNA chart from the first lesson and imagine that the patterns of colored bars are randomly different. That's why children of the same set of parents are so different.
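
To get a feel for why it is only "approximately" 25%, here is a rough Python simulation. It rests on a big simplifying assumption of mine: the genome is chopped into equal-sized pieces that are inherited independently, which is not how real recombination works. The function name `grandparent_shares` and its parameters are made up for illustration.

```python
import random


def grandparent_shares(n_segments: int = 100, seed: int = 1) -> dict:
    """Simulate one child; return the fraction of DNA from each grandparent.

    Assumption: for every piece, each parent passes on either their
    mother's copy or their father's copy with equal chance.
    """
    rng = random.Random(seed)  # seeded so a given "child" is repeatable
    counts = {
        "maternal grandmother": 0,
        "maternal grandfather": 0,
        "paternal grandmother": 0,
        "paternal grandfather": 0,
    }
    for _ in range(n_segments):
        # Mom's half of this piece comes from one of her two parents.
        counts[rng.choice(["maternal grandmother", "maternal grandfather"])] += 1
        # Dad's half of this piece comes from one of his two parents.
        counts[rng.choice(["paternal grandmother", "paternal grandfather"])] += 1
    total = 2 * n_segments
    return {name: count / total for name, count in counts.items()}


# Two different seeds stand in for two siblings.
for sibling_seed in (1, 2):
    print(grandparent_shares(seed=sibling_seed))
```

However the pieces fall, the two maternal grandparents always add up to exactly half and so do the two paternal grandparents. It is the split within each pair that wobbles around 25%, and it wobbles differently for each sibling.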

When you have grasped these concepts then you are ready to use your DNA for genealogical purposes and actually understand what you are doing! Yay! Don't be discouraged if you need to read this over a few times and occasionally refer back to it. 


Additional resources:

Tim Janzen's chart of estimated relationship based on shared cM

How many genomic blocks do you share with a cousin? by the Coop Lab, Population and Evolutionary Genetics, UC Davis

How Do DNA Segments get Smaller by Blaine Bettinger

Content copyright 2013. All rights reserved.


