LESSON 09: atDNA Matches

What are atDNA matches anyway? When we have matching DNA, what does that really mean? Simply put, matches are individuals who share segments (stretches) of DNA with us along a given chromosome. In these segments we have a half match, otherwise referred to as a Half Identical Region (HIR, pronounced "her"): for every SNP in the segment, at least one of our two alleles matches one of our match's two alleles. An allele is simply the "A," "T," "C," or "G" found at that position. (Review Lesson 5 if you are confused.)

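Here is a very short part of a segment on Chromosome 16 showing the pair of alleles for each SNP:

[Figure not reproduced: allele pairs at each SNP for John, Mary, and Pete, with one SNP highlighted.]
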
In this example John and Mary share a HIR, and John and Pete share a HIR, but Mary and Pete do not, because at the highlighted SNP they are not even half identical. (In reality a shared segment would include hundreds if not thousands of SNPs.) In this simplified example John and Mary share a common ancestor, and John and Pete share a common ancestor, but it won't be the same common ancestor because Mary and Pete do not match each other. We have essentially phased the data (determined the two sides: mom and dad). John and Mary in our example are cousins with a shared ancestor on John's maternal side, while John and Pete share a common ancestor on John's paternal side. Keeping track of who matches whom on which segments is a process called chromosome mapping. (See resources in Lesson 11 for more information on chromosome mapping.)
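The half-identical test described above can be sketched in a few lines of Python. This is only an illustration: the genotypes below are hypothetical stand-ins (not the values from the original figure), and the function names are made up for this example. Real matching software works on hundreds or thousands of SNPs per segment.

```python
def half_identical(genotype_a, genotype_b):
    """True if two genotypes share at least one allele at a single SNP."""
    return bool(set(genotype_a) & set(genotype_b))


def is_hir(person_a, person_b):
    """True if every SNP in the stretch is at least half identical."""
    return all(half_identical(a, b) for a, b in zip(person_a, person_b))


# Hypothetical allele pairs at four consecutive SNPs (illustration only).
john = ["AG", "CT", "AA", "CT"]
mary = ["AA", "CC", "AG", "CC"]
pete = ["GG", "TT", "AA", "TT"]

print("John vs Mary:", is_hir(john, mary))
print("John vs Pete:", is_hir(john, pete))
print("Mary vs Pete:", is_hir(mary, pete))
```

With these made-up values, Mary and Pete fail at the very first SNP (AA versus GG share no allele), which is the same situation as the highlighted SNP in the example.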

A segment is considered "Identical By Descent" or "IBD" when its length, in either cM or base pairs, is long enough to indicate that two individuals are "likely" to be descended from a common ancestor from whom they each inherited the same segment. When comparing two individuals who share a common grandparent (first cousins), the matching segments will represent only one side of the chromosome pair (one allele in each pair). The shared grandparent (let's say maternal) accounts for the shared side; the other half of the pair is inherited from one of the paternal grandparents and is not shared.

Each of the three companies (Ancestry, FTDNA and 23andMe) has different criteria for determining matches. This is why you may have a match at one company and not have a match with the same person at another. Each company tries to include as many true matches as possible while excluding as many false matches as possible. The thresholds are based on statistical analysis and something of a cost-benefit trade-off (the cost of including false matches versus the risk of losing real matches).

The following are the match criteria by company:

FTDNA states in their FAQ:

"The Family Finder program declares a DNA segment to be Identical by Descent (IBD) if it contains at least 500 matching SNPs (Single Nucleotide Polymorphism) in series. For the program to consider two people a potential match, the largest matching DNA segment between two people must be at least 5.5 centiMorgans (cM) long. The program then uses additional matching segments to confirm the relationship and to calculate the degree of relatedness. Based on the extensive Family Finder database, it is rare for two genuine genealogical cousins to have a largest shared segment of less than 7 cM and one less than 6 cM is exceptional."

ANCESTRY, as reported by CeCe Moore:

"The minimum threshold for matching is 5 megabase pairs. There is no minimum SNP requirement." These results are pseudo-phased, which allows a smaller segment size to be relevant. (Ancestry does not include matches on the X.)

23andMe, as reported in the ISOGG Wiki, requires the following thresholds:

  • Autosomal: 700 SNPs, 5 cM

  • X (male vs male): 200 SNPs, 1 cM

  • X (male vs female): 600 SNPs, 6 cM

  • X (female vs female): 1200 SNPs, 6 cM
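To see why the same pair of people can match at one company and not at another, here is a minimal sketch that applies only the autosomal minimums quoted above to a single segment. The `Segment` class, the `passes` function, and the sample segment are hypothetical. The companies' actual matching algorithms are not public and involve more than these cutoffs (for example, FTDNA's 5.5 cM figure applies to the largest shared segment, and Ancestry pseudo-phases its data).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    snps: int    # number of matching SNPs in series
    cm: float    # genetic length in centiMorgans
    mbp: float   # physical length in megabase pairs


# Minimums as quoted in this lesson (autosomal only).
THRESHOLDS = {
    "FTDNA": {"snps": 500, "cm": 5.5},
    "Ancestry": {"mbp": 5.0},           # no minimum SNP requirement
    "23andMe": {"snps": 700, "cm": 5.0},
}


def passes(company, segment):
    """True if the segment meets every minimum listed for the company."""
    limits = THRESHOLDS[company]
    return all(getattr(segment, field) >= minimum
               for field, minimum in limits.items())


# A made-up borderline segment.
segment = Segment(snps=750, cm=5.2, mbp=4.8)
results = {company: passes(company, segment) for company in THRESHOLDS}
print(results)
```

This invented segment clears the quoted 23andMe minimums but falls short of the FTDNA cM minimum and the Ancestry megabase minimum.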

In spite of these criteria, matches at the lowest levels are still subject to an estimated 40-60% false match rate. These false matches are sometimes referred to as pseudo matches or "Identical by State" (IBS). They are segments that are NOT from a recent common ancestor but just happen to be a HIR by random chance, or because they represent a very old shared geographical ancestry. The smaller the segment, the more likely this is to be true. The corollary is still worth noting: 40-60% of these small matches will be IBD ("Identical by Descent") and thereby TRUE matches. If you already have a known match with someone, it is often helpful to look at the small segments you share (this can be done down to 1 cM at GEDmatch and FTDNA). These small segments may give you clues as to how you might connect with other matches or, if they represent minority admixture, may be clues to ancient ancestry.

In reality, the vast majority of genetic genealogy is based on statistics, and statistics is a major source of misunderstanding and frustration for beginning genetic genealogists. When you are dealing with thousands of ancestors or thousands of matches, some of them are going to fall into the tail ends of the familiar "bell curve." So even though the vast majority of, say, brother-to-sister matches fall near the average of about 50% shared DNA, there will be some that share 40% and some that share nearly 60%: unusual but not impossible. So whenever you look at DNA, remember my "unusual but not impossible" maxim. Sometimes a match is a match and sometimes it isn't. Statistical predictions are the roadways on which most of us travel, but every now and then people go off-road and take a detour. In spite of the science, that's as precise as we get. The more transactions (DNA exchanges) or generations we look at, the more extreme the ranges become.

Paddy Waldron states "On average and in theory, an unbiased coin toss produces a head 50% of the time and a tail 50% of the time. In a single trial just now, my coin produced a tail (i.e. 100% of my sample of size 1 resulted in tails). Similarly, on average and in theory, a pair of first cousins share 12.5% of their DNA. In your trial (also a sample of size 1), an observation of exactly 12.50000% is about as likely as the coin landing balanced on its edge. The single observation will be somewhere close to the average. A lot of those commenting above seem not to be distinguishing between the average outcome of many random experiments and the actual outcome of one single random experiment. Inheritance of autosomal DNA is just a random experiment, as recombination occurs randomly along the 22 autosomal chromosomes. A large sample (say two families of 10 children all testing, giving 10x10=100 first cousin matches) will produce a sample average much closer to the theoretical average than the samples of size 1 which are typical in genetic genealogy."
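The coin-toss point in this quote is easy to try for yourself. The following is a rough sketch using a simulated fair coin, not a model of DNA inheritance; the seed, the number of repeats, and the function name are arbitrary choices made for this illustration. It compares how widely the result varies for samples of size 1 versus samples of size 100.

```python
import random
import statistics


def sample_average(n_tosses, rng):
    """Fraction of heads in n_tosses tosses of a fair coin."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses)) / n_tosses


rng = random.Random(2013)  # fixed seed so the run is repeatable

# Repeat each experiment 1000 times and measure the spread of the results.
singles = [sample_average(1, rng) for _ in range(1000)]
hundreds = [sample_average(100, rng) for _ in range(1000)]

spread_single = statistics.pstdev(singles)
spread_hundred = statistics.pstdev(hundreds)

print(f"spread with sample size 1:   {spread_single:.3f}")
print(f"spread with sample size 100: {spread_hundred:.3f}")
```

Every single-toss "average" is either 0% or 100% heads, while the averages of 100 tosses cluster much more tightly around 50%. A single cousin comparison is like the single toss.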

In that vein, note:

Steve Mount states "The probability that fourth cousins share at least one IBD [identical by descent] segment is 77%, and the expected length of this segment is 10 cM. Now consider the next step. There is a 50% chance that that one shared segment will not be transmitted at all, but a 90% chance that if it is transmitted it will be just as big as it was (the same 10 cM.)"
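Taking the figures in this quote at face value, the three possible outcomes for that one segment in the next generation can be worked out with simple multiplication. This is a small sketch of that arithmetic only; the variable names are invented for the example.

```python
# Figures as quoted above.
P_TRANSMITTED = 0.5            # chance the segment is passed on at all
P_INTACT_IF_TRANSMITTED = 0.9  # chance it is full length, given it is passed on

p_lost = 1 - P_TRANSMITTED
p_intact = P_TRANSMITTED * P_INTACT_IF_TRANSMITTED
p_shortened = P_TRANSMITTED * (1 - P_INTACT_IF_TRANSMITTED)

print(f"not transmitted:          {p_lost:.0%}")
print(f"transmitted, same length: {p_intact:.0%}")
print(f"transmitted, but shorter: {p_shortened:.0%}")
```

In other words, on these numbers the child has about a 45% chance of carrying the same full-length segment, a 5% chance of carrying a shortened piece of it, and a 50% chance of carrying none of it.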

Here's a look at each company's match reporting.

ANCESTRY does not show us the segments or the number of matching base pairs; it simply gives us a list of matches with the predicted relationship, a range of possible relationships, and a confidence level. The user name is listed, and contact can be made via Ancestry's message system.

Family Tree DNA (FTDNA) gives a nice summary including relationship range, suggested relationship, shared cM and longest block (the length of the longest segment). Matches are identified by name, and a contact email is given if available. Links are given to online trees and lists of surnames (not shown).

23andMe shows the relationship, percentage shared and number of segments, as well as haplogroups and other information shared by your matches, in an easy-to-read format. They pack lots of info into a small space. Most of the matches will be private unless they accept a share, although a small portion are public.

As I have said before, each company has its advantages and disadvantages. In many circumstances a combination of all three is the best choice for serious-minded genetic genealogists who want to take full advantage of what each has to offer. Ancestry's simplicity, pseudo-phasing and attached trees probably give it the best chance of making genealogical connections. Power users may insist on "seeing" the matching segments and being able to do chromosome mapping. Some will like the value at 23andMe and its discount pricing for multiple kits ordered at the same time. Still others will like the chromosome matching tool at FTDNA that lets you see matching segments down to 1 cM, or being able to combine atDNA with YDNA or mtDNA in one place.

HOW MUCH DNA from your ancestors

Adapted from Tim Janzen (average estimates; the actual amounts vary broadly)

parent contributes ca. 3546 cM

grandparent contributes ca. 1773 cM

great-grandparent contributes ca. 886 cM

2nd great-grandparent contributes ca. 443 cM

3rd great-grandparent contributes ca. 222 cM

4th great-grandparent contributes ca. 110 cM

5th great-grandparent contributes ca. 56 cM

6th great-grandparent contributes ca. 28 cM

7th great-grandparent contributes ca. 14 cM

8th great-grandparent contributes ca. 7 cM
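The pattern in this list is simply the parent's figure halved once for each additional generation. A short sketch makes that explicit; the function name is made up for this example, and the results are averages only, which may differ from the rounded figures above by about 1 cM.

```python
PARENT_CM = 3546  # average contribution of one parent, from the list above


def expected_cm(generations_back):
    """Average cM from one ancestor; 1 = parent, 2 = grandparent, and so on."""
    return PARENT_CM / 2 ** (generations_back - 1)


for generations_back in range(1, 11):
    print(f"{generations_back:2d} generations back: "
          f"about {expected_cm(generations_back):.0f} cM")
```

Remember that these are averages. Because of random recombination, the amount actually inherited from any one distant ancestor can be well above or below the figure, and can even be zero.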

Additional Resources

atDNA percentages from Gliesian

Relationship Calculator by Robert James Liguori

The Limits of Predicting Relationships with DNA by Leah LaPerle Larkin

Julie's Story (more on Relationship Prediction) by Leah LaPerle Larkin

FTDNA FAQ. Great charts and resources for your "Cheat Sheet File"

Autosomal DNA Testing 101: What Now by Roberta Estes

DNA Portraits: Second Cousins by Jim Owston. Nice discussion of actual stats on cousinship

Investigating Small Segments of DNA with HIR search by CeCe Moore

Why Are My Predicted Cousin Relationships Wrong? by Roberta Estes

Content copyright 2013. All rights reserved.

LESSON 10: atDNA More with Matches