A Practical Guide to Using Autosomal DNA for Genealogy



By Rebecca Bryant

27 July 2015


Overview

Can you use autosomal DNA (abbreviated as atDNA) to break through brick walls? Absolutely. You can also validate or disprove questionable parts of your pedigree. In fact, if you use atDNA correctly -- and few people do -- you can reach an equivalent of the genealogical proof standard.

Introduction

There's plenty of information online about atDNA but little of it was contributed by people with experience that is both significant (thousands of hours) and relevant (working IBDs). This guide, in contrast, is founded on exactly that kind of experience.

First, I'll cover some of the basics of genetic genealogy. Then, I'll explain IBDs and teach you how to solve them. Three points before proceeding:

1. This article does not cover all the intricacies of using autosomal or atDNA. Rather, it provides a blueprint for building skills.

2. Each and every point made below is important, so you'll need to review this article multiple times, over the next several months, while working IBDs. Sorry but there is no alternative path to competence. No amount of reading or attending conferences will compensate for the hard work described below.

3. I revise the article often, so link to it or bookmark it rather than copying it.

Background information: What you need to understand before starting

1.      atDNA is different from yDNA and mtDNA. Understanding the difference is essential.[1]

2.     FamilyTreeDNA.com (FTDNA) offers all 3 kinds of DNA testing; their atDNA brand is called "Family Finder." 23andMe also does all 3 kinds, but only a small percentage of people use it for genealogical purposes. Ancestry does only atDNA testing. 

3.      Test on FTDNA. (Disclosure: I have NO relationship whatsoever with the company.) Some people promote Ancestry.com. I don't know if they are novices, recycling press releases, or receiving compensation. But Ancestry does not provide users with the basic data and tools to accurately use atDNA for genealogical purposes. Only the hints provided by DNA Circles point toward possible lineages (not people) that might be in your pedigree. A match outside of a DNA Circle is 100% meaningless. Ancestry tries to make much of the fact that a lot of people have tested there. Unless you are an adoptee, that doesn't matter.[2]

4.      If you've already tested at Ancestry, you can compensate somewhat for its deficiencies by uploading your information to gedmatch.com. Unfortunately, the tools on gedmatch (some cost $10/month to access) are not a good substitute for those on FTDNA or dnagedcom.com. So if you're serious about using atDNA, the best solution is to transfer your results to FTDNA. (As of 2/17, FTDNA is accepting transfers from Ancestry again; for full access to data and tools, you must pay the transfer fee.)

5.      Should you test at all?  Only those with detail-oriented, analytical minds who are willing to dedicate many hundreds or thousands of hours are likely to have much success. HOWEVER, even if you don't have the skills, time, or desire to do atDNA properly, some of your experienced matches may break through brick walls for you. I've done this for quite a few matches, but no one can help unless you provide a pedigree that's as complete, accurate, and accessible as possible. So when considering whether to test, ask yourself if you are willing to collaborate and share genealogical information. If you aren't, there's no point in testing.

6.      Who should you test?  Always test the oldest generation available in any genealogical line. If your mother is alive but your father and all grandparents are deceased, then start by testing your mother and work her pedigree first.[3]  Later, you can test yourself and use that to work your paternal pedigree.

7.      atDNA is inherited in a random, haphazard fashion. Each company has an algorithm for estimating how closely you might be related to a match. Unless you are a very close match to someone, those estimates are usually way off.  Pay no attention to them.

8.      Some people state that atDNA can only be used for genealogical purposes 4-5 generations back. That's wildly incorrect. It's possible to prove some 8th, 10th, even 12th cousins. (Caveat: The more distant the cousin, the farther back in time you're working, so the limitation is finding/generating enough accurate, complete pedigrees.)

9.      You MUST understand that every person inherits 22 pairs of autosomal chromosomes, each pair made up of one strand from the mother and one strand from the father. There are segments on each strand that are defined by numeric markers. For example, my father and his maternal first cousin share many segments, including Chromosome 3 from 72447102 to 98877826 - point A to point B.  (It's a useful convention to round each position to the nearest million, i.e. 72-99.)  I have mapped the above segment back to Abington Felps b. bef 1707 and Rachel McElroy b. 1711 or someone upstream of them. However, the segment immediately before that on the same strand traces back to an ancestor in a completely different part of my father's maternal pedigree. And the segment after it traces back to yet another unrelated branch in my father's maternal pedigree. In contrast, a segment with similar numeric markers -- but on the opposite strand -- will trace to a set of ancestors in my father's paternal pedigree.

To reiterate: every strand of every chromosome is a patchwork of segments inherited from ancestors on your maternal or paternal side.  (The inheritance pattern of the 23rd or X chromosome is different.  Explore it only after you have a fair amount of working experience.)

10.   Some segments are long enough to be usable for genealogical purposes. FTDNA thinks a segment should be at least 7.69 cM. This is not a rigid rule, but until you have considerable experience working IBDs, don't bother with smaller segments as many/most of them will be false matches.
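
As a rough illustration of points 9 and 10, the sketch below represents a shared segment, rounds its positions to the nearest million (which reproduces the 72-99 shorthand), and drops segments under the 7.69 cM threshold. The Segment class, its field names, the cM values, and the second sample segment are assumptions made for illustration; this is not FTDNA's export format.

```python
from dataclasses import dataclass

MIN_CM = 7.69  # threshold cited in point 10; not a rigid rule


@dataclass(frozen=True)
class Segment:
    match_name: str   # hypothetical label for the person you match
    chromosome: int   # 1-22 (the X chromosome is left out on purpose)
    start: int        # start position, as shown in a chromosome browser
    end: int          # end position
    cm: float         # segment length in centimorgans

    def shorthand(self) -> str:
        """Round both ends to the nearest million, e.g. 72447102 -> 72."""
        return f"{round(self.start / 1_000_000)}-{round(self.end / 1_000_000)}"


def usable(segments, min_cm=MIN_CM):
    """Keep only segments at or above the chosen cM threshold."""
    return [s for s in segments if s.cm >= min_cm]


# Positions of the first segment come from the Chromosome 3 example above;
# the cM values and the whole second segment are invented for illustration.
segments = [
    Segment("father's first cousin", 3, 72447102, 98877826, 20.0),
    Segment("distant match", 5, 10000000, 14000000, 3.2),
]

for s in usable(segments):
    print(s.chromosome, s.shorthand(), s.cm)
```

Only the Chromosome 3 segment survives the filter. Lowering min_cm is possible, but as point 10 warns, many of the extra segments will be false matches.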

Important Sidebar

The next section will explain how to use atDNA properly. But first, I want to explain how not to do it. Beginners start by looking at the surnames of matches. You find someone (Mike Jones) whose pedigree includes a surname that also appears in yours (Hopkins). Then contact may ensue, leading to the determination that you and Mike share a set of common ancestors: Joseph Hopkins of Stafford Co, Va. m. unknown. Next, you check ICWs (in common withs), erroneously thinking that everyone who is ICW with Mike shares atDNA from Joseph Hopkins or his unknown wife.

The beginner's process is riddled with errors. Unfortunately, most people are using atDNA in this fashion and coming to the wrong conclusions. The next section will describe the correct way to use autosomal DNA -- working IBDs.

Before proceeding let me emphasize something: ICWs are a pool of people, many/most of whom do NOT share the same common ancestor. Only the subset that are IBD (see #1 below) might share atDNA from Joseph Hopkins or his wife. The rest are likely to match you in random parts of your pedigree and match Mike Jones in random parts of his pedigree that do NOT overlap with your pedigree. 

Practical application: Working IBDs

THIS IS THE MOST IMPORTANT RULE IN USING atDNA FOR GENEALOGICAL PURPOSES: Anyone who shares more or less the same (significant) segment on the same strand of the same chromosome has the same common ancestor (CA). This rule is called IBD or identity by descent. An IBD may involve two people or dozens of people. Each IBD is a puzzle to be solved. Your understanding of the rules of atDNA may be insufficient. Your ability to follow the rules may fail. Your analysis of an IBD may be faulty. Your genealogical research may be insufficient. However, the IBD itself is never wrong. The correct solution to each and every IBD is a set of CAs somewhere in your pedigree -- and in the pedigree of everyone else that shares the same segment/IBD.

Implementing this rule involves several steps: First, identify an IBD you want to solve. Second, determine all the people on FTDNA who share that IBD. Third, collect as many of their pedigrees as you can and work them as far back as you can. Last, analyze those pedigrees for overlap. (It's best to do the last step as an iterative process.) Some people will do just about anything to avoid developing/analyzing pedigrees of matches. Those people are not working IBDs. You should be spending 75-90% of your time on this task.

IMPORTANT TIPS:

1.      If you aren't already familiar with the tools and features on FTDNA, you must gain familiarity by hands-on exploration or by reading the Matches and Chromosome Browser tutorials at the bottom of this page:  https://www.familytreedna.com/learn/family-finder-pages/

Set your filter to "show all matches."

2.      Let's say you match Jack Unsel and you too have the surname "Unsel" in your pedigree, so you want to work the single IBD you share with him. The chromosome browser tool on FTDNA will provide basic information, for example that you match Jack Unsel on Chromosome 8 from 1 to 7.5 for about 13 cM. But how do you figure out who else matches on the same segment and the same strand of Chromosome 8?

This can certainly be done with the tools on FTDNA. First, click the ICW button for Jack Unsel. You may get 1-5 pages of people. Then put all of those people through the Chromosome Browser tool in batches of 5, looking for people that share pretty much the same segment with Unsel. (This converts a pool of ICWs into the subset that are IBD.) BUT RATHER than using the FTDNA tools, it will save an enormous amount of time -- and produce much more accurate results -- if you upload your kit to dnagedcom.com and run a tool called the "Autosomal DNA Segment Analyzer" or ADSA (not Jworks or Kworks). This tool will give you a visual image of all IBDs -- or those on a single chromosome.[4] Unfortunately, in early 2017, FTDNA handicapped this tool by not allowing dnagedcom.com to upload ICW info. I don't know if it's temporary or permanent. Please complain directly to FTDNA.

Next, you'll have to learn how to read your ADSA results. Some IBDs are easy to read, but many involve extremely complex analysis, and this is one of the places where many hundreds of hours of direct experience will eventually pay off. The longer you do this, the more accurate you'll become in identifying IBDs and correcting for software inadequacies that make ADSA imperfect but still the best option around.

Note: Some people are unable to wrap their mind around the fact that each chromosome is actually a pair of strands: one strand from the mother and the other from the father.  When you look at an Excel download of matches provided by FTDNA, the maternal and paternal segments are lumped together based on the starting point of the segment. A real IBD never combines segments/matches from maternal and paternal strands. Only ADSA attempts to separate the strands.
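
The manual ICW-to-IBD step in tip 2 boils down to an overlap check. The sketch below is a simplified stand-in for that step, not a description of how ADSA or FTDNA works: it takes the segments you share with each ICW match and keeps the ones that cover most of a target segment on the same chromosome. All names, positions, and the 0.8 cutoff are invented for illustration. Overlap alone cannot tell the maternal strand from the paternal one, so treat the output as candidates to confirm, for instance by checking that the candidates also match each other.

```python
def overlap_fraction(target, other):
    """Fraction of the target range (start, end) covered by the other range."""
    t_start, t_end = target
    o_start, o_end = other
    shared = min(t_end, o_end) - max(t_start, o_start)
    return max(shared, 0) / (t_end - t_start)


def ibd_candidates(target_chr, target_range, icw_segments, min_fraction=0.8):
    """Return names of ICW matches whose segment covers most of the target.

    icw_segments: list of (name, chromosome, start, end) tuples.
    min_fraction: assumed cutoff for "pretty much the same segment".
    """
    return [
        name
        for name, chrom, start, end in icw_segments
        if chrom == target_chr
        and overlap_fraction(target_range, (start, end)) >= min_fraction
    ]


# All names and positions below are invented for illustration.
icw = [
    ("Match A", 8, 1_000_000, 7_400_000),    # nearly the same segment
    ("Match B", 8, 6_900_000, 15_000_000),   # only brushes the end
    ("Match C", 2, 1_000_000, 7_500_000),    # same numbers, wrong chromosome
]

print(ibd_candidates(8, (1_000_000, 7_500_000), icw))
```

Here only Match A is kept. Match B shares a sliver at the end of the target, and Match C has identical positions on a different chromosome, which is exactly the kind of coincidence a spreadsheet sorted by start position can hide.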

3.      DON'T MAKE ASSUMPTIONS. Very often, as a beginner, you'll think you see something obvious and make an assumption. The vast majority of these assumptions will be proven wrong when you properly work an IBD, so it's best not to make any assumptions and let the IBD reveal its answer.

Example: Don't assume that simply because you have matches with 5, 10, or 20 people who descend from the same set of CAs on paper, e.g. Cicely Reynolds and William Farrar, that you also have them in your pedigree. Many early English and American Colonial couples have tens of thousands (if not hundreds of thousands) of descendants. Reynolds-Farrar may inhabit one of the voids in your pedigree, if and only if you have an IBD in which 3 or more family groups trace back to them.

Example: If you share 3 significant IBDs with a match, don't assume they all come from the same set of CAs. They may come from 1-3 sets.

Example: If multiple people sharing an IBD have the same surname, don't assume they share the same lineage. Verify the genealogy and look for yDNA confirmation.

Example: Don't assume that any 2 people (including first cousins) have only one set of CAs. The people sharing an IBD usually have multiple CAs. The only way to tell is by looking at complete pedigrees. 

Example: Don't assume that the first set of CAs you find for an IBD are the solution. Most IBDs will reveal multiple sets of shared CAs. Only one of these is the solution.

Example: If A matches B and the maternal first cousin of A does not match B, then don't assume A matches B on her paternal side. First cousins have less than 12.5% shared atDNA of significant length from one set of common grandparents. (My father and his first cousin have less than 11%.)

In short, never push your desires or assumptions into the data. Instead, gather the correct data and approach it curiously. Over time, the mystery will unfold, and you will often be amazed by what it reveals.  

4.      After you know the identity of people who share a segment/IBD/same ancestor, the next step is to collect their pedigrees. Some people on FTDNA post gedcoms but most don't. Usually, the people I contact are responsive, but it does matter how I approach them.[5] Instead of asking a match about a particular surname, try to get her entire pedigree. Many will have trees on Ancestry. Others will have it in a different format. At the very least you want the full names of all 4 grandparents (maiden names of grandmothers). Most of the time, you can use this information to quickly find various trees on Ancestry, that will -- in aggregate -- approximate your match's pedigree. 

Note: Keep those records. Each match on FTDNA has an icon under her name that you can click and add notes. Keep your notes for each match current with date of contact, genealogical information for that person, links to their tree on Ancestry or elsewhere, information on their IBDs, and your progress toward solving them.

5.      Beginners should not try to work IBDs that involve a lot of people. The solution will probably be a CA born in the 1500s or early 1600s and, thus, very difficult to solve. IBDs of only 2 people can't be solved at all because you need at least 3 family groups -- and often more -- to solve an IBD. Look for IBDs with longer segments, rather than shorter segments, as there's a better chance (but not a guarantee) of a more recent CA.

6.      How do you know when an IBD is solved? The truth is that you can never solve one completely.  All the atDNA in your chromosomes is very old; it didn't spontaneously generate in recent generations. Some of it has been so chopped up through combination and recombination, that you'll never know where it came from, but a surprisingly large amount is handed down in roughly intact segments generation after generation after generation. Each of these segments has a pattern or code that allows testing companies to match you with distant and more recent cousins. The hard part is finding overlaps in the relevant pedigrees. Since the interim solution to most IBDs will be someone born in the early 1700s, 1600s, or 1500s, you need fairly complete pedigrees to figure out the CA. Unfortunately, most people don't have very complete pedigrees. If you work with pedigrees that are only 10% complete, then you're likely to have a 90% error rate. So if it's important to solve a particular IBD, you'll have to flesh out the pedigrees of matches yourself. This doesn't mean creating trees from scratch but looking at existing trees and keeping a record of those that -- in aggregate -- show a match's pedigree back to the 1600s, if possible. 

Aside: If you are serious about using atDNA for genealogical purposes, then, yes, you'll need an account on Ancestry, so go ahead and post a tree there. Keep in mind that most of the trees on Ancestry are done by amateurs and are riddled with errors, including a lot of people who are now erroneously claiming to have proven this or that lineage through atDNA (impossible to do using Ancestry alone and few people are sufficiently skilled to do this even if they tested on FTDNA). If you don't have skills in traditional genealogy, you'll have to build those as well. This is a lifetime hobby, and like most hobbies, it takes time to build skills, and there are expenses along the way. 
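
To put a number on how complete a pedigree is (the concern raised in tip 6), it helps to remember that generation n has 2^n ancestor slots. The sketch below computes the percentage of slots filled per generation. The counts are invented, the function name is hypothetical, and the calculation ignores the case where the same ancestor fills more than one slot.

```python
def completeness(known_by_generation):
    """Percent of ancestor slots filled in each generation.

    known_by_generation: dict mapping generation number (1 = parents)
    to the count of ancestors identified in that generation.
    """
    return {
        gen: 100 * known / 2 ** gen
        for gen, known in sorted(known_by_generation.items())
    }


# Invented counts for one match's pedigree.
known = {1: 2, 2: 4, 3: 8, 4: 12, 5: 16, 6: 16}
for gen, pct in completeness(known).items():
    print(f"generation {gen}: {2 ** gen} slots, {pct:.0f}% known")
```

In this made-up pedigree the sixth generation is only a quarter filled in, so a CA living at that depth would more likely than not be missing from the tree.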

7.      You'll find a lot of references online about "triangulation." That's the idea that, if you can find 3 people on an IBD with the same set of CAs, then you have solved the IBD. HOWEVER, this is a simplification that very often leads to errors. First, many IBDs include people who are closely related to each other. These people must be identified and treated as a single family group. If you have 3 different family groups that have the same set of CAs, then you have a possible solution to the IBD. Almost all IBDs will involve multiple CAs. For example, you may find 4 family groups that have one set of CAs, 3 family groups that share another set of CAs, 2 family groups with yet another proven CA, etc. This is extremely common, so don't stop when you find the first group of 2 or 3. You need to find all CAs. And you'll have to keep working until you have a clear solution to the IBD. Since multiple CAs are commonplace, you may have to wait until you find 4, 5, or more family groups with the same set of CAs. This is especially true of larger IBDs, involving many people.

Note: The biggest blunder for people who have mastered some of the basics is failure to fully develop the pedigrees of their matches. I'm currently working a small IBD and have already found 3 sets of proven CAs with a couple more possible CAs. 
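
The family-group rule in tip 7 can be sketched as a counting exercise: tally distinct family groups, not individual people, behind each candidate set of CAs. The dictionary keys, group labels, and couples below are all hypothetical, and a result from this tally is only a possible solution that still needs the genealogical verification described above.

```python
from collections import defaultdict


def family_groups_per_ca(matches):
    """Map each candidate CA couple to the set of family groups that have it.

    matches: list of dicts with hypothetical keys
      "name", "family_group", and "cas" (CA couples found in that pedigree).
    """
    groups = defaultdict(set)
    for m in matches:
        for ca in m["cas"]:
            groups[ca].add(m["family_group"])
    return groups


def possible_solutions(matches, min_groups=3):
    """CA couples shared by at least min_groups distinct family groups."""
    counts = family_groups_per_ca(matches)
    return sorted(ca for ca, fg in counts.items() if len(fg) >= min_groups)


# Invented example: P1 and P2 are siblings, so they form ONE family group.
matches = [
    {"name": "P1", "family_group": "G1", "cas": {"Couple X", "Couple Y"}},
    {"name": "P2", "family_group": "G1", "cas": {"Couple X", "Couple Y"}},
    {"name": "P3", "family_group": "G2", "cas": {"Couple X"}},
    {"name": "P4", "family_group": "G3", "cas": {"Couple X", "Couple Y"}},
]

print(possible_solutions(matches))
```

Couple Y appears in three people's pedigrees but only two family groups, so it does not qualify; Couple X is backed by three groups and does. Counting people instead of groups would have wrongly promoted Couple Y.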

8.      Important: Sharing a set of proven CAs on paper with Person A doesn't mean you inherited DNA from one of those CAs. Only by working a shared IBD can you be sure which CA bequeathed the relevant segment of DNA.

9.   When you have a solution to an IBD, it will look like this: Nathaniel Tilden b. ca 1583 m. Lydia Hucksteppe b. ca 1587 -- or someone upstream of them. Clearly, the atDNA segment didn't generate spontaneously at the birth of either Tilden or Hucksteppe, and you may later get a match -- on the same IBD -- with someone who has a John Hucksteppe b. 1546 m. Unknown in her pedigree. All solutions are temporary.

10.   A child can carry only 50% of a parent's genome, so generation after generation, the genetic material of some ancestors is lost. Most of  the time, you will have multiple IBDs that point to the same set of CAs. The more IBDs you work, the more clarity you will have about certain, limited parts of your pedigree. (These tend to be the areas where you have a doubling up of the same sets of CAs.) However, other parts of your pedigree were shortchanged by the DNA lottery game. To work them, you need to test additional family members.

11.   If you and your father have both tested, always use your father's kit/results to work paternal matches. If you and your mother have both tested, always use your mother's kit to work maternal matches. 

12.   Keep records that map segments of your chromosome back to the CAs you've proven.

13.   Every significant segment of atDNA will have originated in some remote crook of a remote branch of your pedigree. A rough estimate is that there are at least 125 relevant surnames for any given IBD. Posting a few surnames on your surname list at FTDNA won't help you or anyone else. Develop all branches of your pedigree as far back as possible and list all the surnames in your FTDNA profile. This will save you a lot of time over the long haul. When you flesh out new surnames, add them to your list.

14.   Think of your atDNA information as a legacy, just like a tree posted online.  If you provide a gedcom (or link to a tree) and a complete surname list, your relatives and other researchers can pick up where you left off.  (Do this for all kits you manage.)
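
For the record keeping in tip 12, one simple option is a list of mapped segments that can be queried by chromosome and position. The layout below is an assumption, not a standard format; the single entry reuses the Chromosome 3 example from the background section. Because the same positions on the opposite strand trace to different ancestors, each record carries a note on which side of the pedigree it belongs to.

```python
# Each record: (chromosome, start, end, side of pedigree, common ancestors)
segment_map = [
    (3, 72447102, 98877826, "father's maternal side",
     "Abington Felps b. bef 1707 & Rachel McElroy b. 1711, or upstream"),
]


def lookup(chromosome, position, records=segment_map):
    """Return (side, CA notes) for every mapped segment covering a position."""
    return [
        (side, cas)
        for chrom, start, end, side, cas in records
        if chrom == chromosome and start <= position <= end
    ]


print(lookup(3, 80_000_000))   # falls inside the mapped segment
print(lookup(3, 10_000_000))   # nothing mapped here yet, so an empty list
```

A spreadsheet with the same five columns works just as well; the point is that every solved IBD gets written down with its chromosome, range, side, and CAs.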

Summary 

Take your time and incrementally build skills. Remember: this is a long-term endeavor.  If you can't solve a particular IBD, put it aside and wait for more matches or try to find additional matches on gedmatch.com. If things don't add up, after considerable effort, the most likely explanation is that you've combined matches from different strands, or you have an error in your tree. Sometimes, you'll solve an IBD without knowing exactly where the CAs fit into your pedigree. The more IBDs you work, the clearer that will become.

___________________________________________________

[1] Only males can do the yDNA test and it tells the lineage only of the father's father's father's father, etc.  This is one bookend in your pedigree but only a tiny fraction of your family tree.  Both men and women can do mtDNA. It tells the lineage of your mother's mother's mother etc.  This is the other bookend in your pedigree but only a tiny fraction of your family tree.  Both males and females can test for atDNA, a complex of myriad segments, tracing back to ancestors in ALL branches of your pedigree -- the bookends and everything in between.

[2]  If you are an adoptee or missing a close family member, you probably need to use both FTDNA and Ancestry. 

[3] In most cases, it's not necessary to test multiple siblings. Begin with one test of a parent or grandparent and add kits only as necessary and as your experience increases. A first cousin of the oldest generation tested is a good 2nd step. If possible, try to pick a cousin that has only one set of CAs with your first kit. 

[4] Use the ADSA tool, rather than csv/Excel downloads, showing all of your matches by chromosome, because the csv/Excel download jumbles both chromosome strands together.  Note: Every several months, you'll have a lot of new matches, so upload your kit to dnagedcom.com again.

[5] Don't overwhelm people with information in your first contact. Personalize emails. Don't send group emails. Offer a phone number or some other indication that you are not a scam artist. If you don't get a response the first time, wait a while and try again. I usually try three times before giving up. (When you first test, a large percentage of your matches will be people who tested years before you did. Some will be deceased. So don't delay in contacting people.) Even without a response or gedcom, I've been able to put on my detective hat and figure out the ancestry of a lot of people. You can too.