Foundations of Genetic Genealogy

Near Final Draft

Dec 2018

Contents

  • Introduction
  • DNA and Chromosomes
  • Chromosome Inheritance and Assortment
  • Chromosome Crossover and Segmentation
  • DNA Markers and Segment Identification
  • Analysing DNA Test Data
  • An Extremely Compressed Review
  • Sources and More Information

Introduction

This page is for people who want to develop a deeper understanding of how and most especially why genetic genealogy works. If you haven't already done so, I suggest you first take a look at what's on our Getting Started page. That's also where you'll find links to other introductions like this one. Some of those introductions go into greater depth, but the information here is enough for you to effectively understand your DNA test results.

Not everyone will want or need what's presented here, and it's entirely possible to skip this page and move on to the Understand Your Test Results page.

However, some may wish for a deeper understanding, others will be interested for mere intellectual stimulation, and some of us don't easily absorb information unless we have some understanding of its context. It's all good! Use what you find helpful and move on if you lose interest. You can always circle back later to deepen your understanding when it becomes more applicable to your current needs.

Many of the statements below are simplifications and generalisations, sometimes extreme, which are most helpful for initially learning a complex area like genetic genealogy. Words like "typical" are sometimes used to highlight that variations and exceptions exist.

DNA and Chromosomes

DNA is the chemical that holds genetic information in nearly all modern Earth life. As a simplification, DNA can be represented as a series of letters A, C, G and T (technically called nucleotide bases). The order of letters encodes the information needed to build the components of living organisms, for example as genes that encode proteins. DNA can be structured as a circular or linear molecule. In the context of genetics, we call a single continuous DNA molecule a chromosome.
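As a minimal sketch of the letter representation, a DNA segment can be held in an ordinary text string. The example below uses a made-up sequence and illustrative helper names (`is_valid_dna`, `paired_chain`, not taken from any genealogy tool). It checks that only the four letters appear and writes out the partner chain of the double-chain form shown in Figure 1, using the standard pairing of A with T and C with G.

```python
# Each letter on one chain pairs with a fixed partner on the other chain.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_valid_dna(sequence: str) -> bool:
    """Return True if the string is non-empty and uses only A, C, G and T."""
    return len(sequence) > 0 and all(letter in PAIRS for letter in sequence)


def paired_chain(sequence: str) -> str:
    """Return the partner chain, position by position (A<->T, C<->G)."""
    if not is_valid_dna(sequence):
        raise ValueError("sequence must contain only A, C, G and T")
    return "".join(PAIRS[letter] for letter in sequence)


if __name__ == "__main__":
    segment = "ATGCGTAC"            # hypothetical example segment
    print(segment)                  # ATGCGTAC
    print(paired_chain(segment))    # TACGCATG
```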

Complex organisms have several chromosomes. Human cells each typically carry 46 linear chromosomes and multiple copies of one tiny, circular chromosome. The circular chromosome is called mitochondrial DNA (mtDNA). It is of secondary importance for genetic genealogy as currently practiced in 2018, and will be omitted from further discussion here. The total set of all human DNA is called the human genome.

The 46 human linear chromosomes include 22 matching pairs, commonly called the autosomes or autosomal DNA (auDNA or atDNA). An additional pair of chromosomes determines sex and is therefore commonly called the pair of sex chromosomes: the X-chromosome (chrX) and the Y-chromosome (chrY). Biological females carry two X-chromosomes and biological males carry one chrX and one chrY.
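The chromosome counts above can be written down as a small data structure. This is a hypothetical sketch (the function name `chromosome_pairs` and the `"sex"` key are illustrative choices) that covers only the typical arrangement described in the text, with mtDNA left out.

```python
def chromosome_pairs(biological_sex: str) -> dict[str, tuple[str, str]]:
    """Return the 23 chromosome pairs of a typical human cell (mtDNA omitted).

    Keys "chr1" to "chr22" are the autosome pairs; the key "sex" holds the
    sex-chromosome pair: XX for a typical female, XY for a typical male.
    """
    pairs = {f"chr{n}": (f"chr{n}", f"chr{n}") for n in range(1, 23)}
    if biological_sex == "female":
        pairs["sex"] = ("chrX", "chrX")
    elif biological_sex == "male":
        pairs["sex"] = ("chrX", "chrY")
    else:
        raise ValueError("expected 'female' or 'male'")
    return pairs


if __name__ == "__main__":
    cell = chromosome_pairs("male")
    autosome_pairs = [name for name in cell if name != "sex"]
    print(len(autosome_pairs), "autosome pairs")                    # 22 autosome pairs
    print(sum(len(pair) for pair in cell.values()), "chromosomes")  # 46 chromosomes
```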

Class notes for this section, DNA and Chromosomes, with more detail.

Figure 1. Simplified Representations of a DNA Segment. A DNA segment represented in several ways: as a molecule, a cartoon diagram, and a double or single chain of letters.

Figure 2. DNA in a Chromosome in a Cell. An illustration of how DNA is packed into a chromosome, which is itself inside the nucleus of a cell.

Figure 3. Photographs of Human Chromosome Pairs. A full set of forty-six human chromosomes found in nearly every human cell, as photographed in a microscope, arranged in twenty-three matched pairs.

Chromosome Inheritance and Assortment

In human genetics, the fundamental function of sex is to create genetic diversity by recombining two individuals' chromosome sets to form a new and different set in their offspring.

There are two broad steps to the process. First is the creation of sperm or eggs which each hold only 23 of the 46 parental chromosomes. During this process, called gametogenesis, one chromosome is randomly selected from each of the 23 originating pairs. (Meiosis, a term you may encounter, is part of gametogenesis.) Second, a sperm and an egg fuse. The result of this process, called fertilization, is a one-cell embryo with a complete set of 46 chromosomes, which develops into a child. In this way, you inherit half of your chromosomes—half your DNA—from your father and half from your mother.

One chromosome in each of your pairs is maternal in origin and the other paternal. Note that siblings will receive different sets of chromosomes: although each parent has 46 chromosomes, each child receives only 23 of those from that parent, one randomly selected from each pair. This is called chromosome assortment, and is one of the two major randomization processes that make offspring genetically different from their parents and each other.

The consequence of chromosome assortment is that children are genetically different from both their parents and their siblings. They’re different from their parents because they’ve received only half of each parent’s chromosomes (and genes, and DNA). They’re different from their siblings because each child has received a different combination of chromosomes, one selected from each of the 23 pairs in each of their parents.
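Chromosome assortment can be simulated in a few lines. This minimal sketch, using hypothetical helper names, ignores crossover (covered in the next section) and records each chromosome only by which of the parent's own parents it came from: 0 for the parent's mother, 1 for the parent's father. Every egg or sperm takes one chromosome from each of the 23 pairs at random, and fertilization combines one of each.

```python
import random

NUM_PAIRS = 23  # 22 autosome pairs plus the sex-chromosome pair


def make_gamete(rng: random.Random) -> tuple[int, ...]:
    """Chromosome assortment for one egg or sperm (crossover ignored).

    For each of the 23 pairs: 0 = the copy the parent got from their own
    mother, 1 = the copy the parent got from their own father.
    """
    return tuple(rng.randint(0, 1) for _ in range(NUM_PAIRS))


def make_child(rng: random.Random) -> dict[str, tuple[int, ...]]:
    """Fertilization: one egg plus one sperm gives 23 pairs, 46 chromosomes."""
    return {"from_mother": make_gamete(rng), "from_father": make_gamete(rng)}


if __name__ == "__main__":
    rng = random.Random(2018)
    sibling_a = make_child(rng)
    sibling_b = make_child(rng)
    print(sum(len(half) for half in sibling_a.values()))  # 46
    same = sum(a == b for a, b in zip(sibling_a["from_mother"], sibling_b["from_mother"]))
    print(f"Siblings share {same} of 23 maternal choices")
    print(f"Possible gametes per parent: {2 ** NUM_PAIRS:,}")  # 8,388,608
```

Because each of the 23 choices is a coin flip, one parent can produce 2^23 = 8,388,608 different combinations even before crossover is considered, and two siblings agree on only about half of a given parent's choices on average (11.5 of 23).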

Chromosome assortment is also the source of the roughly 50:50 sex ratio of females and males. Because females have a pair of sex chromosomes consisting of two X-chromosomes, the only sex chromosome present in any egg is an X. But because males have a pair of sex chromosomes consisting of one X-chromosome and one Y-chromosome, typical gametogenesis leads to half of their sperm carrying an X and half carrying a Y. When a sperm fertilizes an egg, the resulting one-cell embryo will then be either XX and develop into a biological female, or XY and develop into a biological male.
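The sex-ratio argument can be checked with a small simulation. This is an idealised sketch with illustrative names: every egg carries an X, and each sperm carries an X or a Y with equal probability. Real human birth ratios differ slightly from exactly 50:50, for reasons this model does not attempt to capture.

```python
import random


def conceive(rng: random.Random) -> str:
    """Combine one egg and one sperm into a sex-chromosome pair."""
    egg = "X"                       # a mother (XX) can only pass on an X
    sperm = rng.choice(["X", "Y"])  # a father (XY) passes on X or Y, 50:50
    return egg + sperm              # "XX" or "XY"


def fraction_xy(births: int, seed: int = 0) -> float:
    """Share of simulated embryos that are XY."""
    rng = random.Random(seed)
    return sum(conceive(rng) == "XY" for _ in range(births)) / births


if __name__ == "__main__":
    # Close to 0.5; the exact value depends on the seed.
    print(f"{fraction_xy(100_000):.3f}")
```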

Class notes for this section, Chromosome Inheritance and Assortment, with more detail.

Figure 4. Cycle of Fertilization and Gametogenesis. Human sexual reproduction is a cycle moving back and forth with each generation. During gametogenesis, gametes (eggs & sperm) are formed that have only one chromosome from each of our 23 pairs. Then during fertilization two gametes fuse to recreate a complete set of 23 pairs of chromosomes, 46 total. Only three pairs are shown in the simplified diagram here.

Figure 5. Chromosome Assortment into Siblings. Sperm and eggs hold only one, randomly selected chromosome from each of the 23 pairs found in humans. So even children of the same parents will each get a different set of chromosomes from their siblings. This randomization process is called chromosome assortment, and generates genetic diversity.

Chromosome Crossover and Segmentation

The second major randomisation process that makes offspring genetically different from their parents happens during the process of creating sperm and eggs. It's called chromosomal crossover, or crossing over.

Before the 23 pairs of chromosomes are separated into two sets, each for one egg or sperm, they align and may exchange segments. This happens through breaks made in the equivalent location on each chromosome strand of the pair, which are repaired by being reattached to the other chromosome in the pair.

Depending on the length of the chromosome and on random processes, zero to four breaks and rejoinings (crossover events) may happen for any given chromosome pair. The outcome is that if a crossover event occurs, the resulting chromosome in the egg or sperm will include some segments of DNA from one grandparent of the eventual offspring, and some from the other grandparent. In genetic genealogy, this process is called segmentation.

If there are zero crossovers between a pair of chromosomes, then the chromosome is passed along intact and is identical in the parent and child. If there are crossovers, the chromosome inherited by the child will be a combination of segments from the parent's maternal and paternal chromosomes.

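In genetic genealogy, DNA segment length is measured in centimorgans (cM), a unit based on how likely a crossover is within a segment: 1 cM corresponds to about a 1% chance of a crossover per generation and, averaged across the genome, to roughly 1,200,000 DNA letters. To provide some perspective, human chromosome 1 (chr1), the longest, is about 280 cM long. Chr22 is around 68 cM long.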
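The segmentation described above can be sketched as a simulation of one chromosome passing from a parent to a child, using the roughly 280 cM length of chr1 just given. As a stated assumption, the sketch places crossovers with Haldane's mapping model, in which crossovers fall independently along the chromosome at an average of one per 100 cM. This is a standard textbook simplification rather than the exact biological process, and the helper name `transmit_chromosome` is illustrative.

```python
import random


def transmit_chromosome(length_cm: float, rng: random.Random) -> list[tuple[float, float, str]]:
    """Simulate one parent-to-child transmission of a single chromosome.

    Returns (start_cM, end_cM, source) segments, where source says which of
    the parent's two copies (grandmother's or grandfather's) each piece is from.
    Crossovers follow Haldane's model: independent, on average 1 per 100 cM.
    """
    sources = ("grandmother", "grandfather")
    current = rng.randint(0, 1)            # which copy the chromosome starts on
    segments = []
    start = 0.0
    position = rng.expovariate(1 / 100)    # distance to the first crossover, in cM
    while position < length_cm:
        segments.append((start, position, sources[current]))
        start = position
        current = 1 - current              # a crossover switches to the other copy
        position += rng.expovariate(1 / 100)
    segments.append((start, length_cm, sources[current]))
    return segments


if __name__ == "__main__":
    rng = random.Random(42)
    for start, end, source in transmit_chromosome(280.0, rng):  # chr1, about 280 cM
        print(f"{start:6.1f}-{end:6.1f} cM from {source}")
```

When no crossover falls within the chromosome, the function returns a single segment, matching the intact case described earlier; otherwise each crossover switches the source between the two grandparents' copies.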

Because at most only a few crossover events happen for each chromosome pair in each generation, a child's grandparent-sourced segments are typically large. On average, the longest reported segment shared between a grandparent and grandchild is around 170 cM, and segments around 60 cM long are common. However, with each generation inherited segments may be further split by crossovers, so the segments that trace to great-grandparents are shorter on average, and so on with increasing relationship distance. For example, the average longest shared segment reported between third cousins is about 30 cM.

Since each child receives only half of each parent's DNA, ancestral segments do not always get transmitted to later generations. For example, roughly, but not exactly, 25% of your DNA comes from your maternal grandmother, and the other 75% of her DNA is lost to your descendant lineage. But your siblings and maternal first cousins each carry a different (though partly overlapping) 25% of her DNA, so segments you didn't inherit might be included in their DNA.
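The halving described here gives a simple rule of thumb for the average share of your autosomal DNA expected from any single ancestor: one half per generation back. The sketch below computes only that average (the function name is illustrative); as the text notes, actual amounts vary because assortment and crossover are random.

```python
def expected_share(generations_back: int) -> float:
    """Average fraction of your autosomal DNA traced to ONE ancestor.

    generations_back = 1 for a parent, 2 for a grandparent, and so on.
    Each generation passes on half, so the share halves at each step.
    """
    if generations_back < 1:
        raise ValueError("generations_back must be 1 or more")
    return 0.5 ** generations_back


if __name__ == "__main__":
    labels = ["parent", "grandparent", "great-grandparent", "2x great-grandparent"]
    for n, label in enumerate(labels, start=1):
        # There are 2**n ancestors in generation n, each contributing the same average.
        print(f"{label}: {expected_share(n):.2%} each, {2 ** n} such ancestors")
```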

Figure 6. Crossover Then Chromosome Assortment. A simplified diagram of gametogenesis, illustrating a crossover event between two chromosomes in a matched pair, followed by separation (assortment) of the two segmented chromosomes into individual sperm. A similar process produces eggs. For actual human eggs and sperm, each one carries 23 chromosomes—one selected randomly from each of the adult's 23 pairs. Only one pair is illustrated here.

Figure 7. Segmentation Over Generations. A simplified diagram following one chromosome pair (for example, chromosome 2) across three generations. In the grandparents' generation (the top row), each maternal and paternal chromosome is shown with a different color. A wide horizontal "X" between a chromosome pair marks the site of a crossover. Crossovers generate a segmented chromosome in the next generation. Note that the child in this illustration (bottom row) has not received 25% of each grandparent's DNA for this single illustrated pair of chromosomes. But across all 22 autosomal DNA pairs the amounts will combine to around 25% total from each grandparent.

Figure 8. An IBD Segment. A more extensive example of DNA inheritance by segment, again showing just one chromosome pair per individual for illustration. In this case, the two bottom individuals are first cousins. They have a shared DNA segment, in orange, that they have both inherited from their parents, and which originated with one of their common ancestors, a grandfather. "IBD" is an acronym for Identical By Descent, meaning the segment is identical in the two people because it descends from a common ancestor.

DNA Markers and Segment Identification

Single-letter mutations change one letter in a DNA sequence to another, and appear randomly and rarely over generations. Mutations can serve as DNA markers for identifying a DNA segment, and single-letter mutations (single-nucleotide polymorphisms, or SNPs, pronounced "snips") are the most common type of mutation used as markers in current genetic genealogy. The most frequent letter in the human population at a particular location on a chromosome is presumed to be the original one, and is called the wild-type or plus version (wild-type allele). The less common version is called the mutant, minus, or variant allele.
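The naming convention for the two versions at a marker site can be expressed as a small function. This sketch, with an illustrative name and a made-up sample of letters, follows the text's simplification that the more frequent letter is the wild-type allele. Real allele frequencies come from large population datasets, such as those summarised in NCBI's dbSNP.

```python
from collections import Counter


def classify_alleles(observed: str) -> dict[str, object]:
    """Label the two versions seen at one marker site.

    Following the convention in the text, the more frequent letter is taken
    as the wild-type (plus) allele and the rarer one as the variant (minus).
    `observed` holds one letter per sampled chromosome copy.
    """
    counts = Counter(observed)
    if len(counts) != 2:
        raise ValueError("a simple marker site should show exactly two versions")
    (common, n_common), (rare, n_rare) = counts.most_common(2)
    return {
        "wild_type": common,
        "variant": rare,
        "variant_frequency": n_rare / (n_common + n_rare),
    }


if __name__ == "__main__":
    sample = "AAAGAAGAAA"  # hypothetical letters from 10 chromosome copies
    print(classify_alleles(sample))
    # {'wild_type': 'A', 'variant': 'G', 'variant_frequency': 0.2}
```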

For autosomal DNA testing, genetics scientists selected markers for their simplicity (a good marker site has only two common variants), their informativeness (a significant proportion of the total human population has the variant), their even coverage of the entire human genome, and their reliability under current testing technology.

A typical genealogical DNA test examines around 700,000 DNA markers, separated by about 0.004 cM on average. Because DNA is inherited in long segments, markers that are adjacent tend to stay together for many, many generations. DNA matching software doesn't look for single marker matches, but rather for long runs of contiguous matching markers, typically extending at least 2-3 cM and encompassing 500-700 markers.

Contiguous runs of DNA markers identical between two people identify shared DNA segments, presumed to be inherited from a common ancestor. Because the human species has existed for only two or three hundred thousand years, and we are less diverse genetically than most species, humans in general share many small DNA segments. Shared segments of 5 cM and smaller are common between unrelated people.
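The search for long runs of matching markers can be sketched as follows. This is a simplified illustration, not the algorithm of any testing company: it works on unphased genotypes (each person's two letters at a marker), treats a marker as matching unless the two people are opposite homozygotes (for example AA versus GG), and reports runs that reach both a minimum length in cM and a minimum marker count. The defaults of 3 cM and 500 markers are illustrative values picked from the ranges quoted above, and the demo data are synthetic, using the text's average spacing of 0.004 cM.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    position_cm: float  # genetic position along the chromosome, in cM
    genotype_a: str     # person A's two letters at this marker, e.g. "AG"
    genotype_b: str     # person B's two letters at this marker


def shares_a_letter(g1: str, g2: str) -> bool:
    """False only for opposite homozygotes such as "AA" vs "GG"."""
    return bool(set(g1) & set(g2))


def find_shared_segments(markers, min_cm=3.0, min_markers=500):
    """Return (start_cM, end_cM, marker_count) for long runs of matching markers."""
    segments = []
    run = []
    for marker in list(markers) + [None]:  # None closes the final run
        if marker is not None and shares_a_letter(marker.genotype_a, marker.genotype_b):
            run.append(marker)
            continue
        if run:
            span_cm = run[-1].position_cm - run[0].position_cm
            if span_cm >= min_cm and len(run) >= min_markers:
                segments.append((run[0].position_cm, run[-1].position_cm, len(run)))
        run = []
    return segments


if __name__ == "__main__":
    spacing = 0.004  # average marker spacing in cM, as quoted in the text
    demo = [
        Marker(i * spacing, "AG", "AG") if 500 <= i < 1400  # hypothetical shared run
        else Marker(i * spacing, "AA", "GG")                # hypothetical mismatch
        for i in range(2000)
    ]
    for start, end, count in find_shared_segments(demo):
        print(f"shared segment {start:.2f}-{end:.2f} cM over {count} markers")
```

Because unrelated people share at least one letter at most markers by chance, short matching runs turn up constantly with this approach, which is one reason the minimum-length thresholds matter. Real matching software typically adds further refinements, such as tolerating occasional genotyping errors, which this sketch omits.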

Figure 9. DNA Markers and Variants. Here is an illustration of adjacent DNA markers on a chromosome, represented four ways with increasing detail. Genealogical DNA markers are points on a chromosome selected as known sites of a harmless mutation, a variant, in some fraction of the total human population. Runs of contiguous markers form a DNA segment, and the pattern of variants along it is the hallmark of a particular DNA segment found in the human population.

Analysing DNA Test Data

Because of how DNA is inherited as described above, we now have a basis for understanding some key elements of DNA test results analysis. For example: closer relatives will have more total DNA in common, larger segments in common, and a larger number of segments in common. DNA matches—people who are likely relatives—are identified and initially assessed based on this information.
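The three quantities named here (total shared DNA, largest segment, and number of segments) can be computed directly from a match's list of shared segment lengths. The segment lengths in this sketch are invented for illustration, and `summarize_match` is a hypothetical helper, not a feature of any testing site.

```python
from dataclasses import dataclass


@dataclass
class MatchSummary:
    total_cm: float      # total shared DNA
    longest_cm: float    # largest single shared segment
    segment_count: int   # number of shared segments


def summarize_match(segment_lengths_cm: list[float]) -> MatchSummary:
    """Reduce a match's shared-segment lengths (in cM) to three key numbers."""
    return MatchSummary(
        total_cm=sum(segment_lengths_cm),
        longest_cm=max(segment_lengths_cm, default=0.0),
        segment_count=len(segment_lengths_cm),
    )


if __name__ == "__main__":
    closer = summarize_match([62.0, 41.5, 18.2, 9.0])  # hypothetical segments
    farther = summarize_match([8.1, 7.4, 6.0])         # hypothetical segments
    for label, s in (("closer", closer), ("farther", farther)):
        print(f"{label}: {s.total_cm:.1f} cM total, longest {s.longest_cm:.1f} cM, "
              f"{s.segment_count} segments")
```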

For Ashkenazi genetic genealogy, where a significant amount of DNA is in common across the population, knowing the size of the largest segment in common is extremely helpful for distinguishing closer from farther relatives. Among the DNA matches that an assessment of total DNA in common would assign as 3rd or 4th cousins, the ones with at least one large shared segment (over 20 cM) are far more likely to be actual 3rd or 4th cousins.
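The largest-segment check can be expressed as a simple filter. In this hypothetical sketch, the matches are assumed to have already been predicted as 3rd or 4th cousins from their total shared DNA, the segment lengths are invented, and the 20 cM threshold is the one given in the text. The function names are illustrative.

```python
def has_large_segment(segment_lengths_cm: list[float], threshold_cm: float = 20.0) -> bool:
    """True if at least one shared segment is longer than threshold_cm."""
    return any(length > threshold_cm for length in segment_lengths_cm)


def prioritise(matches: dict[str, list[float]], threshold_cm: float = 20.0) -> list[str]:
    """Keep matches that have a large segment, ordered by longest segment first."""
    keep = [name for name, lengths in matches.items() if has_large_segment(lengths, threshold_cm)]
    return sorted(keep, key=lambda name: max(matches[name]), reverse=True)


if __name__ == "__main__":
    # Hypothetical matches that total DNA alone would call 3rd-4th cousins.
    predicted_3rd_4th = {
        "match_1": [24.3, 9.8, 7.7],                 # 41.8 cM total
        "match_2": [11.2, 9.9, 8.8, 8.1, 7.5, 7.0],  # 52.5 cM total
        "match_3": [31.0, 6.2],                      # 37.2 cM total
    }
    print(prioritise(predicted_3rd_4th))  # ['match_3', 'match_1']
```

Note that `match_2` has the most total shared DNA of the three but no large segment, the pattern that endogamy can produce.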

Why is looking for large segments especially helpful when analysing Ashkenazi Jewish genetic genealogy? Because of endogamy, which is the cultural practice of marrying only within a specific population. Endogamy leads to that population's members carrying more shared DNA segments with each other than they do with members of outside populations.

Consequently, as a member of an endogamous population you might appear to have a lot of DNA in common with a person, not because you're a close relative but because you are distantly related to them on both sides of your family. However, because DNA segments from distant ancestors are smaller than those from close ones, you will probably not have any large DNA segments in common with that person even though your total amount of shared DNA is high.

If the size of the largest segment in common is not available (the case at AncestryDNA), consideration of the number of shared segments can also help assess whether DNA matches are likely to be worth a closer examination. However, the number of segments in common, even among known 2nd, 3rd or 4th cousins, can be highly variable. Also, endogamy can lead to many shared (but small) DNA segments even among rather distant relatives.

For more details, other useful tips, and step-by-step instructions for analysing and understanding your autosomal DNA test results, please visit our page Understand Your Test Results.

An Extremely Compressed Review

DNA holds genetic information. That information is represented as a string of letters (A, C, G or T), and comes in long DNA molecules called chromosomes. Modern genetic genealogy research focuses on autosomal DNA: all the chromosomes other than the pair that controls sexual development.

Of your 46 chromosomes, 23 came from your mother and 23 from your father. The process that selects which 23 of each parent's 46 chromosomes go into a particular sperm or egg also "segments" each chromosome. For example, the chromosomes you inherited from your mother include segments from both your maternal grandmother and maternal grandfather. Each generation of a descendant lineage inherits fewer and shorter segments from any particular ancestor of that lineage. Our chromosomes are a patchwork of DNA segments, each of which derives from an originating ancestor.

Genealogical DNA test analysis looks for patterns of DNA markers, which identify identical DNA segments shared by two people. If the amount of shared DNA is large enough, the two people are considered to be DNA matches. The presence of a shared segment implies two people both inherited that segment from a common ancestor. Closer relatives have on average more total DNA in common, their matching segments are longer, and there are more of them.

Sources and More Information

DNA and Chromosomes

...coming soon...

Chromosome Inheritance and Assortment

...coming soon...

Chromosome Crossover and Segmentation

...coming soon...

DNA Markers and Segment Identification

  • ...coming soon...
  • The set of 44 contiguous chr12 markers in Figure 9 is from a 2016 raw DNA data report downloaded from AncestryDNA. Chromosome locations for DNA markers were very roughly estimated from the Genome Data Viewer at the National Center for Biotechnology Information (NCBI). A view showing the location of the SNP marker named rs1868879 on chr12 is here. Marker DNA sequences are from the dbSNP database at the NCBI. The entry for marker rs1868879 is here.

Analysing DNA Test Data

See our page Understand Your Test Results for more details about this section.

Figure Sources and Credits

1. DNA molecule illustration is from Wikimedia. DNA cartoon diagram is modified from a Wikimedia source. 2. DNA/chromosome drawing is modified from a Wikimedia source. 3. Human chromosome photographs are modified from a Wikimedia source. 4. Original illustration, with some elements (sperm) from a Wikimedia source. 5. Original illustration, with some elements (sperm) from a Wikimedia source. 6. Diagram elements are modified from a Wikimedia source. 7. Original illustration. 8. Illustration from Wikimedia. 9. Chromosome 12 diagram is modified from a Wikimedia source.

DNA Special Interest Group