Methodology and Limitations

Methodology

Y-DNA Methodology

I used the following methodology to prepare the Y-DNA trees and supporting information posted on this website in 2018. (As discussed on this page, in 2022 I refined the analysis by using Family Tree DNA's Y-DNA Discover utility to further analyze each of the Y-DNA clusters that I identified through my 2018 analysis.)

1. I used the methodology described below to identify the FTDNA-reported Y-DNA SNPs for the cohorts of Family Finder-tested men who share 100 cM, 80 cM, or 50 cM of autosomal DNA, respectively, with at least one of 25 people with four Ashkenazi grandparents.

2. Y-DNA SNPs reported by FTDNA based upon STR testing date back tens of thousands of years and are therefore of limited use in identifying the Y-DNA lines that were present in an Ashkenazi (or proto-Ashkenazi) population that dates back 1,500 to 2,000 years, or in a Jewish (or proto-Jewish) population that dates back perhaps 3,000 to 3,500 years. Accordingly, I disregarded these far-upstream SNPs when making my initial efforts to identify Ashkenazi Y-DNA lines.

FTDNA uses one to four upstream SNPs to identify the haplogroups of men who have done only STR testing. (The men whose haplogroups are identified in red text on FTDNA's project pages have done only STR testing.)

The haplogroups reported by FTDNA based on STR testing in the Y-DNA haplogroups that contain a significant number of Ashkenazi Jews are as follows:

E: E-L117, E-M2, E-M35, and E-M96

G: G-M201

I: I-M170, I-M223, I-M253, and I-P37

J1: J-M267

J2: J-M172

Q: Q-M242

R1a: R-M512, R-M198, and R-Z283

R1b: R-M269

R2: R-M124

T: T-M70

3. I then used FTDNA's Y-DNA Haplotree to determine, for each Y-DNA terminal SNP identified as potentially Ashkenazi through the methodology used for this analysis, the total number of men in FTDNA's database who were, as of early January 2019, reported as "Branch Participants" for that SNP (i.e., the number of men who were reported as having that SNP as their terminal SNP).

If dozens or hundreds of men are reported as Branch Participants for a SNP, that indicates that the SNP has been reported based on STR test results, Geno 2.0 results, SNP packs, or Walk Through the Y testing, rather than based on Big Y testing. Except in rare circumstances, SNPs shared by a large number of men are likely to date back several thousand years to tens of thousands of years, making those SNPs generally irrelevant to the analysis performed herein.

4. I next compared the number of men reported as Branch Participants for each SNP with the number of men in the sample set reported as having that SNP. For the reasons discussed below, as a general matter the men in the sample set will constitute only a proportion -- perhaps 25% to 50% -- of the Ashkenazi men who have done Big Y testing.

Thus, if about 25% to 50% of the total number of men reported by FTDNA as having a particular terminal SNP are included in the sample set (especially at 100 cM), there is a substantial likelihood that men reported with that SNP are of Ashkenazi ancestry on their direct male lines. Reference to FTDNA's Y-DNA Haplotree demonstrated that the vast majority of such men belonged to Y-DNA clusters that include multiple branches with other men in the sample set, confirming that those SNPs are commonly found in the Ashkenazi population.

Conversely, there are a substantial number of men in the sample set (especially in haplogroups I and R1b) who were only 50 cM matches for at least one of the 25 probands and who are not in the same Y-DNA clusters as other men in the sample set. There is a significant likelihood that many -- but by no means all -- of those men are not Ashkenazi (or Jewish) on their direct male lines.

Because the sample set at 50 cM is demonstrably overinclusive in this regard, I disregarded 50 cM matches if they did not belong to Ashkenazi Y-DNA clusters or were only a small percentage of the Branch Participants reported by FTDNA. As a consequence, however, some rare Ashkenazi Y-DNA lines have likely been excluded from the results reported herein.

5. Using the FTDNA Y-DNA Haplotree as of early January 2019 to identify branching, I prepared trees (linked, by haplogroup, from the left-hand column on this page) for each separate ancestral line identified through this analysis.

In most instances, the identified ancestral lines include more than one branch with a substantial number of Ashkenazi men. However, some such ancestral lines appear to include non-Ashkenazi -- and non-Jewish -- branches, in addition to Ashkenazi branches. In a few instances, the identified ancestral lines include only a single branch with Ashkenazi men in the sample set.

6. To the right of each posted tree on the pages for each Y-DNA haplogroup, I have included the data that I used in identifying likely Ashkenazi branches -- tables showing, for each SNP: (a) the SNP name; (b) the number of Branch Participants reported by FTDNA; and (c) the number of men in the sample set who match at least one of the 25 probands at (i) 100 cM, (ii) 80 cM, and (iii) 50 cM.

7. Below each posted tree, I have set forth the SNPs that are upstream from the top SNP of the tree, as taken from the FTDNA Y-DNA Haplotree. I have included the ISOGG designation for each cluster.

For certain Y-DNA clusters that are included in YFull's Y-Tree, I have set forth: (1) YFull's estimate of the time to the most recent common ancestor ("TMRCA") in years before present ("ybp"); and (2) YFull's TMRCA range at a 95% confidence level.

For certain Y-DNA clusters that are included in Wim Penninx' analyses at JewishDNA.net, I have included: (1) his designation for the cluster (a branch number followed by a number (e.g., AB-067), and an abbreviated SNP chain (e.g., R1a-Z93-M582)); and (2) his estimated TMRCA range, stated in years, at a 95% confidence level.

8. At the bottom of the page for each Y-DNA haplogroup, I have included a table identifying every Y-DNA SNP in the sample set that does not appear on one of the posted trees.

For each SNP that is on the Big Y tree but not on one of the posted trees, I have stated whether such SNP is upstream from SNPs found in one or more Ashkenazi Y-DNA clusters. If men who are reported based upon STR or Geno 2.0 testing as having these SNPs are of Ashkenazi descent on their direct male line, there is a high probability that such men belong to one of those Ashkenazi Y-DNA clusters. However, a large proportion of the men reported as having these terminal SNPs are not of Ashkenazi descent on their direct male lines.

I have also identified SNPs that are not on one of the posted trees but that may reflect an Ashkenazi Y-DNA line. (Such SNPs are those that are found disproportionately in the sample set, thereby indicating likely Ashkenazi origins, but are not part of an Ashkenazi Y-DNA cluster identified in the study.) Further testing is necessary to confirm whether these SNPs define Ashkenazi branches.

Finally, those SNPs that I have not identified as being upstream from an Ashkenazi Y-DNA cluster or as possibly reflecting an Ashkenazi Y-DNA line are likely not to be Ashkenazi in origin. In many instances, those Y-DNA lines appear in the sample set because of overinclusiveness in the methodology used to identify likely Ashkenazi Jews. In other instances, those Y-DNA lines may be Sephardic or Mizrahi in origin, or may have entered the Ashkenazi population in the past few centuries. It is also likely that further test results will show that a few of these Y-DNA lines are minor and/or undertested Ashkenazi lines.
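The comparison described in step 4 above (men in the sample set as a share of FTDNA's Branch Participants for a terminal SNP) can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet actually used for the analysis: the class and function names, the placeholder SNP names, and the counts are all hypothetical, and the 25% cut-off is simply the lower end of the 25% to 50% range mentioned in step 4.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    snp: str                  # terminal SNP name (placeholder names below)
    branch_participants: int  # men FTDNA reports with this terminal SNP
    in_sample_100cm: int      # of those, men in the 100 cM sample set


def sample_share(row: SnpCounts) -> float:
    """Fraction of Branch Participants who also appear in the sample set."""
    if row.branch_participants <= 0:
        return 0.0
    return row.in_sample_100cm / row.branch_participants


def is_candidate(row: SnpCounts, min_share: float = 0.25) -> bool:
    """Flag a SNP as a possible Ashkenazi line (illustrative threshold only)."""
    return sample_share(row) >= min_share


# Made-up counts for two placeholder SNPs.
rows = [
    SnpCounts("SNP-A", branch_participants=12, in_sample_100cm=5),
    SnpCounts("SNP-B", branch_participants=400, in_sample_100cm=6),
]
for row in rows:
    print(row.snp, f"{sample_share(row):.0%}", is_candidate(row))
```

A flag from a check like this is only a starting point; as described above, cluster membership on the Y-DNA Haplotree was also taken into account before a line was treated as Ashkenazi.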

mtDNA Methodology

1. I used the methodology described below to identify the FTDNA-reported mtDNA SNPs for the cohorts of Family Finder-tested people who share 100 cM, 80 cM, or 50 cM of autosomal DNA, respectively, with at least one of 25 people with four Ashkenazi grandparents.

2. The mtDNA frequencies reported by haplogroup and clade are based upon the cohort of Family Finder-tested people who share 100 cM with at least one of 25 people with four Ashkenazi grandparents.

Issues with Using Family Finder Match Lists to Determine Ashkenazi Haplogroup Frequencies

While the issues with Family Finder match lists (set out in the numbered list of issues later on this page) create limitations on the ability to use those lists to determine Ashkenazi Y-DNA and mtDNA haplogroup frequencies, those issues can be mitigated in large part through use of the methodology described below.

1. To address the fact that a single Ashkenazi person's Family Finder match list will fail to identify many fully Ashkenazi people (i.e., people with four Ashkenazi grandparents) who have done Y-DNA or mtDNA testing, I aggregated Family Finder lists from multiple people who are fully Ashkenazi.

For use in this analysis, I combined Family Finder match lists from 25 people with four Ashkenazi grandparents (the probands) to create a spreadsheet that includes all of their matches (for a total of 56,162 discrete persons) as of December 20, 2018. Because of the endogamous nature of the Ashkenazi population, this combined match list is likely to include almost all of the people with four Ashkenazi grandparents who have done Family Finder testing (along with many other people who do not have four Ashkenazi grandparents).

2. To address the fact that many of the people on the combined match list are likely not Ashkenazi on their direct male and/or direct female lines, I considered only those matches who share a high amount of autosomal DNA with at least one of the 25 probands. This approach takes advantage of the fact that endogamy results in relatively high levels of shared autosomal DNA in the Ashkenazi population.

For use in this analysis, I excluded from the Y-DNA and mtDNA frequency analysis any person on the combined match list who does not share a substantial amount of autosomal DNA with at least one of the 25 probands. For analytical purposes, I initially considered three cohorts of Y-DNA-tested men and mtDNA-tested people who shared at least 50 cM, 80 cM, or 100 cM, respectively, with at least one of the 25 probands:

                50 cM     80 cM     100 cM

Total Tested:   37,381    26,938    21,849

Y-DNA Tested:   8,196     6,231     5,229

mtDNA Tested:   8,345     6,276     5,187

As discussed below, there is some variation in the frequencies of Y-DNA and mtDNA haplogroups depending upon whether such frequencies are considered at 50 cM, 80 cM, or 100 cM. Because people who share the most autosomal DNA with fully Ashkenazi matches are the most likely to be Ashkenazi on all of their lines (including their direct male or female lines), the percentages reported on this website are those among the 100-cM matches.

The correspondence between 100-cM matches and Ashkenazi direct male or female lines is by no means perfect. It is highly likely that (1) some people with four Ashkenazi grandparents will have some close relatives who are not Jewish (or are Jewish but not Ashkenazi) on their direct male or direct female lines, and (2) some people with high proportions of Ashkenazi autosomal DNA will be descended on their direct male and/or female lines from ancestors who were not Jewish (or who were Jewish but not Ashkenazi). It is therefore highly likely that the percentages reported herein are somewhat skewed. (To deal with the former issue, one could use an autosomal DNA cutoff to remove matches who are very closely related to a proband. After completing this analysis, I removed any matches of greater than 200 cM; because virtually all such persons who have done Y-DNA and/or mtDNA testing shared at least 100 cM with another proband, using the 200-cM cutoff had no appreciable effect on the haplogroup percentages reported herein.)

As discussed here, however, the frequencies calculated through this analysis are in line with those set forth in prior analyses, which tends to support the reliability of this methodology in broad strokes, notwithstanding the strong likelihood that the 25 probands have a few 100-cM matches who are not Jewish (or who are Jewish but not Ashkenazi) on their direct male or direct female lines.

3. A practical issue -- not a methodological one -- is presented by FTDNA's practice of reporting haplogroups with different degrees of generality or specificity depending upon the level of testing performed.

To address this issue, for purposes of this analysis I used the highest level of generality for most of the Y-DNA haplogroups, but broke Y-DNA haplogroup J into J1 and J2 and Y-DNA haplogroup R into R1a, R1b, and R2 to the extent that the information reported on the match lists was sufficient to allow identification of those clades. (In a few instances, I omitted test results within Y-DNA haplogroups J and R because available information did not readily allow a determination of whether the tested men belonged to J1 or J2, or R1a, R1b, or R2, respectively.)

4. This analysis does nothing to address the facts that: (1) a substantial number of the people who have done Y-DNA or mtDNA testing through FTDNA have not done Family Finder testing; and (2) most of the Ashkenazi population has not done Y-DNA or mtDNA testing through FTDNA at all.

Because of the large sample size and the lack of any reason to doubt that the people who have done both Family Finder testing and Y-DNA and/or mtDNA testing are representative of the universe of Ashkenazi Jews who have done Y-DNA or mtDNA testing through FTDNA (or of the Ashkenazi population in general), I have assumed that the frequencies reported herein are generally representative of those of all Ashkenazi Jews who have done Y-DNA or mtDNA testing (and of the Ashkenazi population in general).

Absent an easy way to identify those people who (1) may have tested the Y-DNA or mtDNA of relatives who have matching Y-DNA or mtDNA or (2) may have chosen not to test Y-DNA or mtDNA because a close relative with the same Y-DNA or mtDNA has already tested, I assumed that these two possibilities generally cancel each other out.

Similarly, because there is no way to use the combined match list to determine the Y-DNA and mtDNA haplogroups of Ashkenazi Jews who have not done Y-DNA or mtDNA testing, I assumed that the very large sample size considered in this analysis makes it likely that the samples considered herein are generally representative of the Ashkenazi population as a whole.

To the extent that, as is likely, any of these three assumptions is not fully accurate, the Y-DNA and mtDNA haplogroup frequencies reported herein will differ somewhat from the frequencies found in the Ashkenazi population as a whole.
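The aggregation and thresholding described in items 1 and 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spreadsheet workflow actually used: the record layout (`person_id`, `shared_cm`, `y_haplogroup`, `mt_haplogroup`), the function names, and the toy data are all hypothetical. Each person is kept once, with the largest amount of DNA shared with any proband, and an optional `max_cm` argument drops individual match records above a cutoff (as with the 200-cM check mentioned above).

```python
from collections import Counter


def combine_match_lists(match_lists, max_cm=None):
    """Merge per-proband match lists into one record per distinct person.

    Keeps, for each person, the record with the largest shared cM across
    all probands. Records above max_cm (if given) are skipped.
    """
    combined = {}
    for matches in match_lists:
        for m in matches:
            if max_cm is not None and m["shared_cm"] > max_cm:
                continue
            prev = combined.get(m["person_id"])
            if prev is None or m["shared_cm"] > prev["shared_cm"]:
                combined[m["person_id"]] = dict(m)
    return combined


def cohort(combined, min_cm):
    """People who share at least min_cm with at least one proband."""
    return [p for p in combined.values() if p["shared_cm"] >= min_cm]


def frequencies(people, key):
    """Percentage of tested people in each haplogroup (untested are skipped)."""
    counts = Counter(p[key] for p in people if p.get(key))
    total = sum(counts.values())
    return {h: round(100 * n / total, 2) for h, n in counts.items()}


# Toy match lists for two hypothetical probands.
proband_a = [
    {"person_id": "p1", "shared_cm": 120.0, "y_haplogroup": "J1", "mt_haplogroup": "K"},
    {"person_id": "p2", "shared_cm": 60.0, "y_haplogroup": "R1b", "mt_haplogroup": None},
]
proband_b = [
    {"person_id": "p1", "shared_cm": 95.0, "y_haplogroup": "J1", "mt_haplogroup": "K"},
    {"person_id": "p2", "shared_cm": 85.0, "y_haplogroup": "R1b", "mt_haplogroup": None},
    {"person_id": "p3", "shared_cm": 105.0, "y_haplogroup": "E", "mt_haplogroup": "H"},
]

combined = combine_match_lists([proband_a, proband_b])
for threshold in (50, 80, 100):
    group = cohort(combined, threshold)
    print(threshold, len(group), frequencies(group, "y_haplogroup"))
```

Taking the maximum shared amount per person reflects the "at least one of the 25 probands" wording used throughout; a person who falls below a threshold with one proband still qualifies if he or she meets it with another.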

Issues with Using Family Finder Match Lists to Determine Ashkenazi Haplogroup Frequencies

FTDNA provides its customers with lists identifying matches for each type of testing that they have performed -- Y-DNA, mtDNA, and/or autosomal (Family Finder) testing.

Y-DNA and mtDNA match lists will show only a tested person's closest matches within the same Y-DNA or mtDNA cluster. Accordingly, a person's FTDNA Y-DNA and mtDNA match lists do not provide information allowing analyses of Y-DNA or mtDNA clusters other than those to which the tested person belongs.

However, FTDNA's Family Finder match lists provide information concerning, inter alia, the Y-DNA and mtDNA haplogroups for all of the tested person's autosomal matches who have performed Y-DNA or mtDNA testing, regardless of haplogroup. Family Finder match lists for people who are fully Ashkenazi include far more people than do their Y-DNA and mtDNA match lists. Accordingly, Family Finder match lists are a resource that can be used for compiling information concerning the frequency of Y-DNA and mtDNA haplogroups in the Ashkenazi population.

There are, however, several issues with using Family Finder match lists to analyze the frequency of Ashkenazi Y-DNA and mtDNA haplogroups.

1. Family Finder match lists are underinclusive in several respects. As of December 2018, people with four Ashkenazi grandparents had in the neighborhood of 20,000 Family Finder matches, but the number of people of Ashkenazi ancestry who have done Family Finder testing is considerably greater. Beyond that, there are many people of Ashkenazi ancestry on their direct male and/or female lines who have done Y-DNA testing and/or mtDNA testing through FTDNA but have not done Family Finder testing.

2. The Family Finder match lists are overinclusive. The match lists of even those people with four Ashkenazi grandparents will invariably include many people whose Y-DNA and/or mtDNA lines are not Ashkenazi. In some instances, those lines may be of non-Jewish origins; in other instances, those lines may be Jewish but non-Ashkenazi (e.g., Sephardic or Mizrahi).

3. The Y-DNA and mtDNA haplogroups for Ashkenazi Jews in the Family Finder database may not be fully representative of those in the Ashkenazi population as a whole.

The vast majority of people of Ashkenazi ancestry have not done Y-DNA and/or mtDNA testing; the absence of Y-DNA and mtDNA results from such people may skew haplogroup frequencies.

There is also an issue of representativeness even as to those persons who have done Y-DNA and/or mtDNA testing through FTDNA. On the one hand, people who do DNA testing on themselves are more likely to order DNA testing for family members, who will in many instances belong to the same Y-DNA or mtDNA haplogroups as other tested relatives. On the other hand, people who do DNA testing will generally be aware of which of their relatives share their Y-DNA or mtDNA and therefore will not order Y-DNA or mtDNA testing for those relatives (absent a desire to confirm a relationship or to identify recent mutations in their Y-DNA or mtDNA lines).

4. FTDNA reports Y-DNA and mtDNA results differently depending on the type/level of testing performed.

With regard to Y-DNA, haplogroups will be reported at: (1) a very high level of generality for men who have done STR testing (i.e., 12-, 25-, 37-, 67-, and 111-marker testing); (2) a high or intermediate level of generality for men who have done Geno 2.0 testing or a la carte SNP testing; and (3) a very high degree of specificity for men who have done Big Y testing (full Y-DNA sequencing). As a result, men within the same Y-DNA cluster might be reported on the FTDNA match lists as, for example, (1) R-M512 or R-M198 (if the man has done only Y-DNA STR testing), (2) R-F1345 (if the man has done Geno 2.0 testing), (3) R-CTS6 (if the man has tested CTS6 on an a la carte basis), (4) R-Y2630 (if Y2630 is identified as the man's terminal SNP through Big Y testing), or (5) any one of the 30 terminal SNPs below the R-Y2630 level that had been identified as of December 2018. These discrepancies cause issues with regard to those Y-DNA haplogroups that are generally divided into clades (i.e., haplogroup J includes J1 and J2, and haplogroup R includes R1a, R1b, and R2), and STR-based haplogroups often cannot readily be used to place men into clusters below the haplogroup level.

With regard to mtDNA, haplogroups will be reported with more specificity depending on whether the tested man or woman has done HVR1 testing, HVR1 & HVR2 testing, or full maternal sequence ("FMS") testing. As a result, tested persons within the same mtDNA cluster might be reported on the FTDNA match lists as, for example, (1) H (if he or she has done only HVR1 testing), (2) H5 (if he or she has done only HVR1 & HVR2 testing), or (3) H5-T16311! (if he or she has done FMS testing).
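Because of the reporting differences described in item 4, Y-DNA labels have to be reduced to a common level before frequencies can be counted. The sketch below shows one way this could be done, using only the STR-level labels listed under "Y-DNA Methodology" on this page. The function name and the decision to return `None` for unresolved J and R labels are assumptions made for illustration; placing a Big Y terminal SNP in J1/J2 or R1a/R1b/R2 would in practice require a haplotree lookup that is not shown here.

```python
# STR-level labels reported by FTDNA (from the list on this page),
# mapped to the top-level groups used for the frequency analysis.
STR_LABEL_TO_CLADE = {
    "E-L117": "E", "E-M2": "E", "E-M35": "E", "E-M96": "E",
    "G-M201": "G",
    "I-M170": "I", "I-M223": "I", "I-M253": "I", "I-P37": "I",
    "J-M267": "J1", "J-M172": "J2",
    "Q-M242": "Q",
    "R-M512": "R1a", "R-M198": "R1a", "R-Z283": "R1a",
    "R-M269": "R1b", "R-M124": "R2",
    "T-M70": "T",
}

# Haplogroups that are split into clades for reporting purposes.
SPLIT_HAPLOGROUPS = {"J", "R"}


def top_level_clade(label):
    """Reduce an FTDNA haplogroup label to the level used for frequencies.

    Returns None when a J or R label cannot be assigned to a clade from the
    label alone, so that the result can be omitted rather than guessed.
    """
    label = label.strip()
    if label in STR_LABEL_TO_CLADE:
        return STR_LABEL_TO_CLADE[label]
    letter = label.split("-", 1)[0]
    if letter in SPLIT_HAPLOGROUPS:
        return None  # would need a haplotree lookup (not shown)
    return letter


for label in ("J-M267", "R-M269", "R-Y2630", "G-M201"):
    print(label, "->", top_level_clade(label))
```

Returning `None` rather than a guess mirrors the handling described above, under which test results within haplogroups J and R were omitted when the available information did not readily allow a clade to be determined.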

Choice of a 100-cM Matching Threshold for Identifying Subjects for Inclusion

For the reasons discussed in more detail below, the analysis set forth herein is based upon people who share a total of at least 100 cM with at least one of the 25 probands.

Comparison of Y-DNA Results at 50 cM, 80 cM, and 100 cM

The chart below shows some variances in the frequencies of Y-DNA haplogroups depending on whether the dataset consists of people who share a total of at least 50 cM, at least 80 cM, or at least 100 cM with at least one of the 25 probands:

This chart shows that there is, for the most part, considerable consistency between the frequencies of each Y-DNA haplogroup at shared amounts of 50 cM, 80 cM, and 100 cM. This suggests that a shared amount of 50 cM with a person with four Ashkenazi grandparents will often (but by no means always) reflect Ashkenazi ancestry on the direct male line.

There are, however, three Y-DNA haplogroups in which there are significant downward discrepancies in the frequencies of haplogroups between the cohorts at 50 cM, 80 cM, and 100 cM. Most significantly in numerical terms, the percentage of men in haplogroup R1b in the dataset decreases from 15.68% at 50 cM to 12.62% at 80 cM to 11.50% at 100 cM. There is a more substantial proportionate decrease in men in haplogroups I (4.75% to 3.13% to 2.67%) and N (0.30% to 0.13% to 0.12%) from 50 cM to 80 cM to 100 cM.

The fact that there are a significant number of R1b and I men (and a handful of N men) whose proportionate Ashkenazi admixture is less than that of many men in the dataset suggests that the dataset contains a significant number of men in those haplogroups who do not have four Ashkenazi grandparents. That decreases the likelihood that such men have Ashkenazi ancestry on their direct male lines, although it by no means rules it out. The existence of this discrepancy suggests that it is preferable to use the 100 cM threshold for calculating Y-DNA haplogroup frequencies, even though the dataset for the 100 cM cohort is considerably smaller than the datasets for the 50 cM and 80 cM cohorts.
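The distinction drawn above between a decrease that is large "in numerical terms" and one that is large in proportionate terms can be checked with a few lines of arithmetic. The sketch below uses only the percentages quoted above for haplogroups R1b, I, and N; the function and variable names are illustrative.

```python
# Y-DNA haplogroup frequencies (%) at 50 cM, 80 cM, and 100 cM, as quoted above.
y_freqs = {
    "R1b": (15.68, 12.62, 11.50),
    "I": (4.75, 3.13, 2.67),
    "N": (0.30, 0.13, 0.12),
}


def point_drop(values):
    """Decrease in percentage points from the 50 cM to the 100 cM cohort."""
    return values[0] - values[-1]


def relative_drop(values):
    """Decrease from 50 cM to 100 cM as a percentage of the 50 cM frequency."""
    return 100 * (values[0] - values[-1]) / values[0]


for hg, values in y_freqs.items():
    print(f"{hg}: {point_drop(values):.2f} points, {relative_drop(values):.1f}% relative")
```

On these figures, R1b falls by 4.18 percentage points (about 26.7% of its 50 cM frequency), while I falls by 2.08 points (about 43.8%) and N by 0.18 points (60.0%), which is consistent with the statement that R1b shows the largest numerical decrease and I and N the larger proportionate ones.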

Comparison of mtDNA Results at 50 cM, 80 cM, and 100 cM

The chart below shows some variances in the frequencies of mtDNA haplogroups depending on whether the dataset consists of people who share a total of at least 50 cM, at least 80 cM, or at least 100 cM with at least one of the 25 probands:

This chart shows that there is, for the most part, considerable consistency between the frequencies of each mtDNA haplogroup at shared amounts of 50 cM, 80 cM, and 100 cM. This suggests that a shared amount of 50 cM with a person with four Ashkenazi grandparents will often (but by no means always) reflect Ashkenazi ancestry on the direct female line.

There are, however, six mtDNA haplogroups in which there are significant downward discrepancies in the frequencies of haplogroups for the cohorts at 50 cM, 80 cM, and 100 cM. Most significantly in numerical terms, the percentage of people in haplogroup H in the dataset decreases from 29.38% at 50 cM to 26.83% at 80 cM to 25.91% at 100 cM. There are also substantial proportional decreases in people in haplogroups A (0.29% to 0.13% to 0.06%), I (1.91% to 1.59% to 1.41%), J (7.43% to 7.28% to 7.09%), T (5.44% to 4.78% to 4.34%), and U (7.12% to 5.86% to 5.09%).

The fact that there are a significant number of people in mtDNA haplogroup H (and a handful of people in other haplogroups) whose proportionate Ashkenazi admixture is less than that of many people in the dataset suggests that the dataset contains a significant number of people in those haplogroups who do not have four Ashkenazi grandparents, which decreases the likelihood that they have Ashkenazi ancestry on their direct female line. (However, a large percentage of the people in haplogroups H, J, T, and U are likely Ashkenazi on their direct female lines.) Once again, the existence of this discrepancy suggests that it is preferable to use the 100 cM threshold for calculating proportions, even though the dataset for the 100 cM cohort is considerably smaller than the datasets for the 50 cM and 80 cM cohorts.
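To see how the mtDNA haplogroups discussed above compare with one another, the quoted percentages can be ranked both by absolute decrease (percentage points) and by relative decrease. This is only an illustrative calculation on the figures quoted above; the names used are hypothetical.

```python
# mtDNA haplogroup frequencies (%) at 50 cM, 80 cM, and 100 cM, as quoted above.
mt_freqs = {
    "H": (29.38, 26.83, 25.91),
    "A": (0.29, 0.13, 0.06),
    "I": (1.91, 1.59, 1.41),
    "J": (7.43, 7.28, 7.09),
    "T": (5.44, 4.78, 4.34),
    "U": (7.12, 5.86, 5.09),
}

# (percentage-point drop, relative drop in %) from 50 cM to 100 cM.
drops = {
    hg: (v[0] - v[-1], 100 * (v[0] - v[-1]) / v[0])
    for hg, v in mt_freqs.items()
}

by_points = sorted(drops, key=lambda hg: drops[hg][0], reverse=True)
by_relative = sorted(drops, key=lambda hg: drops[hg][1], reverse=True)

print("Largest decrease in percentage points first:", by_points)
print("Largest relative decrease first:", by_relative)
for hg in by_relative:
    points, rel = drops[hg]
    print(f"{hg}: {points:.2f} points, {rel:.1f}% relative")
```

On these figures, H shows the largest decrease in percentage points (3.47), while A shows the largest relative decrease (about 79%) and J the smallest (under 5%).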