Unique STR 565 Value
How do You Relate?
Jan Brouwer descendants carry a unique DYS565 Allele Value
Richard D. Brewer
September 8, 2010
Abstract: A very rare DYS565 allele value of 7 repeats has been found in the Jan Brouwer descendants group. This value defines a unique family branch within the major haplogroup I2b1c. The underlying mutation likely occurred in an ancestor living in the Belgium / Netherlands / Germany region 400 to 700 years before the present. If two people share the allele value on this marker and are similar enough across the rest of the markers to share a common ancestor in a genealogical time frame, then they belong to the same family branch and everyone without the rare allele does not.
Background: Extending the Y-DNA 37 marker test to 67 markers for William L. Brewer (kit 14813), a member of our Brewer surname project, revealed an anomalous haplotype signature with the very unusual variant allele value of 7 for DYS565, in stark contrast to the published consensus range of 9 to 14. During our investigation of this unusual occurrence, Ken Nordtvedt reported to us that he found only one other haplotype with a 7 at DYS565 in his substantial database of haplogroup “I” signatures: that of a man with the Hermans surname whose SNP test result of P78+ places him in haplogroup I2b1c. With such an allele value being apparently very rare, we judged it possible that it could serve as a unique family marker. We needed to find supporting statistics.
I located Arnold Hermans in the Y-Search database, user ID CXPGN, and his Y-Search data profile indicates that his most distant known ancestor is a Petrus Hermans (b. ca. 1695, d. 1762) in the Netherlands. This is of interest inasmuch as William L. Brewer is also likely descended from a 17th century immigrant from the Netherlands, Jan Brouwer. The Hermans signature, besides matching William on the unusual value of 7 at DYS565, also showed significant matches on other slow mutating markers in the 38-67 panel.
Because of this we decided to extend the 37 marker tests to 67 for two more of our Brewer surname project participants, Gregory L. Brewer (kit 159573) and myself, Richard Brewer (kit 44994), both of whom, like William, belong to the subgroup of Brewers descended from Jan Brouwer. I also requested a SNP test that included P78. It came back from the lab positive (P78+), substantiating my haplogroup assignment of I2b1c, the same as that of Arnold Hermans.
The results for my 67 marker test (kit 44994) show the anomalous value of 7 at DYS565, which was not entirely unexpected.
As I will show below, we have identified a unique familial mutation that occurred prior to the common ancestor of Jan Brouwer and Petrus Hermans some 400 to 700 years before the present. Depending on how rare it is, it has the possibility of representing a family-defining mutation and can help us characterize the European geographic origins of Jan Brouwer (1632-1702). Understanding the rarity of this mutation allows us to define our own unique familial branch within the haplogroup I2b1c. I explore this in the following sections.
Investigating DYS565 and its statistics:
Beginning with Ken Nordtvedt’s data, the allele value DYS565 = 7 appears to be extremely rare. He writes, “In fact, there are only two known examples in tens of thousands of haplotypes. The odds of independent mutations to DYS565 = 7 is infinitesimal.” Accepting that Ken has only 2 such haplotypes out of, say, 10,000 in his database, we would indeed be led to think that the value is quite rare, with its frequency given by p = x/N = 2/10,000 = 0.0002 = 0.02%.
I now concur with Ken, based on the material presented below. Initially, however, it bothered me that Ken has gathered his data from various sites and that, because the consensus allele range for DYS565 is 9 to 14, the various databases (such as Ysearch) simply do not report values less than 9 even if the Y-DNA 67 marker test results show, for example, a measured value of DYS565 = 7. A related problem is that if all the DYS565 = 7 results in the majority of the databases are thrown out, the 10,000 haplotypes in Ken’s database may not even be representative of the real total population.
The problem: Recall that there are only four types of base elements (nucleotides) used in constructing any DNA, and they are designated A, C, G, and T. At various locations on the Y-DNA the four elements are found to be arranged in a short (two to five letter) pattern, for example ATAA. At the specific sites that are read by the lab, the short pattern is found to repeat itself a number of times, like a stutter: for example, ATAA...ATAA...ATAA...ATAA illustrates four repeats of the ATAA sequence. The number of times the pattern is repeated in a row (in tandem) is the number reported by the lab. The scientific community calls this number the STR (for Short Tandem Repeat) or allele value.
I sought without success to locate any published paper or a National Institute of Standards and Technology (NIST) STR Fact Sheet detailing the original research supporting the observed, or consensus, range of allele repeats now quoted in print as 9 to 14 with a mean of 11 or 12. In the absence of literature for DYS565 establishing the allele range, it is difficult to learn how the range was originally determined. What sample size and what different population samples were used to infer the allele frequency distribution that is now standardly reported and used in haplotype studies within various haplogroups? In the original work, were any outlier values as low as 7 detected and rejected as too few to be included in the accepted range of 9 to 14?
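The idea of an allele value as a count of back-to-back copies of a motif can be sketched in a few lines of Python. This is only an illustration: the flanking bases below are invented placeholders, not the real sequence around DYS565, and the function name is hypothetical.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one motif length at a time while the motif keeps repeating.
        while sequence.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# Invented flanking bases, used only so the repeat block has a clear start and end.
LEFT_FLANK = "GGCTC"
RIGHT_FLANK = "GCCTG"

typical = LEFT_FLANK + "ATAA" * 12 + RIGHT_FLANK  # the most common allele, [ATAA]12
rare = LEFT_FLANK + "ATAA" * 7 + RIGHT_FLANK      # the rare allele, [ATAA]7

print("typical allele value:", count_tandem_repeats(typical, "ATAA"))
print("rare allele value:", count_tandem_repeats(rare, "ATAA"))
```

A laboratory does not count letters this way, of course; the sketch only shows what the reported number means.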
Non-published variant alleles are being observed on a regular basis as STR typing becomes more widespread. However, variant allele reports as of 04/29/2010 failed to include DYS565. This underreporting certainly skews any haplotype studies, implying that a value of 7 is not only exceedingly rare but, in the case of the Y-Search database, presumably non-existent. But is it that rare? Even if the lowest value is 9 at DYS565, how do we know whether the value reported by a particular lab for DYS565 should have been corrected by -2 (or -4 for a mean of 11)? Commercial testing companies follow different standards when determining marker values. Even the repeat sequence motif for DYS565 differs, being ATAA in one lab and TAAA in another, which can lead to differences in repeat counts.
The NIST lists the repeat sequence motif at locus DYS565 as [ATAA]12, meaning 12 repeats of the sequence ATAA is the most common allele value. They specify the expected variation as 9 to 14. (ATAA is used by NIST, whereas TAAA is used by some other labs; thus the count can be made as either ... /ATAA/ ATAA/ ATAA/A ... or ... A /TAAA/ TAAA/ TAAA/ ... depending on the lab.) As I will discuss below, a count of 12 is the most frequent value, occurring in more than 58% of the FTDNA samples measured.
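To see how the choice of motif can change the reported count, the following Python sketch counts the same stretch of DNA in both reading frames. The two test sequences and their flanking bases are invented for illustration and are not the real DYS565 sequence. In this toy setup, whether the ATAA and TAAA counts agree depends only on the base that follows the repeat block.

```python
import re


def longest_run(sequence: str, motif: str) -> int:
    """Longest number of consecutive copies of `motif` anywhere in `sequence`."""
    # The lookahead lets a run be measured from every starting position,
    # so overlapping candidates are not skipped.
    pattern = f"(?=((?:{re.escape(motif)})+))"
    runs = re.findall(pattern, sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Invented sequences: the same 12-copy ATAA block with two different right flanks.
followed_by_a = "GC" + "ATAA" * 12 + "AGT"
followed_by_g = "GC" + "ATAA" * 12 + "GGT"

for name, seq in [("flank starts with A", followed_by_a),
                  ("flank starts with G", followed_by_g)]:
    print(name, "ATAA count:", longest_run(seq, "ATAA"),
          "TAAA count:", longest_run(seq, "TAAA"))
```

With the first sequence both conventions give the same number; with the second, the TAAA frame comes out one repeat lower, which is the kind of lab-to-lab offset discussed above.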
FTDNA has also found that the value of 9 occurs in < 1% of the population. A few persons tested (about 0.03%) have only 8 repeats. Assuming what is commonly referred to as the STR stepwise mutation model, it seems likely that those persons with 8 descend from an ancestor who carried 9 and in whose line one repeat was dropped, a mutation that was then passed on. Our own family ancestor may in turn have either dropped one repeat from a line carrying 8 repeats or, perhaps less likely, dropped two repeats from a line carrying 9.
As a result of the mutation, our family now has only 7 repeats at DYS565, with the sequence motif [ATAA]7, a circumstance that represents fewer than 0.01% of all those who have been tested for 67 markers. At our DYS565 locus the repeating section looks something like ... /ATAA/ ATAA/ ATAA/ ATAA/ ATAA/ ATAA/ ATAA/ ... and then continues with the non-repeating series of base elements (nucleotides). As we have found, this has become our personal family identifying allele value -- DYS565 = 7.
To make any judgement concerning the rarity of 7 repeats, it is beneficial to use a consistent set of lab reports and a consistent database, in particular that provided by FTDNA. I queried both Bennett Greenspan of FTDNA and the University of Arizona lab regarding the allele range for DYS565 and the frequency of occurrence of 7. Given the lack of published documentation, as a next best thing I asked Bennett Greenspan to search his entire 38,000 participant database of 67 marker results for allele values reported for DYS565. He was kind enough to provide me with the following interesting results:
“The allele frequency in our database is this:
Allele value of:
7 = .01
8 = .03
9 = .63
10 = .19
11 = 35.37
12 = 58.07
13 = 5.56
14 = .11”
(The figures are percentages of all 67 marker results in the FTDNA database.)
This is pretty solid data inasmuch as it is based on over 38,000 results. At my request, he further inspected the DYS565 = 7 results by hand and, as would be expected from the above statistics, found that there were only 3 people in the entire system at FTDNA with a value of DYS565 = 7, and all three appear to be haplogroup I2b1c. (That number excludes my own result, which was not available at the time of his search.)
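As a quick consistency check on the figures above, the Python sketch below totals the quoted percentages, picks out the modal allele, and compares the hand count of 3 with the quoted 0.01%. It assumes a round database size of 38,000 (the source says "over 38,000"), so the results are approximate.

```python
# Percentages quoted by FTDNA for each DYS565 allele value.
freq_percent = {7: 0.01, 8: 0.03, 9: 0.63, 10: 0.19,
                11: 35.37, 12: 58.07, 13: 5.56, 14: 0.11}

TOTAL_TESTED = 38_000   # assumption: rounded from "over 38,000" results
observed_sevens = 3     # hand count reported by FTDNA, excluding the author

# The quoted percentages should add up to roughly 100.
total_percent = round(sum(freq_percent.values()), 2)

# The modal allele is the value with the highest frequency.
modal_allele = max(freq_percent, key=freq_percent.get)

# Frequency implied by the hand count, and the count implied by the quoted 0.01%.
observed_percent = 100 * observed_sevens / TOTAL_TESTED
expected_sevens = freq_percent[7] / 100 * TOTAL_TESTED

print("sum of quoted percentages:", total_percent)
print("modal allele:", modal_allele)
print("3 of 38,000 as a percentage:", round(observed_percent, 4))
print("count implied by 0.01%:", round(expected_sevens, 1))
```

Three results out of 38,000 round to the quoted 0.01%, so the hand count and the frequency table agree with each other.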
Well, that is interesting! With a base of 38,000, these statistics make the data on DYS565 more definitive than anything the originating lab could have produced (such labs are typically limited to maybe 500 cases from different population samples), and that makes my search for any original documentation on the marker somewhat moot.
Additional supporting information came from the query I sent to the lab at the University of Arizona. In response to my questions they provided me with a report from Taylor Edwards at the lab about this marker:
"It looks like we have 2 cases in our database of a score of 7 for DYS565. This marker is not widely used and it does not have any validation against reference material (SRM 2395) in Butler's NIST database. Before we began using it, we of course validated the score against the original reference and the Genbank sequence. Since that time, we have observed several alleles outside of the published range (which is expected since until FTDNA, it had not been run on very many samples).
"When we first observe an allele outside the expected range, we always run it on both the A swab and the B swab. In the case of Brewer, his original A swab failed so we have only scored the sample on the B swab. In this case, the allele falls into an expected range and frequency for a standard STR stepwise model (e.g. we have observed a slightly larger number of 8's, 0.03%; even more 9's, 0.60 %; etc. forming a nice distribution curve). Thus, there is no reason to expect the allele to be abnormal. In cases where we observe a jump in the allelic distribution (e.g. if we were to observe a score of 5) then we typically sequence the allele to verify the mutation.
"Unfortunately, there is no published data on this (except a poster that we presented at the Human Identification conference in 2008). We have made this info available to the NIST, but FTDNA is always ahead of the primary literature since we are using more markers than Butler can keep up with the nomenclature."
FTDNA summarized the situation for me as follows: “While the marker is not widely used, this allele value is still relatively rare, so if two people share the allele value on this marker and are similar enough across the rest of the markers to share a common ancestor in a genealogical time frame, then they belong to the same family branch and everyone without the rare allele does not.”
Conclusion: We have found a unique allele value of 7 for DYS565 occurring in our family of Brewers descended from Jan Brouwer of Flatlands. The value defines the Jan Brouwer line. If any of you wish to verify your descent from Jan Brouwer, you can do so by testing the value of DYS565, or by extending your 37 marker results to 67, which includes that marker. As we have seen, the typical range of values expected and reported in definitions of the DYS565 allele is 9 to 14, with a modal value of 12. Our measured value of 7 appears to be found in about one one-hundredth of a percent (0.01%) of the population. With my own results, there are only 4 persons reporting a 7 in the entire FTDNA database of 38,000 persons who have been tested at 67 markers.
Some Possible Consequences: I believe this mutation occurred in our ancestral line sometime around 1300 AD, possibly in the Netherlands. We know it occurred earlier (further back in time) than ~ 1600 AD because it occurred prior to the Jan Brouwer and Petrus Hermans common ancestor. Given the slow mutation rate of DYS565, the Time to the Most Recent Common Ancestor (TMRCA) has been estimated by Ken Nordtvedt: “By so-called Genetic Distance count they (Brouwer and Hermans) are 8 mutations apart, and standard TMRCA estimates then place the most likely TMRCA as 720 years ago. But the statistical confidence interval on that estimate will be several centuries above or below. TMRCA modifications when taking into account the relationship of the two haplotypes being compared with the modal haplotype for the clade they are within, one would have to move that most likely TMRCA estimate toward the present --- less than 720 years ago.”
Seven hundred years ago was roughly 1300 AD, a period before the use of surnames. Before 1500 AD communities were so small that everyone would know who “Johannes” was, so it is unlikely that any of our early ancestors had any need for a surname to distinguish themselves. By 1500 AD the use of occupations for family names, such as Brauer (brewer), was more common in German-speaking regions than in almost any other culture, and in the Netherlands the Dutch name would be Brouwer (from the noun ‘brewer’).
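For readers who want a feel for where a figure like 720 years can come from, here is a rough Python sketch of a simple genetic-distance estimate. It is not Ken Nordtvedt's method. The average mutation rate (0.0025 per marker per generation), the generation length (30 years) and the number of markers compared (67) are all assumptions chosen for illustration; only the genetic distance of 8 comes from the quotation above.

```python
def tmrca_years(genetic_distance: float,
                markers_compared: int,
                mutation_rate: float,
                years_per_generation: float) -> float:
    """Rough TMRCA estimate from the mutation count between two haplotypes.

    Mutations accumulate along both lines of descent, so the expected number
    of differences after t generations is about 2 * t * markers * rate.
    """
    generations = genetic_distance / (2 * markers_compared * mutation_rate)
    return generations * years_per_generation


GENETIC_DISTANCE = 8       # from the Nordtvedt quotation
MARKERS_COMPARED = 67      # assumption: both haplotypes compared at 67 markers
MUTATION_RATE = 0.0025     # assumption: average rate per marker per generation
YEARS_PER_GENERATION = 30  # assumption
REPORT_YEAR = 2010         # year this report was written

years = tmrca_years(GENETIC_DISTANCE, MARKERS_COMPARED,
                    MUTATION_RATE, YEARS_PER_GENERATION)
print("estimated TMRCA:", round(years), "years before", REPORT_YEAR)
print("approximate calendar year:", REPORT_YEAR - round(years), "AD")
```

Under these assumed values the estimate lands within a few years of the quoted 720, but different rates or generation lengths shift it by centuries, which is consistent with the wide confidence interval the quotation warns about.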
Today, most occurrences of the surname ‘Hermans’ are found on the Netherlands/Belgium border. My recent search for the origins of Jan Brouwer places him either in Friesland or near the Netherlands/Belgium border as well. This area includes Frisia, which extends from the northwestern Netherlands (including the Dutch province of Friesland) across northwestern Germany. However, to the extent that there remained an identifiable Frisian ethnic group still existing in the 1300s -- after the influx of Romans and other peoples that followed them into Frisia Magna, the region originally settled and dominated by the Frisians -- it is probable that our common Brouwer/Hermans ancestor was NOT Frisian. The Y-DNA haplogroup of the original ethnic people from Frisia is believed to be R1b1b2a1a. See http://en.wikipedia.org/wiki/Frisians#Genetics_Y-DNA
Based on our haplogroup I2b1c, the common ancestor of Jan Brouwer and Petrus Hermans, living around 1300 AD, is more likely to have been Saxon, from the eastern Netherlands or elsewhere in the eastern Low Countries (Belgium and the Netherlands), and considered ethnically Dutch. Looking at Wikipedia for our common haplogroup, I2b1c, we find it has the highest frequency in Germany, the Netherlands, Denmark and England. Haplogroup I2b1 comprises less than 10% of the total Y-chromosome diversity of all populations outside of Lower Saxony, and its distribution seems to correlate fairly well with the extent of the historical influence of the Germanic peoples, being found in over 4% of the population in Germany, the Netherlands, Belgium and Denmark. I2b1 is further subdivided into 5 subgroups, one of which is our branch I2b1c, and now we can divide that small clade into our Brouwer/Hermans branch, defined by our STR DYS565 = 7 mutation, an even smaller population.