Rox2's Estimated Age

The following dates are estimated using currently available data.  The dates are not set in stone and can alter slightly as each new yDNA result is added to the calculations.  All estimates, including those from various yDNA companies, are painted with a very broad brush. 

In 2011 Didier Vernade's 'Handy Method' TMRCA (Time to Most Recent Common Ancestor) calculation was used to estimate the age of Rox2's common ancestor.  It was a dating method he outlined on the now-defunct DNA Forums.  The calculation is based around the estimated age of the large, well-known M222 subclade (then thought to be 1600 years).  Following Didier's instructions, 71 of the 67-marker haplotypes that then matched around Rox2's framework of key markers were used.  'Jumpy' multicopy markers were removed to leave 42 markers.  Where haplotypes from testers with the same surname were identical, only one example was used.

Charting an average of the differences between Rox2 haplotypes produced a good bell curve and gave some confidence that the key STR markers that help define the cluster worked.   The average difference was 2.63 from the base haplotype.  
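
The distance-from-base step described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the five four-marker haplotypes are invented toy data (not Rox2 results), the 'base haplotype' is taken to be the modal value at each marker, and distance is counted as the sum of absolute repeat differences.  None of those choices is confirmed as the exact procedure used in 2011.

```python
from collections import Counter
from statistics import mean

# Toy haplotypes (hypothetical repeat counts, NOT real Rox2 data).
haplotypes = [
    (13, 24, 14, 11),
    (13, 25, 14, 11),
    (13, 24, 15, 11),
    (14, 24, 14, 12),
    (13, 24, 14, 11),
]

def modal_haplotype(haps):
    """Most common repeat value at each marker (assumed 'base haplotype')."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*haps))

def distance(hap, base):
    """Sum of absolute repeat differences from the base (assumed step counting)."""
    return sum(abs(a - b) for a, b in zip(hap, base))

base = modal_haplotype(haplotypes)
distances = [distance(h, base) for h in haplotypes]
histogram = Counter(distances)      # number of haplotypes at each distance
average_difference = mean(distances)

print("base haplotype:", base)
print("distance histogram:", dict(sorted(histogram.items())))
print("average difference:", average_difference)
```

With real data, the histogram is the chart that would be inspected for a smooth bell shape, and the average difference is the figure that feeds into the age calculation.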

2.63 * 1600/3.5 = 1202 years to common ancestor.

1202 years before the present year is roughly 800 AD.
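
The arithmetic above can be reproduced with a short Python sketch.  Two things are assumptions rather than statements from the original method: the divisor 3.5 is read here as the corresponding average difference for the M222 reference group (the text gives only the number), and 'the present year' is taken to be 2011, the year the calculation was made.  The function and parameter names are illustrative only.

```python
def handy_method_tmrca(avg_diff, ref_age_years=1600.0, ref_avg_diff=3.5):
    """Scale a reference clade's age by the ratio of average differences.

    ref_avg_diff=3.5 is assumed to be the reference (M222) average
    difference; the source text only supplies the number itself.
    """
    return avg_diff * ref_age_years / ref_avg_diff

PRESENT_YEAR = 2011  # assumption: the year the estimate was made

tmrca_years = handy_method_tmrca(2.63)
founder_year_ad = PRESENT_YEAR - tmrca_years

print(f"TMRCA: about {tmrca_years:.0f} years")          # about 1202 years
print(f"Founder: about {founder_year_ad:.0f} AD")        # about 809 AD
```

Because the result scales linearly with the assumed age of M222, any revision of that 1600 year figure moves the Rox2 estimate by the same proportion.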

If the members of the cluster were not descended from the same founder (a monophyletic clade) the resulting bell curve would not be smooth.  There were no subclade-defining SNPs known for Rox2 at that time.  The new technology of NGS testing in 2014 confirmed that Rox2 was indeed a monophyletic subclade descended from a founding event.

MOST RECENT COMMON ANCESTOR (MRCA) ESTIMATES

Ken Nordtvedt's 'Generations111T' TMRCA calculator handles 111 marker haplotypes, the highest resolution standard test available from FTDNA.  The first MRCA estimate, below, was made using all unambiguous 111 marker Rox2 matches.  30 years is the recommended generation length for Generations111T.  2000 AD is the date the TMRCA is subtracted from.  Given the high number of STR matches in the databases it is a useful tool because exactly the same 'apples to apples' data (111 STRs) are used to produce the dates, unlike most SNP-based estimates where the resolution and coverage can vary.

All:

'GA' = 748 AD  (founder)

'GA coal' = 841 AD (expansion)

Using only 111 marker matches with different surnames (fewer recently related haplotypes in the estimate):

'GA' = 710 AD  (founder)

'GA coal' = 806 AD (expansion)
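
As a rough check on the dates listed above, the Python sketch below converts between calendar years and generations using the two conventions given in the text (30 years per generation, counted back from 2000 AD).  It assumes the calculator's raw output is a number of generations, which is not stated here; the function names are illustrative and are not part of Generations111T.

```python
REFERENCE_YEAR_AD = 2000     # date the TMRCA is subtracted from (from the text)
YEARS_PER_GENERATION = 30    # recommended generation length (from the text)

def generations_to_year_ad(generations):
    """Calendar year AD for an age expressed in generations."""
    return REFERENCE_YEAR_AD - generations * YEARS_PER_GENERATION

def year_ad_to_generations(year_ad):
    """Generations before the reference year for a calendar year AD."""
    return (REFERENCE_YEAR_AD - year_ad) / YEARS_PER_GENERATION

# The four dates quoted above.
estimates = {
    "all 111 marker matches": {"GA": 748, "GA coal": 841},
    "different surnames only": {"GA": 710, "GA coal": 806},
}

# Years between the founder date and the expansion date for each set.
gaps = {label: d["GA coal"] - d["GA"] for label, d in estimates.items()}

for label, dates in estimates.items():
    founder_gens = year_ad_to_generations(dates["GA"])
    print(f"{label}: founder about {founder_gens:.1f} generations before "
          f"{REFERENCE_YEAR_AD} AD, expansion {gaps[label]} years later")
```

Both data sets put the expansion a little under a century after the founder, which is about three generations at 30 years each.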

The coalescence date 'GA coal' represents the approximate date of the most significant expansion of the subclade - the time when multiple descendants are having lots of children of their own.  For Rox2 that expansion looks to be about one hundred years after the haplogroup founder's birth.  GA coal is the approximate time that most present day descendants trace back to - the time that the parallel 'brother' lineages are having children.  Ken Nordtvedt stated:

Coalescence age is the tmrca "averaged" over all the pairs of haplotypes you can take from the N haplotypes.
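
The idea of averaging over all pairs can be illustrated with a toy Python sketch.  Everything numeric here is an assumption made for illustration: the three haplotypes are invented, the per-pair estimator (steps apart, split between the two lineages, times a flat years-per-step figure) is a deliberately crude stand-in, and the value of 100 years per step is a placeholder rather than a real mutation rate.  It is not Ken Nordtvedt's formula, only the 'average over pairs' structure of his definition.

```python
from itertools import combinations

# Toy haplotypes (hypothetical repeat counts, NOT real Rox2 data).
haplotypes = [
    (13, 24, 14, 11),
    (13, 25, 14, 11),
    (14, 24, 14, 12),
]

YEARS_PER_STEP = 100.0  # placeholder value, NOT a real mutation rate

def pairwise_tmrca_years(hap_a, hap_b, years_per_step=YEARS_PER_STEP):
    """Crude stand-in estimator for one pair of haplotypes.

    The steps separating the pair are assumed to have accumulated along
    two lineages, so the total is halved before converting to years.
    """
    steps = sum(abs(a - b) for a, b in zip(hap_a, hap_b))
    return steps * years_per_step / 2

def coalescence_age(haps):
    """Mean of the pairwise estimates over every pair of haplotypes."""
    pairs = list(combinations(haps, 2))
    return sum(pairwise_tmrca_years(a, b) for a, b in pairs) / len(pairs)

print("pairs compared:", len(list(combinations(haplotypes, 2))))
print("coalescence age (toy units):", coalescence_age(haplotypes))
```

With N haplotypes there are N(N-1)/2 pairs, so closely related kits (for example several testers with one surname) contribute many young pairs and pull the average towards the present.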

In the early years of research as each new result came in the Rox2 'GA' founder date hovered between 750 AD and 850 AD.  Between the years 2010 and 2020 with increasing numbers of 111 STR tests the estimate settled down and deviated within a much smaller range of around one decade either side of the year 750 AD.  Recently (after March 2020) a trend in slightly earlier age estimates (c. 700 AD) is observable.  Time and hopefully more 111 STR matches will tell if this more recent trend holds up.  Broadly speaking it is likely that the Rox2 haplogroup represented by the shared SNP block was founded in the Early Middle Ages (5th century AD - 11th century AD).

FGC11414>BY21590 contains testers whose earliest known origins are in northern Sweden, northern Finland and northern Norway.  All share a pattern of 'family' off-modal STR markers in common and probably descend from one man who became the ancestor of many modern Scandinavian men.  May 2023 MRCA estimates made using Ken Nordtvedt's 111T program for those kits with 111 STR markers are 1305 AD (founder) and 1362 AD (expansion).  An early branch point evident in STR analysis was confirmed (5/3/2020) with a new Big Y result.  So far there look to have been two medieval brothers - sons of Mr BY21590, i.e. FT249214 and BY43192.  The majority of present-day Sweden Rox2 so far discovered currently descend from BY43192.  His descendant, a mid-sixteenth century founder represented by BY43192 and two equivalent SNPs, currently appears to have had five sons whose descendants live in northern Sweden (clustered around Skellefteå).  

The usual generous margins of error must be applied and as ever the full cross-section of Big Y results shared on Big Tree produces more accurate age estimates.  Big Y results can be added to the Big Tree by following these instructions.  It appears that the sons of BY21590 were born in, or moved to, Scandinavia some time around 1300 AD (wide margin of error).  There is no solid evidence as to where BY21590 was born and so far the only place that parallel downstream subclades of FGC11414>BY21590 are found is Scandinavia.  No SNPs upstream of BY21590 have been found in British or Irish-origin men so far except for FGC11414 itself.  FGC11414 is the 'father' of BY21590 and several other branches.  FGC11414 is a 'son' of Mr. Rox2, the founder of the haplogroup.  Parallel subclades downstream of FGC11414 (the other 'sons' of Mr FGC11414) have representatives whose more recent family origins (paper trails) are from northern England.  As mentioned on the Home Page, there is no firm evidence yet as to where Mr. Rox2 was born either - Rox2 itself appears to be the result of a similar sudden founding event by one man that happened about five hundred years earlier than that of BY21590 in Sweden.

FTDNA DISCOVER (BETA)

FTDNA released a new TMRCA feature for their Big Y test in July 2022, a Beta version of a tool called Discover.  Discover is a welcome and useful feature - the more easily searchable TMRCA estimates and extra information the better.  The eight known Rox2 branches are listed plus there is an additional FGC11397* paragroup that is occasionally used when a new kit is awaiting manual placement by FTDNA.  A large margin of error (Confidence Interval) is attached to Discover estimates, as with all TMRCA estimates.  The age estimate for Rox2 (R-FGC11397) generated by FTDNA Discover (Beta 1) was initially a young-looking 1000 AD, plus or minus 200 years.  Discover was updated in September 2022, apparently using a 'Relaxed Clock' model rather than Beta 1's 'Strict Clock', and the Beta 2 estimate for the founding of Rox2 became 600 years older at 400 AD plus or minus 300 years.  In October 2022 the Beta 3 estimate was 450 AD.

The formation dates of prehistoric subclades in Discover (Beta 1) were younger than known radiocarbon dated ancient DNA from archaeological digs, see the Ancient DF27 page.  The oldest R1b-DF27 burial, GBVPK: France, Grotte Basse de la Vigne Perdue, is radiocarbon dated to c. 2380 BC.  Discover (Beta 1) dated DF27 to c. 2100 BC.  After the September 2022 update the age of DF27 became 2600 BC, plus or minus 700 years.

Idiosyncratic mutation rates in branches can affect results when automated TMRCA estimation algorithms are used, as can a mix of higher and lower-resolution Big Y test results.  FTDNA call this 'differences in stem lengths'.  Early STR analysis of Rox2 branch FT171815 revealed its much higher STR genetic distance from other Rox2 branches and Big Y also produces many more SNPs for this branch than the others.  BY168115 has fewer SNPs than average.  For a large subclade with many branches like Rox2 the two extremes take up position on the outer edges either side of the bell curve.  An estimated age of Rox2 as a whole requires an average SNP and/or STR count of all eight branches.  FT171815 is a 'son' of Rox2 like the seven other branches so it has to have the same age despite its above average STR GD from the other seven branches and its higher SNP count.  It appears that Discover (Beta 2) may be using FT171815, the branch with the longest 'stem length' and the most numerous mutations, for the TMRCA estimate for FGC11397.  Most of the other seven branches have much lower SNP and STR counts.

The update in September 2022 has brought the FTDNA Discover (Beta 2) dates for the whole Rox2 haplogroup closer to YFull's estimates.  The 'stretching' back of the earlier end of the timescale to accommodate the age of SNPs found in radiocarbon dated bones by archaeogeneticists, and the Relaxed Clock model employed to accommodate a young-looking subclade's paper trail (R-S781), appear to have resulted in the Discover (Beta 2) TMRCA estimates becoming older with an accompanying wider margin of error.  If the 600 year difference between the Beta 1 age estimate (c. 1000 AD) and the Beta 2 estimate (c. 400 AD) is halved, the midpoint date arrived at is c. 700 AD.  As Discover is still in the Beta stage it will probably continue to be updated and modified for accuracy for some time.

SNP COUNTING

New T2T analysis is being applied to results.  The Human Genome project that appeared in 2003 used technology that left areas of the y chromosome unmapped.  T2T is expected to address these issues - issues that led to large parts of the DF27 haplogroup being unrecognized by yDNA testing.  The new SNPs that are being discovered are labelled 'FTT' on FTDNA's Haplotree.  This sounds likely to be a long and complex process.  It is possible that new SNPs will be discovered for Rox2 - ones that might change the structure and branching of the current phylogenetic tree.

As mentioned, it is difficult to be precise with SNP-derived age estimates - it is a random/stochastic process.  Also, Big Y-500 reads fewer Rox2 SNPs than FGC Elite and Big Y-700 tests and different types of tests are present in the same phylogenetic trees.  Some SNPs are missing or are labelled 'rejected' in returned .vcf results.  Some SNPs occur on 'jumpy' areas of the y chromosome - areas the technology has difficulty reading.  A SNP missing from one .vcf/.bed file but present in another Rox2's .vcf/.bed file might later only be found by those with the ability (and the software and hardware) to analyse the large .bam files.  There are INDELS contained in the results and some of those are less stable than others.  Some researchers count them, some don't.  Random distribution means that occasionally some branches can naturally experience several SNPs in a short space of time, while others might see fewer SNPs over a longer period of time.  That is something that is clearly evident within Rox2.  Two parallel subclades can have quite different SNP rates, but when it comes to age estimates on public phylogenetic trees one formula is often applied to all.  Different phylogenetic trees now have different data sets - not all Rox2 matches are present in every tree and not all comparisons are straightforward 'apples-to-apples' comparisons.

FTDNA upgraded Big Y-500 to Big Y-700 in late-2018.  See Big Y-700 White Paper from FTDNA.  The new Big Y-700 tests have similar resolution and coverage to the FGC tests that helped first identify Rox2's SNPs back in 2014.  Future Rox2 matches may find they have more Unnamed Variants/private/family SNPs and Non-Matching Variants than existing Big Y-500 kits.  The average SNP-per-generation rate for Big Y-700 tests is around one SNP every 2-3 generations, rather than one SNP per 3-4 generations for Big Y-500.

With STR analysis we are comparing many more kits (over twice as many as Big Y) and unlike SNP tests those kits are the same 'apples' in an  'apples to apples' comparison - all the individuals have the same data (111 STRs).

There is a block of around 45 phylogenetically equivalent SNPs below DF27>Z2571>FGC11380 shared between the two Full Genomes Rox2 SNP tests of Jim Turner (N3036) and Robert Dickinson (134765). FTDNA's Big Y-500 reads less than half of those SNPs.

Confidence in SNP-derived age estimates (SNP counting) will be helped for haplogroups like Rox2 when NGS tests have the same depth of coverage - with a standard calibration - delivered with no ambiguity as to whether a SNP/INDEL is reliably called by the technology used for the test.  As it is, different coverage levels, methods and formulas get used in TMRCA estimates in an effort to deal with the above-mentioned inconsistencies.  Timescales are stretched or contracted; SNPs don't happen like clockwork.  In any group of related kits there may be a few 'outliers' with unusually high or low SNP counts.  There are some in Rox2.  As far as age estimates go, the more kits used in the estimate the better - in order to average out inconsistencies.  Different phylogenetic trees have different numbers of NGS kits available to use in their estimates.  Differences in sample size, as well as in calculation methods, can affect each individual company's age estimates.

N.B.  YFull might have placed two members of the unrelated L21>DF13 'The Little Scottish Cluster' in the Rox2 part of their phylogenetic tree.  One of the SNPs, FGC14577, is found in a member of that cluster - it is not one of Rox2's SNPs (October, 2021).

YFull's TMRCA for Rox2 (their R-Y8397) is around 300 years older (c. 1550 ybp, October 2018) than STR-based TMRCA estimates made using more Rox2 kits with 111 STRs (c. 1250 ybp).  The term 'ybp' means years before present.  At YFull, the 'formation' date is the date the subclade first began to diverge from a founder.  YFull's 'TMRCA' could be their estimated time that most present-day lineages trace back to - when 'brother' branches form and produce lines that endure to the present - so possibly the same as Ken Nordtvedt's 'coalescence' date, above.  Good STR-based age estimates require a lot of data. 

My STR estimated dates, above, are still comfortably within YFull's wide margin of error.  YFull apply their own formula to TMRCA estimates in an attempt to handle SNP inconsistency - like scaling the number of mutations with a test's coverage (FGC Elite found twice as many SNPs in the shared block as Big Y-500 did for Rox2).  They check specific regions of the chromosome for SNPs shared in common by FGC and Big Y-500 tests, the combBED regions.  There are FGC and Big Y kits mixed together in the Rox2 estimates at YFull and this could be affecting the YFull and FTDNA age estimates.  The YFull formula appears to artificially increase the numbers of SNPs beyond the number that are actually present in Big Y-500 tests.  YFull date DF27 and Z2571 at 4500 ybp.  Again, this can vary.  [Edit Oct 2016: 4600 ybp.  Edit March 2017: 4400 ybp.  Edit July 2017: 'formed 4400 ybp, TMRCA 4300 ybp'.  Edit August 2018, formed/TMRCA 4500 ybp.  Edit March 2019, formed 4700 ybp, TMRCA 4200 ybp].  It feels as if their formula may 'compress' TMRCA estimates at the edges - making young groups like Rox2 appear older than they are while also making the 'birth' of DF27 more recent.  Formulas are based on the entire database, not individual subclades.  The YFull estimates are an extremely useful reference but, like all estimates, are 'best guesses' and do not represent actual dates.  Indeed, ancient yDNA from burials studied in 'Dynamic changes in genomic and social structures in third millennium BCE central Europe', Luka Papac et al., 2021 showed that R1b-U106 was probably born in around 3000 BC and R1b-L151, the ancestor of R1b-P312, possibly originated c. 3100 BC.  By counting SNPs for the two early Rox2 NGS tests in 2014 I had estimated that the age of DF27 was c. 3000 BC, see below.

Prior to NGS test results in 2014, no relevant SNPs had ever turned up below DF27 in commercially available testing for Rox2.  Incongruity in the mainly Ireland/UK origin American hobbyist databases, as well as having undergone a lengthy bottleneck, may have led to Rox2's 'SNP invisibility' in pre-NGS (2014) testing.  There was difficulty getting accurate subclade identification for any of the very large ZZ12 section of DF27 before 2014.  Roughly three quarters of the way through the ongoing lifespan of the Rox2 yDNA lineage there appears to have been a founder effect - a rapid increase in the number of surviving male descendants of one man bearing the 'Rox2' yDNA markers.

There looked to be an average of around 15 'private' FGC Elite tested SNPs that have occurred after the time of the Rox2 founder - represented by the block of several dozen phylogenetic equivalent SNPs.  Big Y-700 appears to give a similar number.  FTDNA's Big Y-500 highlights approximately 12 '+' private SNPs on average.  As mentioned, SNPs are random and aren't ticking away like clockwork so some kits in the same subclade can have a lot more SNPs than others.  The SNP numbers from multiple descendants of one known lineage can be averaged.  If a SNP occurs every 75 to 90 years (as suggested for the FGC/Big Y-700 test), then SNP counting using FGC/Big Y-700 data currently indicates the same age for Rox2 as that arrived at by using estimates made with 111 STR results.  A 2019 comparison of SNP and STR-based age estimates concluded that both methods produced similar results with the same margin of error of +/- 20-30%.  

A SNP in the FGC test might have occurred every 83.3 years (1250/15=83.3).  Big Y-700 kits will probably share a similar SNP rate per generation.  In all, there look to be around 60 Rox2 SNPs that trace back to the DF27 founder from the present day.  At 83.3 years-per-SNP, that equates to DF27 being about 5000 years old.
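
The SNP-counting arithmetic in this section can be written out as a small Python sketch.  All inputs are the round figures quoted in the text (a 1250 year STR-based age, about 15 private SNPs, about 60 SNPs back to DF27, and a suggested 75 to 90 years per SNP); the function names are illustrative, and the result inherits the same generous margin of error as the inputs.

```python
def years_per_snp(age_years, snp_count):
    """Average interval between SNPs implied by an age and a SNP count."""
    return age_years / snp_count

def age_from_snps(snp_count, rate_years_per_snp):
    """Age implied by a SNP count at a given years-per-SNP rate."""
    return snp_count * rate_years_per_snp

ROX2_AGE_YEARS = 1250      # STR-based age used in the text
PRIVATE_SNPS = 15          # average private FGC Elite SNPs since the founder
SNPS_BACK_TO_DF27 = 60     # approximate SNP count back to the DF27 founder

rate = years_per_snp(ROX2_AGE_YEARS, PRIVATE_SNPS)
df27_age = age_from_snps(SNPS_BACK_TO_DF27, rate)

# Rox2 age implied by the suggested 75 to 90 years-per-SNP range.
rox2_age_range = (age_from_snps(PRIVATE_SNPS, 75), age_from_snps(PRIVATE_SNPS, 90))

print(f"years per SNP: {rate:.1f}")                    # 83.3
print(f"implied DF27 age: {df27_age:.0f} years")       # 5000 years
print(f"Rox2 age at 75-90 years per SNP: {rox2_age_range}")
```

The 75 to 90 year range brackets the 1250 year STR-based figure (1125 to 1350 years), which is the sense in which the two methods agree.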

On average Big Y-500 hg19 Rox2 kits (apples-to-apples) looked to have experienced one 'private' or 'novel' SNP every 100 years, or roughly every 4 generations (25-30 years-per-generation).  New (2018) hg38 Big Y-500 kits got more SNPs, bringing their rate to the mid-90s years-per-SNP, closer to that of hg19 FGC Rox2 kits.  Big Y-700 kits look to be in the early 80s years-per-SNP, like FGC Elite kits.  The Rox2* paragroup consisting of three hg38 Big Y-500 (BigY2) kits with parallel ancestry since the Early Middle Ages averaged out at 96 years-per-SNP at the Big Tree (October 2019).  All estimates have the usual generous margin of error.