4. ANCILLARY CALCULATIONS AND PLOTS

4.A. Two-nuclide diagrams

These are normalized two-nuclide diagrams in which the X-axis is the concentration of the longer-half-life nuclide divided by its production rate, and the Y-axis is the ratio of the concentration of the shorter-half-life nuclide divided by its production rate to that of the longer-half-life nuclide. Thus, burial moves samples downward in this diagram no matter which two nuclides are being plotted. For an overview of this type of diagram applied to 26Al and 10Be, see Granger (2006). 

Production rates used for normalization are total production rates in the sample, i.e., the sum of spallation and muon production. 

In the case where multiple measurements of the same nuclide have been entered for a particular sample, they are averaged (using an error-weighted mean) before plotting. Thus, each sample will generate one and only one ellipse, no matter how many measurements were made on that sample. 
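
For reference, the averaging is a standard inverse-variance weighted mean. A minimal sketch (the data and variable names here are hypothetical, not the calculator's):

```matlab
% Error-weighted mean of replicate measurements of one nuclide in one sample.
N = [152000 149000 156000];    % hypothetical replicate concentrations (atoms/g)
dN = [4000 5000 6000];         % hypothetical 1-sigma measurement uncertainties
w = 1 ./ dN.^2;                % inverse-variance weights
Nbar = sum(w .* N) ./ sum(w);  % error-weighted mean
dNbar = sqrt(1 ./ sum(w));     % uncertainty of the weighted mean
```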

The uncertainty ellipses (68% confidence) in this plot are computed using the 'ellipse.m' code from the 2008 online calculators, which is descended from the 'splot' Mathematica code written by John Stone in the 1990s. 
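
The ellipse.m code itself ships with the calculator source; the following is only a sketch of one way to draw a 68% ellipse in this coordinate space by linearized error propagation, and is not necessarily exactly what ellipse.m does. Because both plotted coordinates share the measured concentration of the longer-lived nuclide, they are correlated, which is what tilts the ellipses. All numbers are hypothetical.

```matlab
% Hypothetical 26Al/10Be pair: concentrations (atoms/g), production rates (atoms/g/yr).
N10 = 4.0e5; dN10 = 1.2e4; P10 = 5;
N26 = 2.4e6; dN26 = 1.0e5; P26 = 33;
x = N10 / P10;                    % normalized [Be-10]
y = (N26 / P26) / (N10 / P10);    % normalized 26/10 ratio
% Jacobian of (x, y) with respect to (N10, N26):
J = [1/P10,                   0;
     -(N26/P26)*P10/N10^2,    P10/(P26*N10)];
C = J * diag([dN10^2, dN26^2]) * J';            % covariance of (x, y)
% 68% confidence region for 2 degrees of freedom: chi-squared = 2.30.
[V, D] = eig(C);
th = linspace(0, 2*pi, 200);
e = V * sqrt(2.30 .* D) * [cos(th); sin(th)];
plot(x + e(1,:), y + e(2,:), 'r'); hold on; plot(x, y, 'r.');
xlabel('[Be-10]/P10 (yr)'); ylabel('([Al-26]/P26) / ([Be-10]/P10)');
```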

The "banana" or simple exposure region in the background (black lines) is bounded by the curves for simple exposure (top; assumes a single period of exposure at zero erosion) and steady erosion (bottom; assumes an infinitely long period of steady surface erosion). These are calculated assuming that all production is due to spallation, which is a slight oversimplification but irrelevant in nearly all situations. The main situation where it might be relevant is when rapid erosion rates and stable nuclides are involved, e.g., at the left-hand side of a 10Be/21Ne or 26Al/21Ne diagram. See Balco and Shuster (2009) for an example. This is a potential future improvement. 
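
For concreteness, here is a sketch of how the two bounding curves can be computed for a 26Al/10Be diagram under the spallation-only assumption described above. The half-lives, density, and attenuation length are typical values, not necessarily the ones the calculator uses.

```matlab
l10 = log(2) / 1.387e6;    % Be-10 decay constant (1/yr)
l26 = log(2) / 7.05e5;     % Al-26 decay constant (1/yr)
rho = 2.65;                % rock density (g/cm^3)
L = 160;                   % spallation attenuation length (g/cm^2)
% Simple exposure (top): single period of exposure, zero erosion.
t = logspace(2, 7, 500);                           % exposure time (yr)
x_exp = (1 - exp(-l10 .* t)) ./ l10;               % normalized [Be-10]
y_exp = ((1 - exp(-l26 .* t)) ./ l26) ./ x_exp;    % normalized 26/10 ratio
% Steady erosion (bottom): infinite exposure at steady erosion rate erate (cm/yr).
erate = logspace(-6, 0, 500);
x_ero = 1 ./ (l10 + rho .* erate ./ L);
y_ero = (l10 + rho .* erate ./ L) ./ (l26 + rho .* erate ./ L);
semilogx(x_exp, y_exp, 'k', x_ero, y_ero, 'k');
xlabel('[Be-10]/P10 (yr)'); ylabel('([Al-26]/P26) / ([Be-10]/P10)');
```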

Previous versions of the online exposure age calculator plotted 26Al/10Be diagrams with normalization using surface production rates calculated with the non-time-dependent `St` scaling method only. Because different scaling methods predict different production rates at the site, they also predict different values for the normalized nuclide concentrations that one would plot in this diagram. This by itself is not an obstacle to generating two-nuclide diagrams with scaling methods other than `St`, but time-dependent scaling methods predict that the production rate changes over time, which leads to the question of which value of the production rate one should use for normalization. I deal with this by assuming that, for the measured nuclide concentration, some given value of the erosion rate, and the exposure age calculated for a particular scaling method, there is some value of the production rate P such that the simple exposure age equation: 

N = [P / (λ + ρε/Λ)] [1 - exp(-(λ + ρε/Λ) t)]

(where N is the nuclide concentration, t the exposure age, λ the decay constant, ε the erosion rate, ρ the rock density, and Λ the effective attenuation length) is valid. Basically, this is an effective decay-weighted production rate for the period during which the sample was exposed. Then I divide by this effective value of P to get the normalized nuclide concentration. This is slightly different mathematically for the case of zero erosion and a stable nuclide (the simple exposure age equation is just N = Pt in that case), but the reasoning is the same. The point is that each scaling method generates a slightly different effective production rate, which in turn means a slightly different normalized nuclide concentration, which in turn means a slightly different ellipse.
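
Solving the equation above for P is just algebra; a sketch of what the effective production rate works out to for one measurement (all numbers hypothetical):

```matlab
% Effective production rate implied by a concentration N, an exposure age t
% from a particular scaling method, and an assumed erosion rate.
N = 4.0e5;               % measured concentration (atoms/g)
t = 8.5e4;               % exposure age for this scaling method (yr)
l = log(2) / 1.387e6;    % decay constant (1/yr)
erate = 1e-4;            % erosion rate (cm/yr)
rho = 2.65; L = 160;     % density (g/cm^3), attenuation length (g/cm^2)
k = l + rho * erate / L;                 % effective decay constant (1/yr)
Peff = N * k / (1 - exp(-k * t));        % effective production rate (atoms/g/yr)
% Stable nuclide, zero erosion: N = P*t, so Peff = N / t instead.
xnorm = N / Peff;                        % normalized concentration to plot
```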

Note that this has to be handled differently in the case where the measured nuclide concentration is above saturation for a particular scaling method (because there's no value of t). In this case the effective production rate is the similarly weighted average for the entire duration of the forward calculation (usually 8 effective half-lives). Note that I am not sure if this calculation is always robust for potentially pathological situations in which the concentration of a stable nuclide is above the saturation value predicted for an assumed erosion rate. 

Using this procedure, one can plot ellipses on the two-nuclide diagram for all scaling methods (St is red; Lm is green; LSDn is blue). This makes an important difference in many cases, probably the most prominent one being high-elevation, very-low-erosion-rate surfaces in Antarctica. See, for example, the discussion here.

This procedure does perpetuate a small inconsistency between the plotting of the ellipses and the plotting of the simple exposure region, because the simple exposure region is computed under the assumption that the production rate does not change over time. This has a very small effect on the position of the steady erosion line and is not dealt with here.

Note also that you can potentially mess this up with a non-default production rate calibration data set. This is because you can only calibrate one nuclide at a time. So, for example, you can change the Be-10 production rate without changing the Al-26 production rate, which has the effect of changing the 26/10 production ratio used in computing the normalized nuclide concentrations, which in turn changes the position of the ellipse. You might want to be careful with this.

4.B. He-3 retention diagrams

In the case where 3He is measured in quartz together with another nuclide, 3He retention diagrams, like those shown in, for example, Tremblay and others (2014a,b), are drawn. For a discussion of why you might want to do this, see those papers. 

Formally, the retention diagram has the same coordinate space as a normal multiple-nuclide diagram, except that, because the goal is to quantify diffusive loss of 3He in quartz, 3He is always treated as the shorter-half-life nuclide. This decision can generate strange results when the other nuclide is 14C (because, lacking advance knowledge of the exposure history of the sample, there is really no way to know whether we are computing 3He retention or a 14C/3He burial age). So the x-axis is the production-rate-normalized concentration of the nuclide that is not 3He, and the y-axis is the production-rate-normalized ratio of the concentration of 3He to that of the other nuclide. 

The ellipses are plotted using the same code as in other multiple-nuclide diagrams. Like for other multiple-nuclide data, in the case where multiple measurements of the same nuclide have been entered for a particular sample, they are averaged (using an error-weighted mean) before plotting. Thus, each sample will generate one and only one ellipse, no matter how many measurements were made on that sample. 

The simple exposure region that would be shown in a typical two-nuclide plot is not shown in this diagram. Instead, isothermal retention curves for various temperatures are shown. These are computed using the equations of Wolf and Farley (1998), the diffusion kinetics measured by Shuster and Farley (2005), a grain diameter of 0.5 mm, and temperatures of -40, -30, -20, -10, and 0 C. Note that these diffusion kinetics are probably unrealistic (see the Tremblay papers), so the retention curves are mainly useful for highlighting the shape of a typical isothermal retention curve. 
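
To illustrate the general shape of an isothermal retention curve (here just retention as a function of exposure time, rather than mapped into the diagram coordinates), the following is one way to compute it for a spherical grain, using the standard eigenfunction series for diffusive loss from a sphere and superposing continuously produced 3He. The diffusion kinetics below are placeholders, not the Shuster and Farley (2005) values, and this is not the calculator's code.

```matlab
% Retention R(t) = (1/t) * integral_0^t f(tau) dtau, where f(tau) is the fraction
% of an instantaneous, uniformly distributed pulse retained in a sphere after
% time tau: f(tau) = (6/pi^2) * sum_n (1/n^2) * exp(-n^2 pi^2 (D/a^2) tau).
Ea = 85e3;        % activation energy, J/mol (placeholder)
lnD0a2 = 11;      % ln(D0/a^2), ln(1/s) (placeholder)
Rgas = 8.314;     % gas constant, J/mol/K
TC = [-40 -30 -20 -10 0];                % temperatures, deg C
t = logspace(2, 6, 60) .* 3.15e7;        % exposure times, s (100 a to 1 Ma)
n = (1:200)';                            % series terms
for j = 1:numel(TC)
  Da2 = exp(lnD0a2 - Ea ./ (Rgas .* (TC(j) + 273.15)));   % D/a^2 (1/s)
  f = @(tau) (6/pi^2) .* sum((1./n.^2) .* exp(-n.^2 .* pi^2 .* Da2 .* tau), 1);
  Rret = arrayfun(@(tt) integral(f, 0, tt, 'ArrayValued', true) ./ tt, t);
  semilogx(t ./ 3.15e7, Rret); hold on;
end
xlabel('Exposure time (yr)'); ylabel('Fraction of 3He retained');
```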

4.C. Summary exposure-age statistics for samples from a single landform

The checkbox on the exposure-age input page triggers some code that attempts to compute summary statistics on the set of exposure ages being submitted. The idea here is that you are submitting a batch of ages from the same landform, and you want the age of the landform. What happens here is as follows:

1. Compute basic statistics (mean and standard deviation, error-weighted mean and uncertainty, chi-squared and p-value) on the entire data set.

2. Attempt to remove outliers. First, decide whether there are any outliers by computing the p-value of the chi-squared statistic with respect to the mean, using measurement uncertainties. If it is greater than 0.01 (i.e., you can't exclude the hypothesis that the data belong to a single population at 99% confidence), stop. If it's not, remove the measurement that is farthest from the mean in relation to its measurement uncertainty (that is, the one with the worst chi-squared), and recompute the p-value. If it's now greater than 0.01, stop. Repeat as necessary, but stop if (i) there are fewer than 3 data remaining, or (ii) you've discarded half the data. This procedure is not really found in statistics textbooks per se, but is loosely based on Chapter 11 of Bevington and Robinson. A code sketch of this pruning loop follows the list. 

3. Take the remaining data with outliers removed (the "pruned" data set) and compute various statistics. If the p-value of the chi-squared for the pruned data set is > 0.05 (you can't exclude the hypothesis that the data belong to a single population at 95% confidence), then report the error-weighted mean and standard error as the best estimate of the landform age. If it's < 0.05, then report the mean and standard deviation. 
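
The data and variable names below are mine; this is just a sketch of the pruning loop in step 2, using the error-weighted mean as the reference value and the incomplete gamma function to get the chi-squared p-value:

```matlab
% Chi-squared-based outlier pruning, roughly as described in step 2 above.
t = [11500 11900 12100 11700 14800];    % exposure ages (yr), hypothetical
dt = [300 350 400 320 400];             % internal (measurement) uncertainties (yr)
keep = true(size(t));
while true
  w = 1 ./ dt(keep).^2;
  tbar = sum(w .* t(keep)) ./ sum(w);              % error-weighted mean
  chi2 = ((t(keep) - tbar) ./ dt(keep)).^2;        % per-measurement chi-squared
  dof = sum(keep) - 1;
  p = 1 - gammainc(sum(chi2)/2, dof/2);            % p-value of total chi-squared
  if p > 0.01, break; end                          % single population at 99%; stop
  if sum(keep) - 1 < 3 || sum(keep) - 1 <= numel(t)/2, break; end  % stopping rules
  idx = find(keep);
  [~, worst] = max(chi2);                          % worst-fitting measurement
  keep(idx(worst)) = false;                        % discard it and try again
end
```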

Note that all the computations so far have just relied on the internal uncertainties (that is, measurement uncertainty only) to compute the chi-squareds and p-values. Thus, the uncertainty reported in step 3 above can be thought of as an internal uncertainty for the landform. A corresponding external uncertainty is then also computed by propagating the production rate uncertainty into this value. The external uncertainty appears in parentheses in the results. For example, "Summary value is 11807 +/- 299 (1064)" means the internal uncertainty for the landform age estimate is 299 years, and the external uncertainty is 1064 years. 
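
To make the arithmetic concrete, one plausible way to form such an external uncertainty is to scale the relative production rate uncertainty into the age and add it in quadrature to the internal uncertainty. This is an assumption made for illustration, not a statement of exactly what the calculator does, and the numbers are hypothetical:

```matlab
tbar = 12000;        % landform age estimate (yr), hypothetical
dt_int = 300;        % internal (measurement-only) uncertainty (yr)
relP = 0.08;         % relative production rate uncertainty, hypothetical
dt_ext = sqrt(dt_int^2 + (tbar * relP)^2);    % external uncertainty (yr)
```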

Note that this algorithm is kind of ad hoc -- I have no idea whether the pruning algorithm based on the chi-squared test has been formalized anywhere, although it seems obvious enough that it probably has. In addition, the various confidence thresholds are somewhat arbitrary. The aim is simply to have a reasonably sensible approach to discarding outliers, and this algorithm does a pretty good job in most cases where there is a single population with a couple of outliers.

Entering measurements of multiple nuclides on the set of samples triggers summary statistics for each nuclide and scaling method separately, as well as aggregated results for each scaling method. 

Finally, the output page displays normal kernel density plots ("camel diagrams") for aggregated results for each scaling method. These show the total density curve for the entire data set (light line) and the pruned data set (dark line). Note, again, that you may create an inconsistency here if you've used your own calibration data for one of the nuclides, but not the other (because at present you can only do a calibration with one nuclide at a time). For anything involving comparison between multiple nuclides, you should probably start with the default production rate calibration.

Note that the code that generates the camel diagrams uses a fancy age-dependent normalization scheme to correct the problem with "normal" camel diagrams in which measurements with the same relative precision appear less important at older ages. In most cases this will be irrelevant as a practical matter, because for this application one would expect all the samples to have about the same age. 
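
A plain ("normal") camel diagram is just the sum of Gaussian kernels, one per sample, each with a width equal to that sample's uncertainty. A minimal sketch, without the age-dependent normalization mentioned above (data hypothetical):

```matlab
% Summed-Gaussian kernel density ("camel diagram") for a set of exposure ages.
t = [11500 11900 12100 11700 14800];    % ages (yr), hypothetical
dt = [300 350 400 320 400];             % 1-sigma uncertainties (yr)
x = linspace(min(t) - 4*max(dt), max(t) + 4*max(dt), 1000);
dens = zeros(size(x));
for i = 1:numel(t)
  dens = dens + exp(-(x - t(i)).^2 ./ (2 * dt(i)^2)) ./ (dt(i) * sqrt(2*pi));
end
plot(x, dens, 'k'); xlabel('Exposure age (yr)'); ylabel('Probability density');
```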

4.D. Summary statistics for production rate calibration 

When computing a best estimate of the calibrated production rate parameter (either the reference production rate for St and Lm scaling, or the correction factor for LSDn scaling), the basic principles I am trying to implement are that (i) we should distrust reported measurement uncertainties, and (ii) we are trying to estimate the uncertainty in scaling between different locations, not the uncertainty in measuring the production rate at a single location.

Item (i) (distrust the measurement uncertainties) comes from observations from various parts of the CRONUS project (the intercomparison study of Jull et al., 2016, and the calibration exercise of Borchers et al., 2016) that reported measurement uncertainties for cosmogenic-nuclide measurements underestimate true measurement uncertainties in an interlaboratory-comparison sense. That is, if two labs report 3% uncertainties for Be-10 measurements, they probably won't agree at 3%. In fact, most labs can't even get their own 3% measurements to agree within 3%. What this observation means is that when we are combining data sets that may have been run at different times and/or in different labs (which is generally the case when considering calibration data from multiple sites), we should not use any fitting or averaging algorithm that weights measurements according to their measurement uncertainty. So any optimization approach that finds the value of the reference production rate minimizing the chi-squared or MSWD computed with measurement uncertainties is out. Note that Borchers et al. did use an optimization method of this sort, but they addressed this issue by assuming that all measurements had the same measurement uncertainty. A number of other papers (including mine, e.g., Balco et al., 2009) did use optimization methods that weighted by measurement uncertainties, but I am now arguing that that is not the right way to do it.

The important implication of item (ii) (focus on the inter-site scatter) is that we should make sure that all sites count equally in computing a best-estimate production rate. We don't want to give a site more weight just because there are more measurements from that site. 

With that in mind, the procedure for computing a best estimate of the reference production rate (we'll just say "reference production rate" even though for LSDn scaling this is a nondimensional correction factor instead) is as follows. 

First, compute the reference production rate implied by each measurement in the calibration data set. Average these at each site. Then, take the average of the site averages. Done. 
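
In code, this average-of-site-averages is just the following (data and variable names hypothetical):

```matlab
% Best-estimate reference production rate: average the per-measurement estimates
% at each site, then average the site averages.
P = [3.9 4.1 4.0 4.3 4.2 3.8 4.0];              % implied reference production rates
site = [1 1 1 2 2 3 3];                          % site index for each measurement
Psite = accumarray(site(:), P(:), [], @mean);    % unweighted site averages
Pbar = mean(Psite);                              % average of the site averages
```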

Note that this is the same method used in Balco et al. (2008). Following the discussion above, it has the features that (i) measurement uncertainties don't count, and (ii) each site contributes equally regardless of the number of samples at each site. It also has the virtue of being extremely simple. 

This does lead to the question of how to compute the uncertainty in the reference production rate. I am actually not sure what the best way to do this is, so here I use the simplest approach I can think of. First, compute the standard deviation of the individual production rate estimates from all the measurements with respect to the best-estimate value (specifically, this is the standard deviation of P/Pbar, where P is each individual estimate of the production rate and Pbar is the best estimate derived two paragraphs ago). This is reported as "uncertainty by total scatter" on the results page. Then, compute the standard deviation of the site averages with respect to the best estimate. This is reported as "uncertainty by site-to-site scatter." Finally, take the larger of these two numbers as the best estimate of the uncertainty for use in subsequent exposure age calculations.
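
Continuing the hypothetical data from the previous sketch, the two scatter metrics might look like this (whether the scatter is taken about Pbar or about the mean of the ratios is a detail I am glossing over):

```matlab
P = [3.9 4.1 4.0 4.3 4.2 3.8 4.0];              % individual estimates (hypothetical)
site = [1 1 1 2 2 3 3];                          % site index for each measurement
Psite = accumarray(site(:), P(:), [], @mean);    % site averages
Pbar = mean(Psite);                              % best estimate, as above
dP_total = std(P ./ Pbar);       % "uncertainty by total scatter" (relative)
dP_site = std(Psite ./ Pbar);    % "uncertainty by site-to-site scatter" (relative)
dP = max(dP_total, dP_site);     % carried into subsequent exposure-age calculations
```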

A possible improvement to this approach that could be made in future would take note of the fact that the scatter metric used above is due to both scaling uncertainty and measurement uncertainty, and try to identify only the portion that is due to measurement uncertainty. As currently implemented this approach probably overestimates uncertainties in downstream exposure ages slightly. 

Note that this uncertainty estimate will not behave as designed in the case where you have input a calibration data set consisting only of closely grouped measurements from a single site. In this case, obviously, one cannot compute the site-to-site scatter, so the summary uncertainty will be the standard deviation of measurements from the single site. This is not likely to capture real scaling uncertainty in the production rate estimates (although it might by accident), thus leading to a likely underestimate of the external uncertainty in exposure ages computed subsequently. 

Note the special case when only one sample has been input. In this case the production rate uncertainty is just derived from the measurement uncertainty for the sample. 

Finally, as it is possible to enter calibration data that have minimum and maximum age constraints rather than precise independent ages, a method for incorporating the resulting minimum and maximum constraints on the production rate is needed. The algorithm for doing this assumes that each site can be associated with either (i) an independent age or (ii) minimum and/or maximum age constraints, but not both. Note that the ICE-D calibration data server will honor this rule, but it is probably possible to cause this code to fail if you intentionally break it in data entry. You can also cause the code to crash by entering impossible data sets, for example a data set consisting only of minimum age constraints. Computing site-averaged production rates then yields a set of site averages that are exact, minimum, or maximum estimates of the production rate. First, compute the average of all the site averages from sites with exact estimates. If the resulting value does not exceed any of the minimum or maximum constraints, stop. If it does, recompute the average with those active constraints included. Continue this loop until either no new constraints are exceeded in a step, or all constraints are included in the average. The uncertainties are then computed as described above on the set of exact values and active constraints.
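
A sketch of that constraint-handling loop, assuming the site averages have already been sorted into exact values and minimum/maximum constraints on the production rate (all values hypothetical):

```matlab
% Iterative inclusion of minimum/maximum constraints on the reference production rate.
Pexact = [4.0 4.1 3.9];      % site averages from sites with exact independent ages
Pmin = [3.6 4.3];            % site averages that are minimum constraints on P
Pmax = [4.8];                % site averages that are maximum constraints on P
active = [];                 % constraint values currently included in the average
while true
  Pbar = mean([Pexact active]);
  % a minimum constraint is exceeded if Pbar falls below it, a maximum if Pbar is above it
  viol = [Pmin(Pbar < Pmin), Pmax(Pbar > Pmax)];
  newviol = setdiff(viol, active);
  if isempty(newviol) || numel(active) == numel([Pmin Pmax]), break; end
  active = [active, newviol];    % include the newly exceeded constraints and repeat
end
Pbar = mean([Pexact active]);    % final best estimate
```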