Now that you have a spreadsheet full of data and you're generally comfortable with your model performance, you're going to want to do some additional QA / QC. The last thing that you want to do is publish a paper talking about your "amazing" data when that data is of questionable quality.
First, you'll want to make sure that you have the correct types of data at your disposal. I recommend that you output the following data during batch processing: (1) spectrum throughput, (2) peak area, (3) peak standard deviation, and (4) peak chi-squared. You probably won't need or want all of those values, particularly after you do an initial assessment, but it's always easier to discard data than it is to go back and collect it later. At a minimum, you should collect throughput, peak area, and peak standard deviation. If you're missing one of these pieces of data, I strongly recommend that you modify bAxilBatch and reprocess your data (see data processing).
Spectrum throughput will tell you how many x-rays the detector was receiving per second. You should see high numbers here, somewhere in the 100-300k range, if you're using a standard spot size (~1 cm by 1 cm) and typical excitation conditions. Within a single pass over a core section (one excitation condition), you should see relatively consistent or gradually changing throughput values. Abrupt drops in throughput usually indicate a poor measurement caused by the prism not landing fully on the surface or landing on a crack (Figure 1). You'll want to flag these points and consider discarding them from your final interpretations.
Figure 1: Throughput (left) and Ar Kα (right) for the 367-U1499B-30R core, with moving averages plotted against the data. Also shown is a plot of the throughput vs Ar Kα values (center), where each value is corrected by the relevant moving average. You can see that points with high Ar Kα values also have relatively low throughput values, suggesting both are due to not landing fully on the sample surface or landing on cracks.
While throughput can give you a lot of information about the quality of a data point, the argon peak can give you even more. If you observe a positive argon peak (the origin of negative peaks will be explained in the next section), then you measured some air. If the argon values are high for all data points in a section, then there might have been a hole in the film. Otherwise, points with high argon values represent locations where the prism didn't land firmly on the sample, the sample was cracked, or the sample had air-filled porosity. Typically, you can screen out bad points by using a combination of the argon values and the throughput values (Figure 1).
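To make that screening concrete, here is a minimal sketch in Python. It assumes your batch results sit in a CSV with hypothetical column names like throughput and Ar_Ka (rename to match your actual bAxilBatch export), and the 0.9 and 1.5 cutoffs are illustrative, not lab standards:

```python
import pandas as pd

# Hypothetical file and column names: match them to your own
# bAxilBatch export before running.
df = pd.read_csv("section_results.csv")

# Moving averages approximate the local trend along the section.
window = 15  # number of points; tune to your measurement spacing
df["tp_ma"] = df["throughput"].rolling(window, center=True, min_periods=1).mean()
df["ar_ma"] = df["Ar_Ka"].rolling(window, center=True, min_periods=1).mean()

# Correct each value by its moving average, as in Figure 1 (center).
df["tp_rel"] = df["throughput"] / df["tp_ma"]
df["ar_rel"] = df["Ar_Ka"] / df["ar_ma"]

# Flag points with low relative throughput AND high relative argon:
# likely cracks, or spots where the prism didn't land on the surface.
df["suspect"] = (df["tp_rel"] < 0.9) & (df["ar_rel"] > 1.5)
print(df.loc[df["suspect"]])
```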
Next, you'll want to look at the peak areas for the rest of the elements. Look for values that don't make sense given the context and what you know about the compositions of your materials. The intensity of a peak is only partially related to the concentration of the respective element, and the ratio of two peak intensities cannot be used to quantify the ratio of element concentrations (without substantial correction). However, when you are comparing the intensities of two peaks that are close in energy and derived from elements with similar atomic numbers, the peak ratios offer reasonable first approximations of the underlying element ratios (be careful not to take this too far!). For example, if you have a 10:1 Fe Kα : Mn Kα ratio, you can't say you have a 10:1 Fe:Mn element ratio. But, because the peaks are similar in origin (Kα) and similar in energy (~5.9 vs 6.4 keV), and because the elements are similar in atomic number (Z = 25 vs 26), you can say that you have much more Fe than Mn. If you see peak values that are out of place or of suspicious magnitude, you may have a processing error. Open up a spectrum file in bAxil and see what the model is doing (see data processing). If you're not certain what's causing the strange values, talk to the Lab Manager.
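If you want a quick, automated first pass at spotting suspicious values, a generic outlier screen (my suggestion here, not a bAxil feature) can help. This sketch flags peak areas that sit far from the median of each element column, with column names assumed for illustration:

```python
import pandas as pd

# Generic robust outlier screen: flag peak areas more than four robust
# standard deviations from the median of each element column.
# Column names ("Fe_Ka", "Mn_Ka") are illustrative assumptions.
df = pd.read_csv("section_results.csv")
for col in ["Fe_Ka", "Mn_Ka"]:
    med = df[col].median()
    mad = (df[col] - med).abs().median()  # median absolute deviation
    robust_sd = 1.4826 * mad              # MAD-to-sd scaling for normal data
    df[f"{col}_suspect"] = (df[col] - med).abs() > 4 * robust_sd
```

Anything flagged still needs a human look in bAxil; the screen only tells you where to look first.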
Finally, you may occasionally see peaks with negative values. Although we know that, theoretically, a peak should not have a negative area, bAxil and other spectrum-processing packages do not enforce that constraint. Negative peaks can occur when two or more peaks overlap in the same part of the spectrum and at least one of those peaks is over-fit. If the processing software is told that there should be two peaks in an area, it will try to fit two peaks. While fitting a peak (or peaks) in one part of the spectrum (area A), the software may over-fit in another part (area B). The overall error may go down by getting a better fit in area A, despite a poor fit in area B. The negative peak happens when there is another peak that needs to be found in area B. If the new peak is defined as the difference between the observed spectrum and the spectrum as already modeled, then the new peak will be negative. See Figure 2 for a common real-life example. In these cases, we recommend that you either modify your model or simply assume the negative peak is below detection.
Figure 2: Screenshot showing how small peaks that have significant interference can be modeled as negative.
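If you'd like to convince yourself of the mechanism, here is a toy demonstration (this is not bAxil's actual fitting engine): fit two overlapping Gaussians to a synthetic spectrum that really contains only one peak. Because nothing constrains the second amplitude to be positive, the fit is free to drive it negative:

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic spectrum: one real peak at 5.4 "keV" on a flat background,
# with Poisson counting noise.
rng = np.random.default_rng(42)
x = np.linspace(4.0, 7.0, 300)
truth = 500 * np.exp(-(x - 5.4) ** 2 / (2 * 0.25 ** 2)) + 20
y = rng.poisson(truth)

def model(x, a1, a2, bkg):
    # Two peaks at fixed positions that the model insists are present,
    # even though only the first exists in the data.
    p1 = a1 * np.exp(-(x - 5.4) ** 2 / (2 * 0.25 ** 2))
    p2 = a2 * np.exp(-(x - 5.9) ** 2 / (2 * 0.25 ** 2))
    return p1 + p2 + bkg

popt, _ = curve_fit(model, x, y, p0=[400.0, 50.0, 20.0])
print(popt)  # a2 hovers near zero and can come out negative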
Once you're comfortable with your peak areas, you're going to want to consider the uncertainties in the values. This step is important, but it's also typically ignored by scientists who use the XRF lab. There are two general approaches to handling the uncertainties: (1) assume each value is the "true" value and only consider uncertainty in the context of down-hole variations (i.e., assume point-to-point "wiggles" are due to noise), and (2) assume each value is uncertain and use the uncertainties directly in your plots, ratios, etc. There are advantages and disadvantages to each approach, but I'm going to briefly talk about how to use the second method.
Your peak is notionally defined as the total number of counts minus the background over some channel interval (Eq 1). There are uncertainties in both the background and the total which lead to uncertainties in the net peak (Eq 2 and Figure 3). Those uncertainties come in the form of modeling errors, counting errors, and interpretation errors, and they all combine to reduce our confidence in the actual value. If we know the total uncertainty, we can express our peak area as a confidence interval or even just as a range (Eq 3).
P = net peak area; T = total area; B = background area; σ = standard deviation; ρ = correlation coefficient
Eq 1: $P = T - B$
Eq 2: $\sigma_P^2 = \sigma_T^2 + \sigma_B^2 - 2\rho_{TB}\sigma_T\sigma_B$
Eq 3: $\mathrm{C.I.} = P \pm 2\sigma_P$
Figure 3: Plot (left) showing the distributions of values for the two measured properties (total signal and background), next to a plot (right) showing the distribution of the difference between the two values (net peak area). Curves are based on actual values, assuming uncertainties for both background and total signal are exclusively due to counting errors.
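Here is a worked numeric example of Eqs 1-3 in Python. The areas and uncertainties are invented for illustration; in practice they would come from your processed spectrum:

```python
import numpy as np

# Invented example values; in practice T, B, and their sigmas come
# from your processed spectrum.
T, sigma_T = 12500.0, 120.0   # total area and its uncertainty
B, sigma_B = 9800.0, 105.0    # background area and its uncertainty
rho = 0.0                     # assumed correlation between T and B

P = T - B                                                                 # Eq 1
sigma_P = np.sqrt(sigma_T**2 + sigma_B**2 - 2 * rho * sigma_T * sigma_B)  # Eq 2
lo, hi = P - 2 * sigma_P, P + 2 * sigma_P                                 # Eq 3
print(f"P = {P:.0f}, 2-sigma C.I. = ({lo:.0f}, {hi:.0f})")
```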
So, how do we quantify that uncertainty? One of the most conventional numbers for expressing error is the standard deviation. The standard deviation is calculated by comparing the results of repeated measurements on the same system. We can't actually calculate the error when we process a single spectrum, because we're not making repeated measurements. In order to truly capture the uncertainties, we'd need to: (1) make repeated measurements on the same spot, and (2) process the spectra multiple times using different assumptions. While I recommend that everyone try this approach on at least one section (hopefully an important one), the process is time consuming and it isn't typically done in this lab.
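If you do make repeated measurements, the arithmetic itself is simple. A sketch, with invented peak areas standing in for repeat runs on one spot:

```python
import numpy as np

# peak_areas would hold the same peak's area from repeat runs on one
# spot (and/or from reprocessing under different model assumptions).
# These numbers are invented for illustration.
peak_areas = np.array([2710, 2685, 2742, 2698, 2733, 2676, 2719, 2704, 2751, 2690])
mean = peak_areas.mean()
sd = peak_areas.std(ddof=1)  # sample standard deviation
print(f"mean = {mean:.0f}, sd = {sd:.0f}, relative sd = {100 * sd / mean:.1f}%")
```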
If you only measure once and process once, then you'll need to rely on the standard deviation values reported by bAxil. The bAxil help menus and documentation aren't very explicit about the formula used, but it appears that the software only considers uncertainties in the x-ray counts. We can test that assumption by looking at the data.
In spectroscopy, counts are typically treated as if they follow a Poisson distribution [1]. The standard deviation of a Poisson distribution is the square root of the mean, which in this case we estimate with the total number of counts (Eq 4) [2]. If we ignore all of the systematic errors in the peak and background (not recommended), then we only have to consider uncertainties in the counts. Therefore, we can assume that there's no correlation between the total and the background (Eq 5), and we can express the uncertainty of each term as the uncertainty in its counts (Eq 6). Plotting the peak standard deviation against the square root of the background plus the total gives us a near 1:1 line (Figure 4), supporting the assertion that bAxil only considers counting errors.
Eq 4: $\sigma_N = \sqrt{N}$
Eq 5: $\sigma_P^2 = \sigma_T^2 + \sigma_B^2$
Eq 6: $\sigma_P = \sqrt{T + B}$
Figure 4: Plot (left) demonstrating that the peak standard deviation from bAxilBatch is exclusively derived from counting statistics, next to a plot (right) showing that the peak chi-squared generally increases with peak area. Because of that positive relationship, the absolute chi-squared value should not be used on its own as an indicator of fit quality. See text for details.
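You can run the Figure 4 (left) check on your own data. A sketch, assuming columns named total_area, background_area, and peak_sd in your export (rename to match):

```python
import numpy as np
import pandas as pd

# If bAxil's reported peak standard deviation comes only from counting
# statistics, its ratio to sqrt(T + B) (Eq 6) should sit near 1.
df = pd.read_csv("section_results.csv")
counting_sd = np.sqrt(df["total_area"] + df["background_area"])
ratio = df["peak_sd"] / counting_sd
print(ratio.describe())  # values clustered near 1 support counting-only errors
```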
Unfortunately, we know that the uncertainties in our peaks are not solely due to counting errors. There are systematic errors in the background related to model fit, and there are systematic errors in the actual peak curves. If you zoom in on any spectrum that's been processed using bAxil (or another program), you'll see plenty of areas where systematic errors in the background lead to artificial highs that are treated as peaks (Figure 5). You'll also see plenty of areas where peaks are poorly fit to the spectrum, meaning that the peak area isn't simply the difference between the measured spectrum and the background. The peak area is really the difference between one model (the peak) and another model (the background). So, you should remember that the actual uncertainty in a peak might be much, much larger than the uncertainty due to counting errors!
Figure 5: Screenshots showing how the modeled background tends to fit below bumps in the spectrum, causing a bias towards the preservation of local highs but not local lows. Peaks are only fit to those highs, meaning that in areas where the background is poorly modeled (fits too low), peaks will appear larger than they should be.
We also have chi-squared values to help us quantify our peak uncertainties. Chi-squared is a measure of the difference between the modeled spectrum (background + peak) and the actual spectrum; smaller numbers mean a better fit. In fact, the chi-squared value for the entire spectrum is what gets minimized during processing. The chi-squared value for each peak, then, is simply a number that tells you how well the peak fits the residual data after background subtraction. Be cautious about over-interpreting these values. You can see in Figure 4 that larger peak areas are, in general, accompanied by larger chi-squared values. That doesn't mean that larger peaks are somehow "worse" than smaller peaks. What you're really looking for is high chi-squared values for a given peak size, that is, relatively high chi-squared values. Those peaks will have larger uncertainties than similar-sized peaks with lower chi-squared values, and you should treat them with more caution.
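One way to operationalize "relatively high" (this is my suggestion, not a lab standard) is to fit a simple trend of chi-squared against peak area in log space and flag points that sit well above it. Column names here are assumptions:

```python
import numpy as np
import pandas as pd

# Fit a linear trend of log(chi2) vs. log(peak area), then flag points
# more than two residual standard deviations above the trend.
df = pd.read_csv("section_results.csv")  # assumed columns: peak_area, chi2
x = np.log10(df["peak_area"].clip(lower=1.0))
y = np.log10(df["chi2"].clip(lower=1e-6))
slope, intercept = np.polyfit(x, y, 1)
residual = y - (slope * x + intercept)
df["chi2_suspect"] = residual > 2 * residual.std()  # well above the trend
```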
The final quality assurance step that you'll want to take is to identify peaks that might be below detection. This step is not strictly necessary, but it will help make sure that you don't over-interpret noise as meaningful signal. At a high level, what you are trying to do is to determine whether your calculated peak area is large enough to suggest that your actual peak area (what reached the detector) is greater than 0. If you can convince yourself, despite all of the uncertainties, that the peak is "real", then you keep it. Otherwise, you'll want to flag it somehow.
I'm not going to give you a clear procedure for determining the significance of your peak, because significance testing is interpretive and open to debate. You can perform a t-test ($H_0$: $P_{\mathrm{true}} = 0$), you can take a Bayesian approach, you can perform Monte Carlo simulations, or you can use rules of thumb. The simplest method is to construct a confidence interval for each peak using your preferred number of standard deviations (Eq 3). If your confidence interval doesn't include zero (i.e., $P - 2\sigma_P > 0$), then you can treat your peak as "significant". Please remember that you are ultimately responsible for defending your criteria! We recommend that you keep it simple and keep it consistent.
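As an example of the simplest method, here is a minimal below-detection screen built on Eq 3, with assumed column names. Note that it flags rather than deletes:

```python
import pandas as pd

# Keep a peak only if its 2-sigma confidence interval excludes zero.
# Swap in your own multiplier if you prefer, say, 3 sigma.
df = pd.read_csv("section_results.csv")
df["significant"] = (df["peak_area"] - 2 * df["peak_sd"]) > 0
# Flagging (rather than deleting) keeps the decision transparent and
# reversible in later interpretation.
print(df["significant"].value_counts())
```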