Intro to Origin Lab

How to Re-Bin Data

Instructions

Assuming your data is in an x- and a y-column, select both columns and then choose Statistics / Descriptive Statistics / 2D Frequency Count/Binning / Open Dialog.

  1. To select the number of rows to be binned together, use the Step Size setting in the x-column section and select either Bin Size or Number of Bins.

  • Selecting Step Size / Bin Size will bin as many rows as are required to produce a final bin width of Bin Size. For example, if x is incremented in steps of 0.2, a Bin Size of 1 results in the binning of 5 rows.

  • Selecting Step Size / Number of Bins is a bit misleading and should really be renamed "Total Number of Bins." This setting does not represent the number of rows that will be averaged; instead, together with the minimum and maximum x-values, it determines how many rows are combined into each bin. For example, for x-data incremented at 0.2 going from -10 to +10, i.e., roughly 100 rows, setting Number of Bins to 10 will sum 10 of them, resulting in 10 final bins.

The Step Size / Bin Size method is probably more useful.

  2. Finally, you need to set the y-values in the bins to be either summed or averaged. In the y-column section, always set the Step Size / Number of Bins to 1. (You may have to turn off "Auto" first.)

Most importantly, set the Quantity to Compute to either Sum or Average, depending on whether the y-values in each bin should be summed or averaged (the sketch below illustrates the underlying arithmetic).
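Behind the dialog, the re-binning is nothing more than grouping consecutive rows and summing or averaging their y-values. As a rough illustration of that arithmetic outside Origin, here is a short Python sketch; the data, the 0.2 step, and the Bin Size of 1 are made-up values chosen to match the example above.

import numpy as np

x = np.arange(-10, 10, 0.2)        # x incremented in steps of 0.2
y = np.exp(-x**2 / 2.0)            # any y-values will do for the illustration

step = 0.2
bin_size = 1.0                                 # Step Size / Bin Size = 1
rows_per_bin = int(round(bin_size / step))     # 5 rows are combined per bin

# Trim so the number of rows is a multiple of rows_per_bin, then group the rows.
n = (len(x) // rows_per_bin) * rows_per_bin
x_binned = x[:n].reshape(-1, rows_per_bin).mean(axis=1)   # approximate bin centers
y_summed = y[:n].reshape(-1, rows_per_bin).sum(axis=1)    # Quantity to Compute: Sum
y_mean   = y[:n].reshape(-1, rows_per_bin).mean(axis=1)   # Quantity to Compute: Average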

Example

Figure 1: Sample plot of Gaussian noise

The plot above shows a noisy Gaussian data set. It is difficult to fit in Origin because of its 0 values, especially if y-error bars proportional to the square root of y are used.

Here is a small part of the data set:

We now re-bin the data with a new bin size of 0.5. The bins will be summed. After selecting Statistics / Descriptive Statistics / 2D Frequency Count/Binning / Open Dialog, the following dialog box opens.

Figure 2: Setting the bin settings

We set the Step Size / Bin Size to 0.5 and the y-column settings as explained in the instructions above. The resulting data set looks like this:

and a graph of it is shown below:

Figure 3: Plot of the re-binned data

  • The Origin Project File for the data is testbinning1a.opj, found in the file list below

Re-Binning Data Example: Poisson Statistics with Low Count Values

Introduction

Shown below is a data set of 1000 points whose noise component obeys a Poisson distribution. The data was generated in LabView and, for simplicity, it has a linear dependence on x. As an illustration, this data set could represent, for example, the number of radioactive decay counts observed as a function of some variable x, where x might represent the amount of absorber removed from in front of the radioactive source.

The data set consists of 1000 data pairs of count value vs. x, with x ranging from 0 to 10 at increments of 0.01. Since the noise in the data corresponds to a Poisson distribution, it follows that the standard deviation of a particular count value <latex>y_i</latex> must obey:

<latex>\sigma_i = \sqrt{y_i} \quad\quad (1)</latex>
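The data itself was generated in LabView. Purely as an illustration of this kind of data set (and of equation 1), here is a short Python sketch; the slope and offset below are made-up values, not the ones used to produce the figures.

import numpy as np

rng = np.random.default_rng(0)

x = np.arange(0, 10, 0.01)        # 1000 points, x from 0 to 10 in steps of 0.01
expected = 1.0 + 2.0 * x          # assumed linear dependence (made-up slope and offset)
counts = rng.poisson(expected)    # count values with Poisson-distributed noise

# Equation 1: the standard deviation of each count value is its square root.
sigma = np.sqrt(counts)           # note that sigma is 0 wherever counts == 0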

Figure 4: The raw data of counts vs. x.

A simple fit using Origin's Analysis / Fitting / Linear Fit was performed and produced the results shown in Figure 4. The results, especially their standard errors, are somewhat meaningless since no error was specified. (In such a case Origin automatically applies a unit error of "1" to each y-value.)

However, fixing the fitting problem by adding uncertainties to the count values based on equation 1 will not work with the data set as it exists. The reason is simple: the zero count events. Based on equation 1, such zero count events have a standard deviation of 0 and, in essence, force the fitted line to pass through all of them, resulting in meaningless fit results. See Figure 5 for the results of such a fit, where equation 1 was blindly applied to the data set to produce a y-error column which was then used in the fit.

Figure 5: Another fit of the raw data of counts vs. x with the errors calculated according to equation 1. The fit was unable to complete due to the zero count events and their corresponding 0 error.
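To see numerically why the zero-count events derail such a weighted fit, recall that each point contributes <latex>((y_i - f(x_i))/\sigma_i)^2</latex> to the chi-square; when <latex>\sigma_i = 0</latex> that contribution is infinite unless the fitted line passes exactly through the point. A minimal Python illustration (the numbers are arbitrary):

import numpy as np

y = np.array([0.0, 3.0, 5.0])     # one zero-count event
f = np.array([0.5, 3.2, 4.8])     # values of some trial fit line at the same x-positions
sigma = np.sqrt(y)                # equation 1: sigma = 0 for the zero count

with np.errstate(divide="ignore"):
    chi2_terms = ((y - f) / sigma) ** 2
print(chi2_terms)                 # the first term is inf: the fit is forced through y = 0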

At this point it is tempting to proceed and simply mask all zero count events away. However, this is wrong; doing so implies that there were no zero count events, which is clearly not true! The correct way to proceed is to re-bin the data, as shown in the next section.

Re-Binning the Data

The problem caused by the zero counts disappears if we combine the count data over longer intervals, i.e., if we re-bin it. The data shown in the previous figures was collected at intervals of 0.01 x-axis units and contains a large number of zero-count events. However, if we accrue the count events over a longer interval, i.e., one long enough that we no longer end up with any zero-count bins, then we avoid the problem of zero standard deviations. Applying an x-axis interval of 0.25 units (instead of 0.01) solves this problem because it sums the counts over 25 consecutive intervals.

In Origin, we select: Statistics / Descriptive Statistics / 2D Frequency Count/Binning and use the settings shown in Figure 6.

Figure 6: Settings for the re-binning of the data.

There are two selections chosen in Figure 6 that are worth further explanation:

  1. We selected "Bin Center" instead of "Bin End" for the x-axis "Specify Binning Range by" setting because it represents the center of our data interval, or bin; selecting "Bin End" would shift the binned values to the right edge of each bin interval.

  2. Note that we selected "Mean" as the "Quantity to Compute" (instead of "Sum"). This choice avoids having to adjust our fit results for the size of the binning interval, since (for linear data) the mean is independent of the summing interval (a short numerical check is given below).
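The independence of the mean from the binning interval is easy to check for noiseless linear data: the slope and intercept recovered from the binned means do not depend on how many rows are combined, whereas those recovered from the binned sums scale with the number of rows per bin. A small Python check with made-up linear data:

import numpy as np

x = np.arange(0, 10, 0.01)
y = 1.0 + 2.0 * x                  # noiseless linear data (made-up slope and offset)

for rows_per_bin in (10, 25, 50):
    n = (len(x) // rows_per_bin) * rows_per_bin
    xb     = x[:n].reshape(-1, rows_per_bin).mean(axis=1)   # bin centers
    y_mean = y[:n].reshape(-1, rows_per_bin).mean(axis=1)
    y_sum  = y[:n].reshape(-1, rows_per_bin).sum(axis=1)
    # Slope and intercept from the means stay at (2, 1); from the sums they are
    # multiplied by rows_per_bin and would have to be adjusted afterwards.
    print(rows_per_bin, np.polyfit(xb, y_mean, 1), np.polyfit(xb, y_sum, 1))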

The re-binned and fitted data is shown below in Figure 7.

Figure 7: Data (re-binned) averaged over 25 events and fitted with its standard deviation.

Calculating the Uncertainty on Averaged Data

The final task before we can fit the binned data is to determine the uncertainty, <latex>\sigma_{\bar{y}}</latex>, for each averaged data point, <latex>\bar{y}</latex>. It is defined as:

<latex>\sigma_{\bar{y}} = \frac{\sigma_y}{\sqrt{N}} \quad\quad (2)</latex>

where N represents the number of points averaged. (In our example this corresponds to 25.) Applying equation 1 to an averaged data point, <latex>\bar{y}</latex>, we find:

<latex>\sigma_y \approx \sqrt{\bar{y}} \quad\quad (3)</latex>

Combining these equations yields:

<latex>\sigma_{\bar{y}} = \sqrt{\frac{\bar{y}}{N}} \quad\quad (4)</latex>

We add an extra column to our binned data and apply equation 4. Finally, in Origin we set this column as the y-error and fit the data using a linear fit with its default "instrumental" weighting. This produces the results shown in Figure 7.
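For readers who want to check the arithmetic outside Origin, the whole procedure (re-binning into averages of 25, applying equation 4, masking the first and last bin as described in the Comments below, and performing a weighted linear fit) can be sketched in Python as follows. The slope and offset of the generated data are made up; np.polyfit expects weights of 1/sigma, and cov="unscaled" keeps the covariance from being rescaled by the reduced chi-square.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the LabView data set: 1000 Poisson-noise points, linear in x (made-up parameters).
x = np.arange(0, 10, 0.01)
counts = rng.poisson(1.0 + 2.0 * x)

# Re-bin: average the counts over N = 25 consecutive rows (0.25 x-axis units per bin).
N = 25
xb   = x.reshape(-1, N).mean(axis=1)          # bin centers
ybar = counts.reshape(-1, N).mean(axis=1)     # mean counts per bin

# Equation 4: uncertainty of each averaged data point.
sigma = np.sqrt(ybar / N)

# Mask the first and last bin, then fit a line with "instrumental" (1/sigma) weights.
keep = slice(1, -1)
coeffs, cov = np.polyfit(xb[keep], ybar[keep], 1, w=1.0 / sigma[keep], cov="unscaled")
slope, intercept = coeffs
slope_err, intercept_err = np.sqrt(np.diag(cov))
print(slope, slope_err, intercept, intercept_err)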

Comments

  • When fitting the data, the first and last binned data points were masked. This was done to avoid dealing with binning intervals at the beginning and end of the data set that may not contain 25 points.

  • There is a subtle question here about how the width of the binning interval may (or may not) affect our final results, specifically the errors in the result. For example, one could select binning intervals so wide that we end up with only 4 bins. It seems, though I have no proof, that keeping the final number of points along the x-axis as close as possible to the number of points along the y-axis (if they are discrete) will probably produce the most accurate result.

Peak Fitting and Issues with Standard Errors

Introduction

Origin has a very powerful peak fitting package. After you select a particular peak fitting function, Origin determines the parameters of the fit, such as the center, amplitude, offset, and area, as well as their uncertainties, or, as Origin calls them, the "standard error(s)." However, be warned: when you fit data with uncertainties, by default the reported uncertainties for the fit parameters may be incorrect!

This serious issue (bug?) can be (mostly) fixed by disabling the Reduced Chi Square in the Advanced / Fit Control selection. To illustrate this issue, an example of a simple Gaussian fit with some noise is given below.

Using GaussAmp Peak Fit Function

For this fit, a noisy Gaussian data set with 202 points {<latex>x_i, y_i, \sigma_{y_i}</latex>} was first created. It was generated by using Origin's GaussAmp function (shown in Column B of Figure 8) and then adding Gaussian noise with a "myNoiseAmp" of 10. (The noise is in Column C, and the sum of Columns B and C, i.e., the Gaussian with the noise, is in Column D.) A y-axis error column, shown as Column E in Figure 8, was added and set to a constant value of 10, i.e., equal to the value of "myNoiseAmp."

Figure 8: The raw data with x-values (A), Gaussian (B), Noise (C), Gaussian + Noise (D), and constant error in y-values (E).

The Origin function used and its variables were:

nlf_GaussAmp(Col(A),y0,xc,w,A) + myNoiseAmp*normal(npts,seed)

double y0 = 10;              // peak offset
double xc = 100;             // peak center
double w = 25;               // peak width
double A = 100;              // peak amplitude
int npts = 202;              // number of data points
int seed = 125;              // random seed for the noise
range r2 = Sheet2!B;         // the noise amplitude is stored in Sheet2, column B
double myNoiseAmp = r2[1];   // noise amplitude (10)

Figure 9: The original data set with a y-error of 10 units.

After selecting both columns D and E, i.e., the y-values and their error, "Analysis / Peaks and Base Line / Single Peak Fit" was selected from Origin's menu. The panel shown below opens, and for this example we select the "GaussAmp" function as our fitting function.

Figure 10: Fitting function panel.

The fit results, shown in Figure 9, agree well with the values that were used to create the data set in the first place. The reduced chi squared is close to 1, indicating a "good fit." Therefore, it's only reasonable to assume that the quoted standard errors in Figure 9 are correct. In this particular case (because the reduced chi squared is close to 1) they are close; however, as we show in the next example, by default they are not affected by the errors in y and, hence, they can be misleading!

Same Fit, Different Y Error

We repeat our fitting procedure with one change to our original data set: we reduce the error in the y-values from 10 to 1 (unit). Clearly, such a change should have a major effect on the standard errors. The result of this new fit is shown below.

Figure 11: The original data set and the fit results with a y-error reduced from 10 to 1 unit(s) with incorrect standard errors displayed.

From the fit results in Figure 11, we notice two things:

  • First, since we decreased the y-error by a factor of 10, the reduced chi^2 has increased by a factor of 100. This is an indication that we underestimated our error, and it should warn us to be careful with the calculated standard errors.

  • Second, since we underestimated our y-error, the standard errors for the fitting parameters should also be underestimated, i.e., they should have decreased. However, when comparing the ones calculated for this data set with the ones obtained earlier, in Figure 9, we see that the standard errors for y0, xc, w and A have not changed at all! This is a serious problem because it indicates that the standard errors of our fitting parameters are essentially immune to the y-error! (A short numerical illustration follows this list.)
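This behavior is not unique to Origin: scaling the parameter covariance by the reduced chi-square is a common default in fitting packages. As an outside-Origin illustration of the two observations above, here is a short SciPy sketch; curve_fit's absolute_sigma flag appears to play the same role as Origin's "Use Reduced Chi Squared" setting (absolute_sigma=True corresponds to the fix described in the next section). The starting values and the random seed are made up.

import numpy as np
from scipy.optimize import curve_fit

def gauss_amp(x, y0, xc, w, A):
    # Same functional form as Origin's GaussAmp: offset plus Gaussian peak.
    return y0 + A * np.exp(-(x - xc) ** 2 / (2 * w ** 2))

rng = np.random.default_rng(125)
x = np.arange(202, dtype=float)                      # 202 points, as in the data set above
y = gauss_amp(x, 10, 100, 25, 100) + 10 * rng.normal(size=x.size)

p0 = [5.0, 90.0, 20.0, 80.0]                         # rough starting values (made up)

for sigma_y in (10.0, 1.0):                          # the two y-error columns discussed above
    err = np.full_like(x, sigma_y)
    for absolute in (False, True):
        popt, pcov = curve_fit(gauss_amp, x, y, p0=p0, sigma=err, absolute_sigma=absolute)
        perr = np.sqrt(np.diag(pcov))
        # With absolute_sigma=False (the default), the covariance is rescaled by the
        # reduced chi-square, so perr does not change when sigma_y changes;
        # with absolute_sigma=True, perr scales with sigma_y as it should.
        print(sigma_y, absolute, np.round(perr, 3))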

Disabling Reduced Chi Square

Luckily, we can remedy this "issue" by deselecting "Use Reduced Chi Squared" (which is enabled by default) under "Advanced / Fit Control," as shown in Figure 12.

Figure 12: Fixing the problem with the standard errors being incorrectly displayed.

The results of the fit with "Fit Control / Use Reduced Chi Squared" disabled are shown below.

Figure 13: The original data set and the fit results with a y-error reduced from 10 to 1 unit(s) and the correct Standard Errors.

The results in Figure 13 show that the standard errors for y0, xc, w and A have been reduced by a factor of 10, which is what we expect.

Warning: Bug in Calculating Standard Errors

From the fit parameters, Origin conveniently calculates derived parameters such as, in this case, the area under the peak (minus any offset) and the FWHM. It uses the values obtained for the fit parameters and the equations listed in the Function File. (See Figure 14.)

Figure 14: Equations used to calculate the FWHM and Area of the “GaussAmp” function.

While the calculated values for these parameters are correct, their standard errors are not! For example, comparing the standard errors for the Area and FWHM in Figures 11 and 13 shows that they have not changed. This is a bug in Origin (ORG-7585) that will be fixed in a version later than 8.6! So make sure you check your answers!

As a work-around, you can calculate the standard errors by using error propagation. For example, in the case of the GaussAmp function, the Area is given by:

<latex>\mathrm{Area} = \sqrt{2\pi}\, A\, w</latex>

Since the values for <latex>\sigma_w</latex> and <latex>\sigma_A</latex> are (correctly) given by the standard errors in w and A, you can calculate the standard error for the Area, <latex>\sigma_{Area}</latex>, yourself by propagation of errors (neglecting any correlation between A and w). It should be:

<latex>\sigma_{Area} = \sqrt{2\pi}\,\sqrt{(w\,\sigma_A)^2 + (A\,\sigma_w)^2}</latex>
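A small Python helper along these lines (the function name and example numbers are ours, not Origin's, and the propagation assumes uncorrelated errors in A and w):

import math

def gaussamp_area_and_error(A, w, sigma_A, sigma_w):
    # Area of a GaussAmp peak and its propagated standard error,
    # assuming Area = sqrt(2*pi) * A * w and uncorrelated errors in A and w.
    area = math.sqrt(2 * math.pi) * A * w
    sigma_area = math.sqrt(2 * math.pi) * math.hypot(w * sigma_A, A * sigma_w)
    return area, sigma_area

# Example with made-up fit results: A = 100 +/- 2, w = 25 +/- 0.5
print(gaussamp_area_and_error(100, 25, 2, 0.5))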

Comparing Original Data Set

Finally, let us revisit our original data set, shown in Figures 8 and 9, and check what happens to the standard errors when we disable the Reduced Chi Squared option in the Advanced tab. In other words, was our assumption correct that the standard errors for y0, xc, w and A were close to the "correct" values because the reduced chi squared was close to 1?

The final fit results with the Reduced Chi Squared option in the Advanced tab disabled are shown in Figure 15. Comparing these with the ones in Figure 9 shows that they do indeed agree when the reduced Chi Squared is close to 1.

Figure 15: Original data set with the Reduced Chi Square in the advanced tab disabled.

Conclusion

  1. When fitting data with uncertainties to a peak function, you must disable "Use Reduced Chi Squared" under Fit Control for the standard errors to be meaningful.

  2. If you followed step one, while the standard errors on the fitted parameters might be correct, the standard errors on values calculated from these may not be! (This is a current Origin bug, hopefully soon to be fixed.)

  3. While Origin has a multiple peak fitting routine that finds the parameters for multiple peaks, this option does not give you the ability to disable the reduced Chi^2. Therefore, either make sure you work with a reasonable reduced Chi^2 or use the single peak fitting function and fit one peak at a time while masking all other peaks.

-- Main.KurtWick - 29 May 2014