KDE Analysis

The Kernel Density Estimation analysis (KDE Analysis) tab gives access to one option and its associated parameter.

Bootstrap Error Estimate: When checked, extra computation (described below) is performed each time a KDE curve is requested.
# Bootstrap Replicas: Number of replicas used for the computation described below.

When a KDE curve is computed, an attempt is made to locate peaks in the resulting curve. The result is logged in the Notebook, but these locations do not come with any kind of error estimate.

To attach an error estimate to these locations, bootstrap replicas of the data are generated and a KDE curve is computed for each replica. These additional calculations are performed only when the Bootstrap Error Estimate checkbox is checked. For each of the N replica KDE curves (where N is the value set in # Bootstrap Replicas), peaks can (in principle) be determined, resulting in N sets of peak locations.

Suppose P_i is the number of peaks found in the KDE curve of replica i.

The algorithm determines which value of P_i is the most common. This assumes that one value occurs predominantly; if that is not the case, no error estimate is returned.

Call this most common number of peaks P. Using only the replicas in which exactly P peaks were found, the mean location and standard deviation of each peak are computed. The standard deviation is displayed in the Notebook, together with the number of replicas retained for the analysis (in brackets).
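The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ALiX implementation: the function names, the fixed bandwidth, the simple local-maximum peak finder, the rule that a tie between peak counts means "no predominant value", and the matching of peaks across replicas by their order along the axis are all hypothetical choices made for the example.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def gaussian_kde(data, grid, bandwidth):
    """Evaluate a Gaussian KDE of `data` at every point of `grid`."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]


def find_peaks(grid, curve):
    """Return the grid locations of the local maxima of `curve`."""
    return [
        grid[i]
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] >= curve[i + 1]
    ]


def bootstrap_peak_errors(data, grid, bandwidth, n_replicas, seed=0):
    """Return (mean locations, standard deviations, replicas retained) or None."""
    rng = random.Random(seed)
    peak_sets = []
    for _ in range(n_replicas):
        # One bootstrap replica: resample the data with replacement.
        replica = rng.choices(data, k=len(data))
        peak_sets.append(find_peaks(grid, gaussian_kde(replica, grid, bandwidth)))

    # Find the most common number of peaks P among the replicas.
    ranked = Counter(len(peaks) for peaks in peak_sets).most_common()
    p, votes = ranked[0]
    # Assumption: a tie for first place means no value is predominant.
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if tied or p == 0 or votes < 2:
        return None

    # Keep only the replicas with exactly P peaks; peak k is matched by order.
    retained = [peaks for peaks in peak_sets if len(peaks) == p]
    means = [mean([peaks[k] for peaks in retained]) for k in range(p)]
    sds = [stdev([peaks[k] for peaks in retained]) for k in range(p)]
    return means, sds, len(retained)


# Two well-separated clusters of made-up values, evaluated on a 201-point grid.
data = [-0.5, -0.25, 0.0, 0.25, 0.5] * 4 + [9.5, 9.75, 10.0, 10.25, 10.5] * 4
grid = [-5.0 + 0.1 * i for i in range(201)]
result = bootstrap_peak_errors(data, grid, bandwidth=1.0, n_replicas=20)
if result is not None:
    means, sds, kept = result
    for location, sd in zip(means, sds):
        print(f"peak at {location:.2f} +/- {sd:.2f} ({kept})")
```

The cost of this sketch grows with the number of replicas times the number of curve points, which is why a large number of replicas slows the analysis down.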

Important warning: Each KDE calculation involves the computation of M values of a Gaussian function per replica, where M is the number of KDE curve points requested by the user. Therefore N × M such computations are carried out, together with the associated randomization, function estimation, and mean and variance calculations. Choosing a large N may result in significant computation time!

The number M of KDE points mentioned above is set in the Histogram Fits tab of the ALiX Settings window, through the following two parameters:

Use Same Number of Points as Histogram
# Points

If the first parameter is unchecked, the number set in # Points is used. Otherwise, the KDE curve is computed at exactly the same locations as the histogram.

Which histogram? In principle, no histogram is needed to compute a KDE curve. However, in practice, the KDE curve is usually compared to a standard histogram of the data. In this case, it is important to know which histogram the KDE curve is compared to, in order to ensure proper scaling.

In the absence of an actual histogram calculation, the option of using a predefined number of points can be chosen: the minimum and maximum values of the data array are used as the bounds of the curve, and a total of M equidistant points are used between these bounds. In this case, the KDE is normalized so that its integral equals the number of data points.
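As an illustration of the predefined-points option, the sketch below builds M equidistant points between the data minimum and maximum and scales a Gaussian KDE so that it integrates to the number of data points. The function names and the bandwidth value are hypothetical. The text does not say whether the normalization is taken over the whole real line or over the plotted range; this example assumes the latter and rescales the sampled curve so that its trapezoidal integral over the grid equals the number of data points.

```python
import math


def equidistant_grid(data, m):
    """Return m equally spaced points from min(data) to max(data), inclusive."""
    lo, hi = min(data), max(data)
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)]


def trapezoid(grid, curve):
    """Integrate `curve` over `grid` with the trapezoidal rule."""
    return sum(
        0.5 * (curve[i] + curve[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(len(grid) - 1)
    )


def kde_scaled_to_counts(data, grid, bandwidth):
    """Gaussian KDE on `grid`, rescaled so its integral over the grid is len(data)."""
    raw = [
        sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]
    # Assumption: normalize over the plotted range, not over the whole real line.
    scale = len(data) / trapezoid(grid, raw)
    return [scale * value for value in raw]


# Made-up data; m plays the role of the # Points setting.
data = [1.0, 2.0, 2.5, 4.0, 7.0]
grid = equidistant_grid(data, m=50)
curve = kde_scaled_to_counts(data, grid, bandwidth=0.8)
print(len(grid), round(trapezoid(grid, curve), 6))
```

With this scaling, the curve has the same area as a histogram of counts with unit bin width; comparing it to a histogram with a different bin width would need an additional factor equal to that bin width.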
