LD Algorithms

Important Note: This page is under development. Any feedback is welcome!

Algorithms used in LongDistances

LongDistances is a program to calculate distance probability distributions from data obtained in a pulsed DEER experiment (Review). A good source for additional information is the manual of DeerAnalysis 2019, but be aware that there are significant differences from some of the algorithms described below (compare, e.g., the phase correction).

Overview

There are three main aspects to analyzing real or simulated data with software and a mathematical model.

    1. The actual algorithms. These are published or even historical and often have fancy names well-known to experts in the field (Tikhonov, etc.). I typically don't invent significant theoretical advancements but sometimes combine existing algorithms in new ways.

    2. The implementation using a high-level programming language (LabVIEW, Fortran, C++, Matlab, Python, etc.). This is more an art than a science, and slight differences in implementation can give orders of magnitude differences in execution speed. I have extensive experience in code optimization and have won many coding challenges. Details such as memory allocations, inplaceness, vectorization, caching, etc. are employed to make the calculations as fast as possible. All inner algorithms are extensively benchmarked and hand-tuned for performance. An alternative would be to use existing canned toolkits (if available), but they typically offer less control. I use LabVIEW because graphical code is inherently parallel.

    3. The user interface, allowing interaction with parameters and settings in real time while immediately seeing the effect on the result. This gives the user a much better grasp of the relevant issues compared to just staring at a printed formula. It is important to have a built-in data simulation system where any scenario can be generated and analyzed. This gives the user freedom to play and to intuitively learn the effect of each parameter as well as gauge the limitations of the analysis system if fundamentals of signal processing are violated (e.g. determining a frequency is difficult if only a fraction of a period has been measured).

The detailed analysis of raw DEER data consists of several distinct steps:

    • Data preparation (trimming of the raw data, background selection, zero point, phase, etc.)

    • Deciding on a model (yes, "model-free" is a model too)

    • Analysis to get the sought-after result (the distance probability distribution)

    • Estimation of the quality of the result (error analysis)

Data preparation

Raw data from the instrument typically needs some polishing before it can be used. This involves phase correction, defining true zero (t0) of the time axis, and possibly trimming of edge artifacts.

Phase correction

DEER data is typically complex (recorded in quadrature with real (RE) and imaginary (IM) parts) and requires phase correction to rotate all relevant signal into the RE part. Assuming that the data is properly phase cycled (meaning that the true 0+0i is accurate, i.e. there is no arbitrary offset in the data), the correctly phased data will be entirely in the RE part while the IM part is close to zero (i.e. only noise plus potential artifacts). If the phase is incorrect, the direction of the complex vector sum identifies the rotation in the complex plane required to bring all important data into the real part. For all practical purposes, the complex magnitude is identical to the real part and can be used directly, except in cases where negative amplitudes are present after phase correction, e.g. data where the background has already been eliminated by external processing or data simulated without background (e.g. the Edwards&Stoll dataset). By default, all data is automatically phase corrected when loaded, but the magnitude of the complex data is used for further processing. If negative amplitudes are present in the real part after phase correction, the real part is analyzed instead.
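A minimal sketch of such a vector-sum phase correction in Python/NumPy (used here for illustration only; the actual implementation is graphical LabVIEW code and may differ in detail):

    import numpy as np

    def phase_correct(signal):
        """Rotate a complex DEER trace so the signal lies in the real part.

        Assumes proper phase cycling (no arbitrary complex offset), so the
        angle of the complex vector sum gives the required rotation.
        """
        phi = np.angle(np.sum(signal))          # direction of the complex vector sum
        corrected = signal * np.exp(-1j * phi)  # rotate the signal onto the real axis
        return corrected, phi

After correction, the imaginary part should contain only noise plus possible artifacts, and the real part (or the magnitude) can be passed on for further processing.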

Determination of time zero

In the absence of instrumental artifacts, DEER data is symmetric around t0. Only if t0 of the kernel is aligned with t0 of the data will a Tikhonov regularization followed by back-calculation of the DE data result in a near-perfect fit. Any misalignment will cause a dramatic mismatch that can be used as a penalty to optimize time zero. Since the kernel calculation can be done in sub-milliseconds, nonlinear fitting with t0 as the only adjustable parameter quickly converges to the optimal value. The mismatch can be observed by manually moving the t0 cursor away from the optimal value. Note that in the presence of very short distances and long inter-point spacing, the center is defined by only very few points and automatic detection might fail or converge to a wrong value.

For the above to work well, some data should be available at negative times. If the data starts at the maximum or after some dead time, t0 can be set manually. Newer versions of the Bruker software contain the t0 offset in the DSC file and this value can be used as a guideline (also displayed as Xmin on the lower left of the background tab).
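Conceptually, the t0 optimization can be sketched as follows (shown as a simple scan over candidate values for clarity; the program itself uses a nonlinear fit with t0 as the only adjustable parameter, and kernel() and tikhonov_plain() stand for the routines described further below):

    import numpy as np

    def t0_mismatch(t0, t_raw, de, r, alpha, kernel, tikhonov_plain):
        """Back-calculation mismatch for one candidate t0."""
        K = kernel(t_raw - t0, r)            # rebuild the kernel for this t0
        P = tikhonov_plain(K, de, alpha)     # fast, unconstrained regularization
        return np.sum((K @ P - de) ** 2)     # penalty: misfit of the back-calculated DE

    def find_t0(t_raw, de, r, alpha, kernel, tikhonov_plain, candidates):
        """Return the candidate t0 with the smallest back-calculation mismatch."""
        penalties = [t0_mismatch(t0, t_raw, de, r, alpha, kernel, tikhonov_plain)
                     for t0 in candidates]
        return candidates[int(np.argmin(penalties))]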

Background determination

If background is known to be present, there are five options to define the background used for processing. Each option is a refinement of the previous one, and the quality of the fit often increases with each step. Note that selection of the background model must be done manually and the refinements directly depend on the correct model selection.

  1. Selecting "Background=none" allows analysis of the data without any assumptions about the background model. The presence of significant amounts of random spin pairs will cause a large population at longer distances, typically above the limits as defined by Jeschke (add ref). The actual shape of the distribution at these longer distances is random and meaningless. However, looking at the integrated intensity can directly give a qualitatively good estimate of the relative amount of interesting pairs. Note that the distance range should be expanded to much longer distances (e.g. 300-500 Angstroms) on the "internal data" tab, else the back-calculation will be poor.

  2. Manually moving the cyan cursor points to define the background. This is of course not recommended because it is not reproducible and introduces operator bias.

  3. An instant background based on some reasonable assumptions. This is what we get whenever new data is loaded and before the background is further optimized. Typically, this estimate is already pretty good and can always be restored by clicking [Reset] on the background or fitting tab. Resetting is useful whenever the background cursors have been moved to very unreasonable locations and will bring sanity back to the displays. The details of the algorithm are undocumented and involve a highly nonlinear weighted low order polynomial fit to the entire raw data.

  4. An optimization based on the plain Tikhonov regularized distance distribution, assuming that the real distances are limited to some finite range and anything longer is due to pure background. This is carried out on the background tab and is often sufficient. The quantity minimized during the optimization is the presence of peaks at distances above the threshold and/or the deviation from a flat terminal integrated intensity over that same range. This method uses the plain Tikhonov regularization (i.e. negative probabilities are allowed), because this regularization is fast (non-iterative) and because the presence of negative probabilities provides a good penalty during optimization. Note that there will be remaining distortions in the case of heavily truncated DE traces and the user is encouraged to explore these effects using simulated inputs.

  5. Co-fitting of the background parameters with model-based or model-free fits. This is carried out on the fitting tab after enabling "refine background" and will give the highest background fidelity. Because mindless model-free background co-fitting often prefers a result with less background and significant amounts of longer distances, additional constraints are implemented to find the solution with the smallest modulation depth. Without that bias, all fits with the background points below the data will generally give a solution with comparable residual and model-free fitting with background would not converge to a unique minimum (especially if deriv>0).

For severely truncated data, disentangling background from data is not always possible. The user can always define the background manually by moving the cursors, but the result might still not be reliable. Note that all methods assume that the DE data contains both signal and background over the entire measured range; there is no imaginary point where the data ends and the background begins.
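As an illustration of option 4 in the list above, here is a sketch of a possible penalty term together with one commonly used homogeneous background model. This is illustrative only: the exact penalty and the meaning of the b1, b2, b3 parameters in LongDistances depend on the selected background model and are not specified here.

    import numpy as np

    def stretched_exp_background(t, k, d, scale=1.0):
        """One commonly used homogeneous background model:
        scale * exp(-k * |t|**(d/3)), with d the effective dimensionality.
        The b1, b2, b3 parameterization used internally may differ."""
        return scale * np.exp(-k * np.abs(t) ** (d / 3.0))

    def long_distance_penalty(r, P, r_threshold):
        """Penalty for background optimization (option 4, sketched): the total
        (absolute) probability that the plain Tikhonov solution places above
        the distance threshold. Negative probabilities count as well, which is
        part of why the unconstrained regularization is useful here."""
        return np.sum(np.abs(P[r > r_threshold]))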

Calculation of the transforms

Kernel generation

The relationship between distance probabilities and dipolar evolution data (DE) is given by the "kernel" matrix (kernel might not be exactly the right term because "kernel" has other specific meanings in linear algebra, convolution, operating systems, plant biology, etc.). This matrix allows us to do the discrete approximation of the integral transform as will be described below. Since this matrix needs to be recalculated often and typically contains tens to hundreds of thousands of matrix elements, efficient algorithms are needed.

The kernel (as defined above and always visible in the "internal data" tab) is a simple rectangular matrix containing one "atomic dipolar evolution" (i.e. distance width=0, but all possible orientations in space) for each distance. In the other dimension, the points are matched to the time points of the dipolar data to be analyzed. The number of distance points and the distance range can be chosen freely on the "internal data" tab while the time points are fully defined by the experimental (or simulated) data, possibly truncated by the ranges selected in the "Select" tab. Don't worry about situations where the system appears to be under-determined, e.g. if the number of distance points is higher than the number of DE points. The algorithm as implemented can transparently deal with kernels of any shape (i.e. aspect ratio). Similar to compressed sensing, we can take advantage of assumptions about the solution such as distance sparseness and correlations between adjacent distance points. In the current implementation, it is assumed that the time and distance points are linear ramps, but this requirement could be relaxed if needed.

The number of points in each dimension (Nr, Nt) of the matrix is always known and, as soon as anything changes (new file loaded with new dt, different distance range, different number of points, different time range, new zero point, etc.), the kernel is recalculated and cached. The matrix recalculation is highly optimized for speed, yet results in an extremely high-quality kernel. Even a kernel with hundreds of thousands of elements can be generated in milliseconds. Note that for negative times, the absolute time value is used, creating symmetry around t0. The time increment (dt) is assumed to be exactly known from the machine settings, but t0 can be fractional. In the absence of excitation bandwidth correction, the atomic dipolar evolution shape (before scaling to the time axis) is identical for all distances and can be implemented as a lookup table (LUT). We currently use a template numerically generated using Fresnel integrals and containing 2^14 elements. This is similar to a "mother wavelet" and all DE traces can be obtained by simple scaling (no shifting, of course). These settings give a resolution of ~60 lookup values to the first minimum and >160 oscillations. The LUT is static, calculated at compile time, and any table lookup uses linear interpolation between LUT points.
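For reference, the (bandwidth-uncorrected) atomic dipolar evolution has a closed form in terms of Fresnel integrals. A minimal Python/NumPy sketch of a kernel built directly from that formula (without the LUT, interpolation, and other speed optimizations described above):

    import numpy as np
    from scipy.special import fresnel

    NU_DD = 52.04  # dipolar frequency in MHz for r = 1 nm

    def atomic_de(t_us, r_nm):
        """Orientation-averaged dipolar evolution for a single distance:
        K(t, r) = integral_0^1 cos[(3z^2 - 1) * w_dd * t] dz,
        expressed with Fresnel integrals. t in microseconds, r in nanometers."""
        phi = 2.0 * np.pi * NU_DD / r_nm ** 3 * np.abs(t_us)  # |t| -> symmetry at t0
        x = np.sqrt(6.0 * phi / np.pi)
        S, C = fresnel(x)                        # SciPy returns (S, C)
        with np.errstate(divide="ignore", invalid="ignore"):
            k = (np.cos(phi) * C + np.sin(phi) * S) / x
        return np.where(phi == 0.0, 1.0, k)      # the limit at t = 0 is exactly 1

    def make_kernel(t_us, r_nm):
        """Nt x Nr kernel: one atomic dipolar evolution per distance column."""
        return atomic_de(t_us[:, None], r_nm[None, :])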

Calculating a kernel that includes excitation bandwidth correction is slower, but because all kernels are cached there is no noticeable speed penalty during normal use. The excitation bandwidth corrected kernel integrates over 1000 orientations (user adjustable on the settings tab) which is normally more than sufficient and completes in well under a second for typical sizes.

The calculation of the transforms only involves three items:

    • The dipolar data with defined t0, dt, Nt.

    • The distance probability distribution with defined r0, dr, Nr.

    • The kernel matrix defined for the same axes (t0, dt, Nt, r0, dr, Nr).

Given (t0, dt, Nt, r0, dr, Nr), the kernel matrix can be calculated as described.

Given the distance probability and kernel, the dipolar data can be calculated (forward problem, well-posed, details below).

Given the dipolar data and kernel, the distance probability can be calculated (inverse problem, somewhat ill-posed, requires regularization and a choice of the regularization parameter (alpha* or smoothness, depending on the algorithm)).

Note that the regularization is typically not a significant problem when analyzing real data and it should not be the main focus. Even the choice of regularization parameter is not very critical because the results don't vary that much over about half an order of magnitude around the optimal choice. Most uncertainties arise due to incorrect background, insufficiently trimmed data artifacts (e.g. due to pulse overlap or other instrumental problems) or insufficient data in general (e.g. poor S/N or a time trace that is too short for the relevant distances).

Solving the Forward Problem

Simulation (transforming a distance distribution into a dipolar evolution)

Calculating the dipolar evolution (DE) from a distance distribution is a simple Matrix * Vector multiplication:

DE vector = Kernel matrix * distance probability vector

This can be done nearly instantly for the typical kernel sizes. Simulating a DE normally also includes a background (time points shared with the DE):

DE vector = (Kernel * distance probability vector) x background(b1, b2, b3)

Where "*" denotes a matrix-vector multiplication and "x" an element-by-element multiplication. (The definition of the parameters b1, b2, and b3 depends on the selected background model.)
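In NumPy notation, the forward calculation is essentially a one-liner. The sketch below assumes the background values have already been evaluated from (b1, b2, b3) on the same time axis:

    import numpy as np

    def simulate_de(K, P, background=None):
        """Forward problem: DE = (K @ P), elementwise multiplied by the background.

        K          : Nt x Nr kernel matrix
        P          : Nr distance probabilities
        background : Nt background values (optional), evaluated from (b1, b2, b3)
        """
        de = K @ P                   # matrix-vector product; well-posed and fast
        if background is not None:
            de = de * background     # element-by-element multiplication
        return de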

Solving the Inverse Problem

Fitting (Regularization of the "dipolar evolution -to- distance distribution" transform)

While the forward problem (calculating the DE based on the kernel matrix and distance vector as described above) is well-posed and mathematically trivial, we need to solve the inverse problem: calculating the distance probability vector given the kernel matrix and DE. On paper, this is the same as solving a linear system of equations and is an ill-posed problem for inputs typically encountered in DEER data. The problem can easily be stabilized with regularization (and non-negativity constraints) or approximated by fitting to a model.

Several algorithms are available, and their computational complexity (and thus processing time) varies. Still, even the slowest should complete within seconds for reasonable kernel sizes and typical computer hardware.

There are no size restrictions in the code and any kernel size can be handled given sufficient time and memory (LongDistances is a 64bit application and is thus not limited to a 32bit address space). Try setting the number of distances to e.g. 1000 and things will slow down a bit for DE traces with many points, so be patient. Of course there is no good reason to use more than a few hundred points.

Four algorithms are currently available to solve the inverse problem and they are described in more detail below.

    1. Plain Tikhonov regularization (very fast, stable, can give some negative probabilities).

    2. Non-negative Tikhonov regularization (pretty fast; no negative probabilities; some random jitter far from the optimal alpha*).

    3. Model-free fitting (fast; similar results to non-negative Tikhonov but using a very different approach; expandable to include additional parameters and constraints, such as background co-fitting).

    4. Model-based fitting (very fast; requires user input for number of peaks and their shapes; optionally allows co-fitting of the background).

(1) Tikhonov Regularization (plain, i.e. negative distance probabilities are allowed)

In the simplest implementation this is pure linear algebra and can be done in milliseconds. Only one parameter (alpha*) is used to adjust the result until a good balance between residual norm and solution norm is found (e.g. using the "L-curve"). There are no constraints on the sign of the distance probabilities and some negative probabilities are often found. While negative probabilities are of course nonsense, the balance between positive and negative probabilities as well as the typically perfect back-calculation (for sufficiently small alpha*) has useful applications. Currently, plain Tikhonov regularization is only used for the t0 determination, the S/N estimation, and the Tikhonov-based background optimization (see above).
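Written out, the unconstrained solution minimizes the residual norm plus an alpha*-weighted smoothness norm and has a closed form. A minimal sketch of this calculation (corresponding to the "plain linear algebra" route listed below; the internal scaling of alpha* in LongDistances may differ, see the footnote):

    import numpy as np

    def derivative_operator(n, order):
        """Finite-difference derivative operator of the given order
        (order 0 = identity matrix)."""
        L = np.eye(n)
        for _ in range(order):
            L = np.diff(L, axis=0)   # each pass reduces the row count by one
        return L

    def tikhonov_plain(K, de, alpha, deriv=2):
        """Unconstrained Tikhonov solution via the normal equations:
        minimizes ||K P - de||^2 + alpha^2 ||L P||^2 with no sign constraint."""
        L = derivative_operator(K.shape[1], deriv)
        A = K.T @ K + alpha ** 2 * (L.T @ L)
        return np.linalg.solve(A, K.T @ de)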

Mathematically, there are two quite different algorithms for Tikhonov regularization. Both give identical results but the relative speed depends on the size and shape of the matrix. The crossover line has been determined experimentally and the faster version is always automatically selected based on the input sizes.

The two algorithms are:

    1. Plain linear algebra

    2. SVD Based (currently only for the 0th derivative operator version)
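For the 0th derivative operator, the SVD route can be sketched as follows. Once the SVD of the kernel is computed (and cached), solutions for many alpha* values are nearly free, which is convenient, for example, when building an L-curve:

    import numpy as np

    def tikhonov_svd(K, de, alphas):
        """SVD-based Tikhonov (0th derivative operator only).
        Returns one solution per alpha; the SVD is computed once and reused."""
        U, s, Vt = np.linalg.svd(K, full_matrices=False)
        utb = U.T @ de
        return [Vt.T @ (s / (s ** 2 + alpha ** 2) * utb) for alpha in alphas]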

Several derivative operators are available; the order can be freely set from 0th to 9th, but this setting is not too critical. After optimizing alpha* for a given derivative, the results are typically similar. The user is encouraged to play with these settings and observe the results.

(2) Non-negative Tikhonov Regularization

This is similar to plain Tikhonov regularization with the further restriction that all distance probabilities must be non-negative. No closed-form formula exists and an iterative procedure is used to eliminate negative probabilities. The cost of calculation is slightly higher, but it typically completes in a few iterations taking a few tens of milliseconds. For highly under- or over-regularized scenarios (i.e. a chosen alpha* far from the optimal value) the results are less deterministic, as can be seen by erratic behavior at the far ends of the L-curve. This can be ignored; the algorithm behaves well near the knee of the L-curve. (It is possible that details of the algorithm will change in future releases to get more stability in the fringes.)
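The iterative scheme used internally is not documented here. One standard way to impose the constraint, sketched below, is to rewrite the regularized problem as a non-negative least-squares (NNLS) problem on an augmented system; LongDistances may use a different iterative procedure:

    import numpy as np
    from scipy.optimize import nnls

    def tikhonov_nonneg(K, de, alpha, L):
        """Non-negative Tikhonov as an augmented NNLS problem:
        minimize ||K P - de||^2 + alpha^2 ||L P||^2 subject to P >= 0."""
        A = np.vstack([K, alpha * L])                   # stack data and smoothness rows
        b = np.concatenate([de, np.zeros(L.shape[0])])  # smoothness rows are fit to zero
        P, _residual = nnls(A, b)
        return P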

(3) Model-free fitting

Model-free fitting tries to achieve the same as non-negative Tikhonov regularization, but using a very different approach. This allows inclusion of other parameters (such as background terms (implemented) or even the regularization parameter (not yet exposed to the user)) as well as inclusion of additional penalties and constraints. In the simplest form, the background is held fixed and we fit to a target that consists of the dipolar evolution data appended with an array of zeros whose length is determined by the number of distance points and the chosen derivative order. The parameters are the logarithms of all the distance probabilities (the logarithm guarantees positive probabilities and has other beneficial properties; dozens of other mappings have been tried and compared). The model function is the simulated DE trace based on the distance probabilities, concatenated with the derivative of the parameter array. Fitting that derivative of the current distance distribution to the array of zeros favors smooth solutions and is basically equivalent to Tikhonov regularization. (Note that the derivative order can again be freely chosen from 0 to 9.) The smoothness parameter simply adjusts the relative weight between the two parts of the model function: for large values, a smooth solution is favored over a good fit to the DE part, and vice versa.

It might seem daunting to do a nonlinear Levenberg-Marquardt fit to a model with 200+ somewhat correlated parameters (i.e. the distance probabilities here), but most partial derivatives reduce to a simple analytical form that is very cheap to calculate. The fits typically converge within seconds and are often subjectively "better" than non-negative Tikhonov regularization. If the background refinement is enabled, three additional parameters (b1, b2, and b3) define the background and are co-fitted to refine the solution. If the background is co-fitted, the modulation depth is penalized to find the solution with the smallest modulation depth (needed for deriv > 0). (Note that the penalty strength is currently not exposed to the user and is preliminary. It might need a few more tweaks.)

As mentioned, a well-tuned Levenberg-Marquardt implementation is used. It is written from scratch and outperforms the stock algorithms. It also implements some extra tuning parameters and algorithmic details, some of which are exposed in the fitting options. Note that extreme choices for the tuning parameters even allow steepest-descent or conjugate-gradient fitting (not recommended).
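A minimal sketch of this fitting scheme, using SciPy's generic least-squares solver for illustration (the program uses its own hand-tuned Levenberg-Marquardt implementation; background co-fitting and the modulation-depth penalty are omitted, and whether the smoothness term is built from the probabilities or from the log-parameters is an internal detail, here it uses the probabilities):

    import numpy as np
    from scipy.optimize import least_squares

    def model_free_fit(K, de, smoothness, deriv=2, theta0=None):
        """Model-free fit: parameters are log-probabilities, so the
        distribution P = exp(theta) is positive by construction.
        The residual is the DE misfit concatenated with the (scaled)
        derivative of the distribution fitted against zeros."""
        n_r = K.shape[1]
        if theta0 is None:
            theta0 = np.full(n_r, np.log(1.0 / n_r))   # flat starting distribution

        def residuals(theta):
            P = np.exp(theta)
            misfit = K @ P - de                        # fit the DE data
            rough = smoothness * np.diff(P, n=deriv)   # smoothness: derivative -> 0
            return np.concatenate([misfit, rough])

        result = least_squares(residuals, theta0)
        return np.exp(result.x)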

(4) Model-based Fitting

Model-based fitting assumes that the distance distribution is a weighted sum of simple lineshapes, each characterized by a shape, relative area, position, and width. Here the function is a simulation of the DE data given the kernel, the composed distance distribution, and optionally the background parameters (b1, b2, and b3). The fitting parameters define the lineshapes and background. Care should be taken to pick a reasonable number of peaks and reasonable starting values, typically by looking at the distance distribution obtained from Tikhonov regularization, which is always available. For example, fitting six Gaussian lines to a single peak would not be reasonable. In future versions, there will be an option to automate reasonable guesses. A width=0 is valid and will proportionally fill two adjacent distance bins according to the fractional position between the bins.

Note that any lineshape parameter can be held fixed at a defined value by deselecting the green LED next to it. For example, the width can be quite poorly defined by the data and will give large parameter errors when allowed to vary. Fixing certain widths based on reasonable assumptions can stabilize the fit dramatically.
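A minimal sketch of model-based fitting with Gaussian lineshapes (again using SciPy for illustration; background co-fitting, other lineshapes, the width=0 special case, and fixed/held parameters are omitted):

    import numpy as np
    from scipy.optimize import least_squares

    def gaussian_peaks(r, params):
        """Distance distribution as a sum of Gaussian lineshapes.
        params is a flat array of (area, position, width) triples."""
        P = np.zeros_like(r)
        for area, pos, width in params.reshape(-1, 3):
            P += area * np.exp(-0.5 * ((r - pos) / width) ** 2)
        return P / np.sum(P)                   # normalize to unit total probability

    def model_based_fit(K, de, r, params0):
        """Adjust the peak parameters until the simulated DE matches the data."""
        def residuals(params):
            return K @ gaussian_peaks(r, params) - de
        return least_squares(residuals, params0).x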

Footnote:

*The actual magnitude of alpha/smoothness depends on the internal scaling of data and matrices and cannot be compared in absolute terms with values from other software implementations (e.g. DeerAnalysis). Always use the L-curve before comparing!