Modelling three-dimensional protein structure is crucial to understanding how proteins function, but the enormous number of possible conformations makes it difficult to predict how a given protein chain will fold in 3D space. In the early 1900s, at the intersection of physics, biology, and mathematics, x-ray crystallography emerged as a method to model the atomic and molecular structure of crystallized molecules, including proteins, using the diffraction of x-ray light. Many major biological discoveries rely on x-ray crystallography, including the discovery of the double-helix structure of DNA. Crystallography also relies on the mathematical tool of the Fourier transform, which decomposes a periodic function into a sum of simple harmonic functions of different frequencies. The goal of this project is to understand how Fourier transforms decode x-ray crystallography data and what the features of the Fourier transform convey about the crystalline protein.
Before performing the transforms, we need to understand Fourier series and coefficients. A Fourier series is the sum of simple harmonic functions of time that make up a periodic function, from sound waves to electromagnetic waves like x-ray light. A general function f(t) with period T can be broken down as follows:
$$f(t) = \tfrac{1}{2}a_0 + a_1\cos(\omega t) + b_1\sin(\omega t) + a_2\cos(2\omega t) + b_2\sin(2\omega t) + \dots$$
where ω is the fundamental frequency (2π/T) and n indexes the sine and cosine terms. (A complete Fourier series has infinitely many terms; the Fourier transform extends the idea to a continuous range of frequencies.) Getting f(t) from the Fourier series is easy if all the coefficients are given, but the reverse direction (called Fourier analysis) requires a little more math. Joseph Fourier derived straightforward formulas (Fig. 1) to perform this analysis, and computational programs like MATLAB can compute the Fourier coefficients of a sampled function automatically using the Fast Fourier Transform algorithm, which greatly reduces computation time. Fourier transforms can be represented by frequency-domain graphs, which plot amplitude (the value of the Fourier coefficient) against the frequency of that harmonic term.
Let’s look at the periodic square function with a period of 2π: f(t) = {1 for 0 < t < π, -1 for π < t < 2π}. Fig. 2 shows the calculations for the Fourier coefficients a0, an, and bn, using formulas V. in Fig. 1. Because of the discontinuities in f(t) at π and 2π, the integrals were taken over two separate intervals. As a0 = 0 and an = 0, there is no constant term and there are no cosine terms in the series. For even values of n there are no sine terms, and for odd values of n, the coefficients bn equal 4/(nπ).
Figure 1: Fourier's formulas; m, n = nonzero integers, ω = 2π/T. (source 1)
Figure 2: Fourier coefficient calculations for square function
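The coefficients worked out by hand for the square wave can also be checked numerically. The sketch below is an illustration only: it assumes the standard coefficient formulas a_n = (2/T)∫f(t)cos(nωt)dt and b_n = (2/T)∫f(t)sin(nωt)dt (taken here to match the formulas in Fig. 1) and approximates the integrals with a simple midpoint rule. The function names and the number of integration steps are choices made for this example.

```python
import math

def square_wave(t):
    """Square wave with period 2*pi: +1 on (0, pi), -1 on (pi, 2*pi)."""
    return 1.0 if (t % (2 * math.pi)) < math.pi else -1.0

def fourier_coefficients(f, n, period=2 * math.pi, steps=20000):
    """Approximate (a_n, b_n) with a midpoint-rule integral over one period."""
    omega = 2 * math.pi / period          # fundamental frequency
    dt = period / steps
    a_n = b_n = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                # midpoint of the k-th slice
        a_n += f(t) * math.cos(n * omega * t) * dt
        b_n += f(t) * math.sin(n * omega * t) * dt
    return 2 * a_n / period, 2 * b_n / period

for n in range(0, 6):
    a_n, b_n = fourier_coefficients(square_wave, n)
    expected = 4 / (n * math.pi) if n % 2 == 1 else 0.0
    print(f"n={n}: a_n={a_n:+.4f}  b_n={b_n:+.4f}  hand-derived b_n={expected:.4f}")
```

The n = 0 row gives a0, and the printed b_n values can be compared directly against 4/(nπ) for odd n and zero for even n.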
The interactive figure below graphs the first three terms of the Fourier series in blue, their sum (an approximation of f(t)) in red, and the sum of the first 100 terms (a much better approximation) in black. As p, the number of terms in the Fourier series, increases, so does the accuracy of the approximation to the square wave. As per the calculations in Fig. 2, the terms all follow the form (4/(nπ))sin(nt) for odd values of n. The red function clearly shows how summing multiple harmonics (the blue curves) can create a more complicated periodic graph. The frequency-domain graph for this Fourier series would have peaks only at odd values of n on the x-axis, with amplitudes of 4/(nπ) that decrease as n increases.
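The effect of adding more terms can be reproduced without the figure. The following is a minimal sketch, assuming the square wave equals +1 on (0, π) and -1 on (π, 2π) and using the (4/(nπ))sin(nt) terms for odd n; the root-mean-square error over one period is one possible measure of accuracy chosen for this example, not necessarily the one used by the interactive figure.

```python
import math

def square_wave(t):
    """Square wave with period 2*pi: +1 on (0, pi), -1 on (pi, 2*pi)."""
    return 1.0 if (t % (2 * math.pi)) < math.pi else -1.0

def partial_sum(t, p):
    """Sum of the first p nonzero terms (4/(n*pi)) * sin(n*t), n = 1, 3, 5, ..."""
    return sum(4 / (n * math.pi) * math.sin(n * t) for n in range(1, 2 * p, 2))

def rms_error(p, samples=2000):
    """Root-mean-square difference between the partial sum and the square wave."""
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * 2 * math.pi / samples
        total += (partial_sum(t, p) - square_wave(t)) ** 2
    return math.sqrt(total / samples)

for p in (1, 3, 10, 100):
    print(f"p={p:3d}  RMS error={rms_error(p):.4f}")
```

The printed error shrinks as p grows, which is the numerical counterpart of the black curve hugging the square wave more closely than the red one.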
One last note on general Fourier transforms: for x-ray crystallography, one useful way to write Fourier terms is the waveform cos(2πx) + i sin(2πx), which is a complex-valued function. This basic wave has the exponential form e^(2πix), which makes the integrals for the 3D Fourier transforms used in crystallography much simpler to write and evaluate.
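As a quick sanity check of the exponential form, the sketch below compares e^(2πix) with cos(2πx) + i sin(2πx) at a few sample points, and shows one reason the exponential is convenient: multiplying two waves just adds their frequencies. The function name and the sample values are illustrative choices.

```python
import cmath
import math

def basic_wave(x, h=1):
    """Complex wave e^(2*pi*i*h*x) with integer frequency h."""
    return cmath.exp(2j * math.pi * h * x)

for x in (0.0, 0.125, 0.25, 0.5):
    trig = complex(math.cos(2 * math.pi * x), math.sin(2 * math.pi * x))
    print(f"x={x}: exponential={basic_wave(x):.4f}  cos + i sin={trig:.4f}")

# Product rule: e^(2*pi*i*h*x) * e^(2*pi*i*k*x) = e^(2*pi*i*(h+k)*x)
x = 0.3
product = basic_wave(x, h=2) * basic_wave(x, h=5)
print(abs(product - basic_wave(x, h=7)) < 1e-12)
```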
Crystals are periodic arrangements of atoms, built from repeating unit cells that each contain an identical collection of atoms. Crystal samples are used in place of single molecules because a typical crystal aligns around 10^15 molecules, amplifying the x-ray scattering from each unit cell. The x-ray beams are used to determine the distribution of electrons in each molecule, called the electron density, the shape of which allows crystallographers to build protein models. The electrons within each unit cell are responsible for x-ray scattering: the incident x-rays are scattered by the electrons, and the electrons’ positions determine where and how strongly the scattered beams hit the detector.
Figure 3: Parallel Bragg planes. (source 2)
In Bragg’s model, the crystal is divided into sets of parallel planes of electrons, with adjacent planes in a set spaced a distance d apart. For maximum intensity of the scattered light, the identical x-rays reflecting off adjacent planes must interfere constructively, meaning the waves are in phase and the sum of the waves has the maximum possible amplitude. As shown in Fig. 3, the path difference 2BC between the rays reflecting off adjacent planes equals 2d sin θ, where θ is half of the ray's total angle of deflection. Bragg’s condition states that adjacent reflections interfere constructively if the path difference is an integer multiple of the wavelength λ, i.e., nλ = 2d sin θ.
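Bragg’s condition can be turned into a small calculation. The sketch below solves nλ = 2d sin θ for θ; the wavelength of 1.54 Å and plane spacing of 3.0 Å are illustrative values chosen for this example, not measurements from this project, and the function name is hypothetical.

```python
import math

def bragg_angle_deg(d, wavelength, n=1):
    """Return theta (degrees) satisfying n*wavelength = 2*d*sin(theta).

    Returns None when n*wavelength > 2*d, i.e. no angle satisfies the condition.
    """
    s = n * wavelength / (2 * d)
    if s > 1:
        return None
    return math.degrees(math.asin(s))

WAVELENGTH = 1.54   # angstroms (illustrative)
SPACING = 3.0       # angstroms between adjacent planes (illustrative)

for n in range(1, 5):
    theta = bragg_angle_deg(SPACING, WAVELENGTH, n)
    if theta is None:
        print(f"n={n}: no reflection (n*lambda exceeds 2d)")
    else:
        print(f"n={n}: theta={theta:.2f} deg, total deflection={2 * theta:.2f} deg")
```

Only a limited number of orders n produce a reflection, because sin θ cannot exceed 1.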
Bragg’s method imagines each possible set of parallel planes in the crystal as an independent diffractor, producing a single reflection from constructive interference of scattered rays. This summed reflection can itself be modelled as a Fourier series, where the contribution of each individual atom in the identical planes is one Fourier term. For a given wavelength and angle of incidence of the x-ray beam, Bragg’s law determines which sets of planes reflect and where, geometrically, to look for the data.
During data collection, a crystal sample is mounted between the source of the x-ray beam and a detector. The crystal is rotated so that the incident x-ray beam interacts with every possible set of parallel planes in the crystal, directing each unique reflection to the detector. The detectors use scintillation counters to measure the reflection intensities by counting the number of x-ray photons that arrive at each point on the detector. The detectors output a diffraction image like the one shown in Fig. 4. The result of x-ray diffraction data collection is a list of intensities, each describing the relative strength of the reflection from a certain set of planes. Additionally, the spacings between reflections on the detector are inversely proportional to the dimensions of the unit cell. So the locations of the reflections in the image describe the crystal’s unit-cell parameters, while their intensities describe the molecular structure. The intensity of a reflection is proportional to the square of the amplitude of the diffracted wave.
Figure 4: Diffraction image of a protein. (source 3)
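The inverse relationship between reflection spacing and unit-cell size follows from Bragg’s law. The sketch below is a deliberately simplified geometry, with several assumptions made for illustration: a one-dimensional crystal with cell edge a, plane spacings d = a/h for h = 1, 2, 3, a flat detector perpendicular to the beam, and every reflection assumed to be in diffracting position. The wavelength, detector distance, and function names are hypothetical.

```python
import math

def spot_positions(cell_edge, wavelength, detector_distance, orders=(1, 2, 3)):
    """Detector positions of the reflections from planes with spacing d = cell_edge / h."""
    positions = []
    for h in orders:
        d = cell_edge / h
        theta = math.asin(wavelength / (2 * d))           # Bragg's law with n = 1
        positions.append(detector_distance * math.tan(2 * theta))
    return positions

def mean_spacing(positions):
    """Average gap between neighbouring spots."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)

WAVELENGTH = 1.54    # angstroms (illustrative)
DISTANCE = 100.0     # crystal-to-detector distance in mm (illustrative)

small_cell = mean_spacing(spot_positions(50.0, WAVELENGTH, DISTANCE))
large_cell = mean_spacing(spot_positions(100.0, WAVELENGTH, DISTANCE))
print(f"50 A cell:  mean spot spacing {small_cell:.3f} mm")
print(f"100 A cell: mean spot spacing {large_cell:.3f} mm")
print(f"ratio: {small_cell / large_cell:.3f}")
```

Doubling the cell edge roughly halves the spacing between spots; the ratio is close to, but not exactly, 2 because tan(2θ) is only approximately linear at small angles.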
To go from x-ray diffraction data to electron density, crystallographers use the Fourier transform. The Fourier sum description of each reflection on the film, where each term is the diffractive contribution of one atom in the unit cell, is called the structure-factor equation. So each structure factor F_hkl is the sum of many wave equations. The electron density 𝜌(x, y, z) can in turn be written as a Fourier sum of those structure factors, and is calculated with a Fourier transform of the reflections. 𝜌(x, y, z) is a periodic function, as it repeats in every unit cell. The indices h, k, and l denote the frequencies in the three dimensions, and F_hkl is the coefficient (carrying the amplitude and phase) of that term. Using the exponential form of the basic wave cos(2πx) + i sin(2πx), the Fourier sum for electron density becomes:
$$\rho(x, y, z) = \sum_h \sum_k \sum_l F_{hkl}\, e^{2\pi i (hx + ky + lz)}$$
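A one-dimensional version of this sum makes the idea concrete. The sketch below is a toy model under stated assumptions: a hypothetical unit cell with two point atoms, structure factors computed as F_h = Σ f_j e^(-2πihx_j) (a sign convention chosen so that it pairs with the positive exponent in the density sum above), no scale factor, and a sum truncated at |h| ≤ h_max.

```python
import cmath
import math

# Hypothetical 1D unit cell: (fractional position, number of electrons)
atoms = [(0.20, 6.0), (0.55, 8.0)]

def structure_factor(h, atoms):
    """F_h = sum_j f_j * exp(-2*pi*i*h*x_j), one term per atom (assumed convention)."""
    return sum(f * cmath.exp(-2j * math.pi * h * x) for x, f in atoms)

def density(x, atoms, h_max):
    """rho(x) = sum over h in [-h_max, h_max] of F_h * exp(2*pi*i*h*x)."""
    total = sum(structure_factor(h, atoms) * cmath.exp(2j * math.pi * h * x)
                for h in range(-h_max, h_max + 1))
    return total.real   # imaginary parts cancel between +h and -h

grid = [k / 200 for k in range(200)]
for h_max in (1, 3, 10):
    peak = max(grid, key=lambda x: density(x, atoms, h_max))
    print(f"h_max={h_max:2d}: strongest density peak at x={peak:.3f}")
```

With more structure factors in the sum, the strongest peak settles near the heavier atom at x = 0.55 and a second peak appears near x = 0.20. Because the sum is truncated, the peaks are close to the atom positions rather than exactly on them.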
This process works because the Fourier transform is reversible: applying the reverse (inverse) transform to the transform of a function returns the original function. In fact, the scattering of the x-ray beam by atoms in the crystal is a physical manifestation of Fourier analysis: decomposing one periodic function into many structure-factor waves. Taking a transform of those scattered reflections brings us back to electron density, a proxy for atomic structure in the unit cell.
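The round trip can be demonstrated with a small discrete Fourier transform. The sketch below is a naive O(n²) implementation written for clarity rather than speed (it is not the FFT), and the eight-sample signal is made up for illustration. It also shows a detail worth knowing: the reverse transform differs from the forward one only in the sign of the exponent and a scale factor, and applying the forward transform twice yields a mirrored copy of the signal rather than the signal itself.

```python
import cmath
import math

def dft(values, sign=-1):
    """Naive discrete Fourier transform; sign=-1 is forward, sign=+1 is the reverse kernel."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * j / n)
                for j, v in enumerate(values))
            for k in range(n)]

def inverse_dft(coeffs):
    """Reverse transform: opposite exponent sign, divided by the number of samples."""
    n = len(coeffs)
    return [c / n for c in dft(coeffs, sign=+1)]

signal = [0, 0, 1, 3, 1, 0, 0, 0]            # made-up "density" samples
coeffs = dft(signal)

recovered = [round(v.real, 6) for v in inverse_dft(coeffs)]
forward_twice = [round(v.real / len(signal), 6) for v in dft(coeffs)]

print("original:      ", signal)
print("forward+reverse:", recovered)
print("forward twice:  ", forward_twice)
```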
A program was used to simulate x-ray diffraction patterns from simple, periodic molecular models and to generate contour plots of electron density, using forward and reverse Fourier transforms. In the first experiment, Fourier transforms were run on individual reflections from a simulated diffraction pattern; these FTs were then added together (Fig. 5a-d). For the second experiment, the Fourier transform of a simple red duck image was computed (Fig. 6a-b). Then either the outer or inner data was removed (Fig. 7a and 8a respectively), and a Fourier transform was taken in the reverse direction, generating two altered versions of the original duck (Fig. 7b and 8b).
Figure 5a: Full diffraction pattern of a periodic molecular model.
Figure 5b: Contour plot of the FT of a single reflection (dot) in 5a.
Figure 5c: Contour plot of the sum of the FTs of two reflections in 5a.
Figure 5d: Contour plot of the sum of the FTs of multiple reflections.
Figure 6: The image of a red duck (a), and a contour map of its Fourier transform (b).
Figure 7: The Fourier transform of the duck image, with the outer data removed (a), and the Fourier transform of 7a, which results in a low-resolution version of the original duck image (b).
Figure 8: The Fourier transform of the duck image, with the inner data removed (a), and the Fourier transform of 8a, which results in a version of the original duck image containing only high-resolution information (b).
The results of the first experiment illustrate that the electron density is a Fourier sum of the individual structure factors in the diffraction pattern. Each point in Fig. 5a is caused by one reflection hitting the detector, where the saturation represents the amplitude and the hue represents the phase of the light wave. These points, called structure factors, are each the sum of constructively interfering reflections from one set of parallel planes in the crystal. Thus, the contour plot of the FT of just one structure factor (Fig. 5b) shows a sinusoidal electron density function, with red representing maxima on each plane and blue representing the minima between planes. The same is true for the Fourier transform of any single point in Fig. 5a. However, when the FTs of two reflections (dots in Fig. 5a) are added together, the two waves interfere. In the contour plot in Fig. 5c, the red shows where maxima overlap and the blue shows where minima overlap, so the squares of positive electron density indicate the approximate location of the molecule in the unit cell. As the FTs of more and more structure factors are added together, each contributes the average electron density from a certain set of parallel planes, and interference between all the reflections locates the molecules more accurately. The contour map in Fig. 5d shows a much sharper electron density for the atoms in the molecules than Fig. 5b and 5c do. The full data sets for proteins and other biological molecules include thousands of structure factors. This experiment shows how Fourier transforms are used to decode individual points on a diffraction image and then sum them together as Fourier terms in the overall electron density.
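The behaviour seen in Fig. 5b and 5c can be mimicked with two cosine waves. The sketch below is a simplified two-dimensional illustration, not the program used in the experiment: each reflection (h, k) is assumed to contribute a density wave cos(2π(hx + ky) + phase) with unit amplitude and zero phase, and the reflections (1, 0) and (0, 1) are arbitrary choices.

```python
import math

def reflection_density(h, k, x, y, amplitude=1.0, phase=0.0):
    """Density wave contributed by one reflection (h, k) at fractional coordinates (x, y)."""
    return amplitude * math.cos(2 * math.pi * (h * x + k * y) + phase)

def summed_density(reflections, x, y):
    """Add the waves from several reflections, as in Fig. 5c and 5d."""
    return sum(reflection_density(h, k, x, y) for h, k in reflections)

# One reflection: the density varies across the planes but not along them.
print(reflection_density(1, 0, 0.0, 0.0), reflection_density(1, 0, 0.0, 0.37))

# Two reflections: maxima where both waves peak, minima where both dip.
two = [(1, 0), (0, 1)]
for point in [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5)]:
    print(point, round(summed_density(two, *point), 6))
```

A single reflection gives stripes (constant along y for the (1, 0) wave), while the sum of two gives a grid of peaks at the lattice points and troughs halfway between them.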
The results of the second experiment demonstrate what information is carried in different parts of the diffraction pattern, simulated by the Fourier transform of an image of a duck. Fig. 6a and 6b show the original duck and the contour plot of its Fourier transform. Because the Fourier transform is reversible, performing the reverse FT on the data in Fig. 6b would reproduce the original image, Fig. 6a. Removing data far from the center of the contour plot (Fig. 7a) and applying an FT results in a low-resolution duck (Fig. 7b). On the other hand, removing data near the center of the plot (Fig. 8a) and applying the FT produces a duck with only high-resolution information, with positive electron density at the edges and negative density in the middle of the duck (Fig. 8b). These results can be explained through ‘reciprocal space,’ a term used to describe the diffraction geometry frame of reference, as opposed to the ‘real space’ of the crystal. The distances between reflections in the diffraction pattern are directly proportional to spacings in the reciprocal lattice, meaning they are inversely proportional to the unit-cell dimensions in the physical crystal. (After all, reciprocal space is the reciprocal of real space.) This means that as the distance between a reflection and the center of the diffraction image increases, the spacing d_hkl between the corresponding planes in real space decreases, and vice versa. Thus, reflections near the center come from widely spaced planes, which carry information about large features of the molecule. Reflections farther from the center come from closely spaced planes, carrying information about fine details of the structure. This makes sense in the context of Fourier transforms: in the Fourier sum that describes the electron density, reflections near the center contribute the low-frequency terms. Reflections far from the center correspond to the Fourier series’ high-frequency terms, which have more red-blue oscillations and add more detail to the sum.
So, back to the ducks. In Fig. 7a, data far from the center was removed, so high-frequency reflections from closely spaced crystal planes are missing: thus, Fig. 7b is low-resolution, with the general shape of the duck but no fine details. Fig. 8a is missing data close to the center, removing low-frequency reflections that carry large features. So in Fig. 8b, the details around the edges of the duck are present but there is negative electron density at the center, which would be filled in by low-frequency terms. This experiment relates characteristics of the diffraction pattern to the molecule’s electron density through Fourier transforms between real and reciprocal space.
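The duck experiment has a simple one-dimensional analogue. The sketch below is an illustration under stated assumptions, not the program used for Fig. 6 to 8: the "duck" is a flat-topped profile of 64 samples, the transform is a naive discrete Fourier transform, and the cutoff of 3 between "inner" and "outer" data is an arbitrary choice (the exact values, including the sign at the centre, depend on it).

```python
import cmath
import math

def dft(values, sign=-1):
    """Naive discrete Fourier transform; sign=+1 gives the reverse kernel."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * j / n)
                for j, v in enumerate(values))
            for k in range(n)]

def filtered(profile, keep):
    """Transform, zero every coefficient whose frequency fails keep(), transform back."""
    n = len(profile)
    coeffs = dft(profile)
    for k in range(n):
        freq = k if k <= n // 2 else k - n    # signed frequency; 0 is the "centre"
        if not keep(abs(freq)):
            coeffs[k] = 0
    return [c.real / n for c in dft(coeffs, sign=+1)]

n, cutoff = 64, 3
profile = [1.0 if 24 <= j < 40 else 0.0 for j in range(n)]   # 1D stand-in for the duck

low_only = filtered(profile, lambda f: f <= cutoff)    # outer data removed (cf. Fig. 7)
high_only = filtered(profile, lambda f: f > cutoff)    # inner data removed (cf. Fig. 8)

for j in (16, 24, 32, 39, 48):
    print(f"j={j:2d}  original={profile[j]:.1f}  "
          f"low-only={low_only[j]:+.3f}  high-only={high_only[j]:+.3f}")
```

With this cutoff, the low-frequency reconstruction keeps the overall bump but smears its edges, while the high-frequency reconstruction is positive just inside the edges and negative in the middle, mirroring the two altered ducks.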