Speakers and Abstracts

Below is a list of the confirmed speakers and the abstracts for their talks.

Duane Loh (Ne Te)

National University of Singapore

Motifs and coarse-graining: how an obsession with resolution can make us lose sight of the big picture in complex systems

Many complex systems can be abstracted as collections of interacting motifs. At the nanometer scale, these motifs are made of persistent, local arrangements of atoms or molecules. The types of motifs, how they organize, and how they respond to external stimuli are intimately related to the properties of the materials they compose. Oversimplifying things, whether these motif-motif organizations in a material are orderly or disordered is determined by whether these motifs tend to “collaborate” or not.

Motifs coarse-grain away variations that are less important to their interactions. This coarse-graining has a seemingly conflicted relationship with imaging resolution: only with sufficient resolution can we decide whether certain structural variations can be coarse-grained away, yet once coarse-grained, these often hard-earned high-resolution details are ignored.

In this talk, I will attempt to persuade you that coarse-graining can be essential to understanding emergent phenomena. I will then show a diverse range of examples where coarse-graining helps us understand complex systems. First, I will describe how unsupervised machine learning can help us learn atomic motifs and motif-motif hierarchies (https://doi.org/10.1126/sciadv.abk1005) seen through STEM (scanning transmission electron microscopy), which led us to discover high-entropy grain boundaries in a piezoelectric (arXiv:2305.18325). Second, I will describe how to efficiently estimate the high-dimensional posterior distribution of the structural variations between millions of Au nanocrystals imaged using single-particle imaging at XFELs (X-ray free-electron lasers). Third, I will explain how coarse-graining away the local amorphousness of an extended structure can help us “pop out” its three-dimensional structure directly from a single two-dimensional TEM (transmission electron microscopy) image (arXiv:2209.07930).
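As a rough, hypothetical illustration of the motif-learning idea (not the authors' pipeline), the sketch below clusters local image patches around assumed atomic positions into motif classes; the `image` and `atom_positions` arrays, the patch size, and the choice of PCA plus k-means are all placeholders.

```python
# Hypothetical sketch: group local atomic environments ("motifs") by
# patch extraction + PCA + k-means. Arrays and parameters are stand-ins.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-ins for a STEM image and detected atomic-column positions.
image = rng.normal(size=(256, 256))
atom_positions = rng.integers(16, 240, size=(500, 2))

def extract_patches(img, positions, half_width=8):
    """Cut a small window around each atomic position and flatten it."""
    return np.array([img[r - half_width:r + half_width,
                         c - half_width:c + half_width].ravel()
                     for r, c in positions])

patches = extract_patches(image, atom_positions)

# Coarse-grain each local environment to a few principal components,
# then group similar environments into motif classes.
features = PCA(n_components=10).fit_transform(patches)
motif_labels = KMeans(n_clusters=4, n_init=10).fit_predict(features)

print(np.bincount(motif_labels))  # number of atoms assigned to each motif class
```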

Alexandra Tolstikova

DESY, Germany

Real-time data processing for serial crystallography

Serial crystallography experiments involve collecting large numbers of diffraction patterns from individual crystals, resulting in terabytes or even petabytes of data. However, storing all this data has already become unsustainable, and as facilities move to new detectors and faster acquisition rates, data rates continue to increase. One potential solution is to process data on-the-fly without writing it to disk. Recently, we have implemented a system for real-time data processing during serial crystallography experiments at the P11 beamline at PETRA III. Our pipeline, which uses CrystFEL software and the ASAP::O data framework, can process frames from a 16-megapixel Dectris EIGER2 X detector at its maximum full-frame readout speed of 133 frames per second. With a careful choice of parameters, only 32 CPU cores are required to keep up with the data, even when the hit rate reaches as high as 40%. This presentation will provide a detailed description of our pipeline and discuss the impact of real-time processing on the way we perform serial crystallography experiments.
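As a back-of-the-envelope check of the numbers quoted above (assuming frames are distributed evenly across identical CPU cores), the per-core time budget and hit load can be estimated as follows:

```python
# Back-of-the-envelope estimate using the figures quoted in the abstract.
frame_rate_hz = 133   # EIGER2 X 16M full-frame readout rate
n_cores = 32          # CPU cores reported as sufficient
hit_rate = 0.40       # highest hit rate mentioned

per_core_budget_s = n_cores / frame_rate_hz   # wall time each core may spend per frame
hits_per_second = frame_rate_hz * hit_rate    # frames that go on to indexing

print(f"per-core time budget: {per_core_budget_s * 1e3:.0f} ms per frame")  # ~241 ms
print(f"hits to process per second: {hits_per_second:.0f}")                 # ~53
```

In other words, each core has roughly a quarter of a second of processing budget per frame, which is the bound that peak finding and hit rejection must stay under for the pipeline to keep up.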

Valerio Mariani

SLAC National Accelerator Laboratory, USA

OM / Cheetah: blurring the line between online and offline processing for x-ray imaging experiments

The rise of remote data analysis and the upcoming commissioning of high-repetition-rate facilities are bringing changes to how Serial Crystallography data is processed at facilities like LCLS. On one hand, real-time data analysis is gaining importance because of its ability to provide very fast feedback that can be used to steer experiments on an immediate time scale. On the other hand, file-based data processing, which used to take place long after data was collected, has been getting faster and faster and is now expected to be concluded within minutes of the data appearing on disk, rather than hours. This is exemplified by the recent merge of the code bases of the OM and Cheetah data processing packages. This presentation will use the latest developments in the OM / Cheetah ecosystem as a starting point to discuss some of these changes, and will hopefully lead to an exchange of ideas with the data processing community on how to tackle these new challenges, with a special mention of the possible role of Machine Learning.

Rainier Mobley

Arizona State University, USA

Monte Carlo Expand-Maximize-Compress algorithms


Merging x-ray diffraction data into a 3D diffraction volume is a crucial step for structure determination, but this process can become significantly more complicated in XFEL experiments due to unmeasured parameters varying between shots, such as target orientation, beam fluence, etc. Here we present a statistical optimization framework that can construct the diffraction volume even with many hidden shot-wise variables in the data, provided a sufficiently accurate forward model.
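As a hedged, minimal toy of the Expand-Maximize-Compress idea (not the framework presented in this talk), the sketch below reconstructs a 1D Poisson-sampled profile when the only hidden per-shot variable is an unknown circular shift; real EMC replaces the shift grid with orientations and other latent variables, and the roll operations with a tomographic expand/compress step.

```python
# Toy Expand-Maximize-Compress: the hidden per-frame variable is a circular
# shift of a 1D profile; frames are Poisson counts of the shifted profile.
import numpy as np

rng = np.random.default_rng(1)
n = 64
truth = 50.0 * np.exp(-0.5 * ((np.arange(n) - n / 2) / 4.0) ** 2) + 1.0
true_shifts = rng.integers(0, n, size=2000)
frames = np.array([rng.poisson(np.roll(truth, s)) for s in true_shifts])

shifts = np.arange(n)                                 # quantized hidden-variable grid
model = frames.mean() * (1.0 + 0.1 * rng.random(n))   # random start breaks symmetry

for _ in range(30):
    # Expand: predicted intensities for every candidate shift of the model.
    W = np.array([np.roll(model, s) for s in shifts])            # (n_shifts, n)
    # Maximize: Poisson log-likelihood of each frame under each shift,
    # converted to posterior weights P(shift | frame).
    logL = frames @ np.log(W + 1e-12).T - W.sum(axis=1)          # (n_frames, n_shifts)
    P = np.exp(logL - logL.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    # Update each expanded copy with probability-weighted frame averages,
    # then Compress: undo the shifts and average back into one model.
    W_new = (P.T @ frames) / P.sum(axis=0)[:, None]
    model = np.mean([np.roll(W_new[i], -s) for i, s in enumerate(shifts)], axis=0)

# The result is defined only up to a global circular shift, so align before comparing.
best = max(np.corrcoef(np.roll(model, k), truth)[0, 1] for k in range(n))
print(f"best aligned correlation with ground truth: {best:.3f}")  # typically close to 1
```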

Andrew Martin

RMIT University, Australia

The pair-angle distribution function 

Fluctuation scattering techniques study the structure of materials by statistically analysing large ensembles of diffraction patterns that vary due to orientational and/or structural disorder in the sample. The pair-angle distribution function (PADF) is a three- and four-atom correlation function that can be extracted from fluctuation scattering data. The PADF is a natural generalisation of the pair-distribution function that is measured with established techniques such as small-angle x-ray scattering or powder diffraction. In this talk, we introduce the PADF and describe a Python-based software package to compute PADFs from experimental data. We discuss the prospects for structural analysis of disordered and amorphous materials with focused x-ray and electron beams.
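For concreteness, here is a minimal sketch (not the PADF package itself) of the ensemble-averaged angular intensity correlation that fluctuation scattering analyses, including the PADF, start from; it assumes the patterns have already been resampled onto a polar (q, θ) grid.

```python
# Sketch: ensemble-averaged angular intensity correlations from patterns that
# are assumed to be resampled onto a polar (q, theta) grid beforehand.
import numpy as np

def angular_correlation(polar_frames):
    """polar_frames: (n_frames, n_q, n_theta) intensities on a polar grid.
    Returns C[q, q', dtheta], averaged over frames and angles."""
    n_frames, n_q, n_theta = polar_frames.shape
    # Remove the angular mean so only intensity fluctuations contribute.
    fluct = polar_frames - polar_frames.mean(axis=2, keepdims=True)
    # Correlation over theta becomes a product in the angular Fourier domain.
    F = np.fft.fft(fluct, axis=2)
    C = np.einsum('fqt,fpt->qpt', F, np.conj(F))     # sum over frames
    return np.fft.ifft(C, axis=2).real / (n_frames * n_theta)

# Toy usage with random numbers standing in for polar-resampled patterns.
frames = np.random.default_rng(2).random((100, 20, 180))
corr = angular_correlation(frames)
print(corr.shape)   # (20, 20, 180): q, q', delta-theta
```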

Richard Kingston and Michael Barnett 

University of Auckland, New Zealand

High-solvent phasing

https://journals.iucr.org/m/issues/2022/05/00/jt5063/

A procedure is described for direct phase determination in protein crystallography, applicable to crystals with high solvent content. The procedure requires only the diffraction data and an estimate of the solvent content as input. Direct phase determination is treated as a constraint satisfaction problem, in which an image is sought that is consistent with both the diffraction data and generic constraints on the density distribution in the crystal. The problem is solved using an iterative projection algorithm, the Difference Map algorithm, which has good global convergence properties and can locate the correct solution without any initial phase information. Computational efficiency is improved by breaking the problem down into two stages: initial approximation of the molecular envelope at low resolution, followed by subsequent phase determination using all of the data. The molecular envelope is continually updated during the phase determination step. At both stages, the algorithm is initiated with many different, random phase sets, which are evolved subject to the constraints. A clustering procedure is used to identify consistent results across multiple runs, which are then averaged to generate consensus envelopes or phase sets. The emergence of highly consistent phase sets is diagnostic of success. The effectiveness of the procedure is demonstrated by application to 42 known structures of solvent fraction 0.60–0.85. The procedure works robustly at intermediate resolutions (1.9–3.5 Å) but is strongly dependent on crystal solvent content, only working routinely with solvent fractions greater than 0.70.
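A schematic toy of the β = 1 difference-map update referred to above, alternating a Fourier-amplitude ("data") projection with a support-plus-positivity projection standing in for the solvent constraint; the published procedure involves much more (envelope determination, resolution staging, clustering of runs), so this is only an illustration of the core iteration under assumed toy data.

```python
# Schematic difference map (beta = 1) on a 1D toy: Fourier-amplitude projection
# vs. a support + positivity projection standing in for the solvent constraint.
import numpy as np

rng = np.random.default_rng(3)
n = 64
support = np.zeros(n, dtype=bool)
support[20:40] = True                      # "molecular envelope"; the rest is "solvent"
truth = np.zeros(n)
truth[support] = rng.random(support.sum())
amplitudes = np.abs(np.fft.fft(truth))     # the "diffraction data"

def P_data(x):
    """Keep the current phases, impose the measured Fourier amplitudes."""
    X = np.fft.fft(x)
    return np.fft.ifft(amplitudes * np.exp(1j * np.angle(X))).real

def P_density(x):
    """Zero the density outside the envelope and clip negative density."""
    return np.clip(np.where(support, x, 0.0), 0.0, None)

x = rng.random(n)                          # random starting image (random phases)
for _ in range(500):
    pd = P_density(x)
    x = x + P_data(2.0 * pd - x) - pd      # difference-map update, beta = 1

candidate = P_density(x)
residual = (np.linalg.norm(np.abs(np.fft.fft(candidate)) - amplitudes)
            / np.linalg.norm(amplitudes))
print(f"relative amplitude residual: {residual:.3f}")  # small when both constraints are met
```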


Jeff Donatelli

Lawrence Berkeley National Laboratory, USA

Fast Direct Nonuniform Fourier Inversion and Its Applications to Generalized Phase Problems on Non-Cartesian Grids

A common need in many reconstruction algorithms is the ability to accurately and efficiently invert nonuniform Fourier information. While standard interpolation techniques can be used to quickly map data to uniform grids for subsequent analysis, they can lead to large errors, which are further exacerbated by the presence of noise in the data. Another approach involves coupling iterative linear solvers with nonuniform fast Fourier transforms to solve the associated linear system, but this typically requires a large number of iterations to converge, which can be prohibitively expensive if nonuniform inversion needs to be applied several times in a larger iterative algorithm, such as in iterative phasing. 
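To make the trade-off concrete, here is a tiny dense toy (plain NumPy, no NUFFT library): nonuniform Fourier samples of a 1D signal are inverted once with the naive adjoint and once by solving the same linear system in a least-squares sense, which is what the iterative solvers mentioned above converge to; the direct single-NUFFT inversion introduced below is not implemented here.

```python
# Dense toy comparison: naive adjoint vs. least-squares inversion of
# nonuniform Fourier samples (NUFFT libraries avoid forming A explicitly).
import numpy as np

rng = np.random.default_rng(4)
n = 64
x_true = rng.random(n)                                   # signal on a uniform grid
freqs = rng.uniform(-0.5, 0.5, size=200)                 # nonuniform sample locations

A = np.exp(-2j * np.pi * np.outer(freqs, np.arange(n)))  # nonuniform DFT matrix
data = A @ x_true                                        # noiseless "measurements"

# Naive adjoint reconstruction (roughly: gridding without density compensation).
x_adj = (A.conj().T @ data).real / len(freqs)

# Least-squares solution of the same system -- what iterative solvers converge to.
x_ls = np.linalg.lstsq(A, data, rcond=None)[0].real

err = lambda x: np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(f"adjoint relative error:       {err(x_adj):.3f}")   # large
print(f"least-squares relative error: {err(x_ls):.2e}")    # near machine precision
```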

Here we introduce a new approach that is able to directly invert nonuniform Fourier data with a single application of a nonuniform fast Fourier transform, providing a massive acceleration over iterative inversion methods while maintaining arbitrarily high accuracy, thus avoiding interpolation errors. Furthermore, we will show how this new nonuniform Fourier inversion approach can be used to generalize iterative phasing algorithms to accurately and efficiently invert nonuniformly sampled scattering data.

Ian Robinson

Brookhaven National Laboratory, USA and University College London, UK

Machine Learning for the Bragg Coherent Diffraction Imaging Phase Problem

Ian Robinson (1,2), Shinjae Yoo (1) and Longlong Wu (1) 

(1) Brookhaven National Laboratory
(2) University College London 

David Sayre proposed a solution to the crystallographic “phase problem” immediately after the announcement of Shannon’s Information Theorem: if diffraction can be sampled more than twice as finely as the Bragg peak spacing, the problem is overdetermined and can be solved [1]. Sayre did not explicitly mention the need for X-ray coherence, which has since been happily met by the development of the latest synchrotron sources. X-ray coherence produces speckle in the diffraction patterns, which can be oversampled to overdetermine the phase problem. Nor did Sayre propose a closed-form solution of the phase problem; however, many methods have been proposed in the 69 years since to invert the diffraction to real-space images, all of them iterative algorithms that converge on the solution. Despite “proofs” to the contrary [2], when applied to real data with noise these methods are found to be prone to local minima giving multiple solutions. In this presentation we make the case that the speckle-inversion “phase problem” will be amenable to Machine Learning approaches. Our first demonstrations have been published for 2D [3] and 3D [4] data.

[1] Some implications of a theorem due to Shannon, D. Sayre, Acta Cryst. 5, 843 (1952). 

[2] Uniqueness of solutions to two-dimensional Fourier phase problems for localized and positive images. R. H. T. Bates, Comput. Vis. Graph. Image Process. 25, 205-217 (1984). 

[3] Complex Imaging of Phase Domains by Deep Neural Network, Longlong Wu, Pavol Juhas, Shinjae Yoo and Ian Robinson, IUCrJ 8, 12-21 (2021).

[4] 3D Coherent X-ray Imaging via Deep Convolutional Neural Networks, Longlong Wu, Shinjae Yoo, Ana F. Suzana, Tadesse A. Assefa, Jiecheng Diao, Ross J. Harder, Wonsuk Cha and Ian K. Robinson, npj Computational Materials 7, 175 (2021).


Patrick Adams

RMIT, Australia

Iterative algorithm for recovering crystal structure factors from scattering correlation data

Crystallography is the quintessential method for determining the atomic structure of molecules within crystals from structure factors. However, it is limited by various sample requirements, such as the need for single-crystal samples of sufficient size. Fluctuation X-ray Scattering (FXS) is a diffraction analysis method that measures the correlations of scattered intensities from ensembles of randomly orientated but identical particles. We have developed an iterative algorithm that recovers crystal structure factors from FXS correlations, such that traditional structure determination techniques can be applied to structure factors obtained from the scattering of more than one crystal at once. In this presentation, we outline the development of this algorithm and demonstrate its capabilities by refining the structures of three small chemical crystals from simulated FXS data. This method could facilitate the recovery of structure factors from crystals in a serial collection scheme, congruent with modern Serial Femtosecond Crystallography techniques at X-ray Free-Electron Lasers, and relax sample requirements for crystallography experiments at synchrotron sources. In the future, this algorithm could also recover structure factors from ensembles of macromolecular crystals, such as membrane proteins, which do not readily crystallise into large crystals.


Andrew Morgan

University of Melbourne, Australia

A continuity constraint for multistate phase retrieval

Sabine Botha and Gihan Ketawala

Arizona State University, USA

A novel GUI for serial data classification using Machine Learning approaches 

The solution of biological structures is being revolutionized by serial X-ray diffraction employing X-ray free-electron laser (XFEL) sources, particularly with the advent of serial femtosecond crystallography (SFX). The measurement of micrometer-sized crystals at room temperature allows time-resolved studies that trace the path of biochemical reactions at unprecedented temporal resolution, made possible by X-ray pulses that outrun the effects of radiation damage. These pulses can deliver X-ray doses more than a thousand times higher than those possible with conventional X-ray sources [1]. The large amounts of data produced by these studies (up to multiple TBs for a single experiment) must be quickly processed and analysed. While software for online data monitoring and data reduction has been developed over the past decade (e.g., OM [2], Cheetah [3], CrystFEL [4]), it is targeted towards finding crystal hits rather than classifying data by spurious, often unquantifiable artifacts. State-of-the-art X-ray detectors are undergoing continuous development, and experimental parameters can push them beyond their reliable operating regime for individual frames within a single run of data collection. Including intensities from these frames in the merged structure factors can lead to inaccuracies in the final reported intensities, which is particularly detrimental for anomalous phasing or time-resolved difference density calculations, where highly accurate recordings are required.

Here, we report a new data sorting tool that offers a variety of Machine Learning algorithms to sort the data, trained either on manual sorting by the user or on profile fitting of the expected intensity distribution on the detector for the given experiment.

This is integrated into an easy-to-use graphical user interface (GUI), specifically designed to support the detectors, file formats, and software available at the Linac Coherent Light Source (US) and the European XFEL (Germany).
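A minimal, hypothetical sketch of the underlying classification idea (not the GUI's code): per-frame features, here a simple radial intensity profile, are combined with user-supplied keep/reject labels to train a standard classifier that then sorts the remaining frames; the arrays and the choice of random forest are placeholders.

```python
# Hypothetical sketch: per-frame radial profiles + user labels -> classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def radial_profile(frame, n_bins=50):
    """Azimuthally averaged intensity vs. radius: one simple feature vector per frame."""
    ny, nx = frame.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - ny / 2, x - nx / 2)
    idx = np.digitize(r.ravel(), np.linspace(0, r.max(), n_bins + 1)) - 1
    sums = np.bincount(idx, weights=frame.ravel(), minlength=n_bins)[:n_bins]
    counts = np.maximum(np.bincount(idx, minlength=n_bins)[:n_bins], 1)
    return sums / counts

# Stand-ins for a stack of detector frames and manual keep/reject labels
# assigned by the user on a training subset.
rng = np.random.default_rng(5)
frames = rng.random((200, 128, 128))
labels = rng.integers(0, 2, size=200)      # 1 = keep, 0 = reject

features = np.array([radial_profile(f) for f in frames])
clf = RandomForestClassifier(n_estimators=100).fit(features[:150], labels[:150])
keep = clf.predict(features[150:])         # automatic sorting of the remaining frames
print(keep.mean())
```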

[1] Chapman, Henry N, et al. Nature 470, no. 7332 (2011): 73-77.

[2] Barty, Anton, et al. Journal of applied crystallography 47, no. 3 (2014): 1118-1131.

[3] Mariani, Valerio, et al. Journal of applied crystallography 49, no. 3 (2016): 1073-1080.

[4] White, Thomas A., et al.  Journal of applied crystallography 45, no. 2 (2012): 335-341.


Tek Mala

University of Wisconsin-Milwaukee, USA

Analysis of mix-and-inject serial crystallography time-resolved data with singular value decomposition

Singular value decomposition (SVD) analysis is an important tool in the analysis of experimental time-resolved crystallographic data [1]. SVD can be used to reduce noise in the data, to determine the time-independent structures of the intermediates, and to extract a compatible chemical kinetic mechanism from the X-ray data. The earlier programs for performing SVD, SVD4TX [1] and a newer version [2], could not be applied to X-ray data when large unit cell changes occur during the reaction, since isomorphous difference maps cannot be calculated. In addition, these implementations relied on a fixed region of interest, which is not available when the unit cell changes.

The time-resolved experimental data were obtained from mix-and-inject serial crystallography, in which the reaction of Mycobacterium tuberculosis beta-lactamase with the inhibitor sulbactam was investigated. Seven different time points ranging from 3 ms to 700 ms were collected. The binding of the bulky ligand to the enzyme induced large structural changes, resulting in an increase of the unit cell parameters by up to 3 Å.

I describe a method, based on a combination of custom bash scripts and Python programs, that can accommodate the changing unit cell and perform SVD. In addition, diffusion times and rate coefficients can be estimated by applying reasonable, chemically meaningful constraints and tying the analysis to the observed occupancy variations of the enzyme-inhibitor complex. The time-dependent concentrations of the reacting species inside the enzyme crystals along the reaction coordinate can be characterized by combining the results of the SVD analysis, the ligand occupancy values and the diffusion times.
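A minimal sketch of the core linear-algebra step (not the full pipeline, and with randomly generated stand-in data): difference maps at each time point form the columns of a matrix whose SVD separates time-independent spatial components from their time courses, which can then be compared with a kinetic model.

```python
# Sketch of the SVD step on stand-in data: columns are flattened difference
# maps at successive time points, assumed already placed on a common grid.
import numpy as np

rng = np.random.default_rng(6)
n_voxels, n_times = 5000, 7                      # e.g. seven time points, 3 ms to 700 ms
maps = rng.normal(size=(n_voxels, n_times))      # one flattened difference map per column

U, S, Vt = np.linalg.svd(maps, full_matrices=False)
# U[:, :k]  : time-independent spatial components (candidate intermediate signals)
# Vt[:k, :] : their time courses, to be compared with a kinetic model
# S         : singular values; a sharp drop marks the signal/noise cutoff

k = 2
denoised = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]  # rank-k, noise-reduced data matrix
print(S[:4], Vt[:k, :].shape)
```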

[1] Schmidt M, Rajagopal S, Ren Z, Moffat K. Application of singular value decomposition to the analysis of time-resolved macromolecular x-ray data. Biophys J. 2003;84(3):2112-2129.
[2] Zhao, Y., & Schmidt, M. (2009). New software for the singular value decomposition of time-resolved crystallographic data. Journal of Applied Crystallography, 42, 734-740.

The manuscript describing this work in detail has been accepted for publication in Nature Communications. A preprint is available at bioRxiv [doi.org/10.1101/2022.12.06.519319]. The programs are available on Zenodo [doi.org/10.5281/zenodo.8206588].