Research

 

Sections

A brief summary

Current research

Collaborations

The NOE | Chemical shifts in proteins |  Study of proteins using high pressure NMR

Carbohydrate binding modules (CBMs) | Salivary proline-rich proteins and plant polyphenols |  Silk

My research interests are mainly in NMR and proteins, and thus generally in structural biology.

A brief summary

During my PhD I used NMR to look at the structure and interactions of antibiotics mainly related to vancomycin, still a vital drug in the constant battle against bacterial drug resistance. This led to an interest in the NOE, which I used to determine the definitive structure of vancomycin.

Around this time, Wüthrich was developing 2D NMR as a way of studying proteins, so after my PhD I got a research fellowship to work in his lab, where I was lucky enough to work on the first NMR structure of a globular protein (see his 2003 Nobel prize lecture).

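Since then, I have worked both on NMR methodology and on determination of protein structures by NMR. In methodology, I have worked in three main areas, described in more detail below: the NOE, chemical shifts in proteins, and the study of proteins using high pressure NMR.

I have also focused on some specific systems, described below: carbohydrate binding modules (CBMs), salivary proline-rich proteins and plant polyphenols, and silk.

... and many other topics, described below.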

Current research

We are developing a computational method for measuring the accuracy of protein structures in solution (with my postdoc Nick Fowler, and our collaborator Adnan Sljoka in Tokyo), named ANSURR. This is the first reliable method for measuring the accuracy of protein structures in solution (eg NMR structures), and was published at the end of 2020. It avoids making comparisons to NOE restraints: it uses a modified version of the Random Coil Index to calculate the rigidity per residue from backbone chemical shifts, and compares this to the rigidity determined from the structure. It turns out that on average NMR structures are less accurate and floppier than crystal structures, although the best NMR structures are better than the crystal structures, mainly because crystal structures are (unsurprisingly) too rigid as depictions of solution structures.

This work opens up enormous possibilities, which we are now busy exploring. We have for example looked at the accuracy of all protein NMR ensembles in the Protein Data Bank, which is now also published. We show there that NMR structure quality improved steadily until about 2005, after which it has hardly improved; that precision and accuracy are correlated in PDB structures; and that the number of NOE restraints per residue is a much better guide to quality than NOE restraint violations (though of course ANSURR is better). The program itself is already available on GitHub, and there is a web site, which you can use to download the program, or look at the quality of your favourite NMR structure from the PDB. We are happy to discuss and advise!

In 2022, we looked at AlphaFold structures, and showed that on average they are better than NMR structures. But there is a small number of proteins for which AlphaFold is wrong - mainly because it misinterprets dynamic regions as being structured. Our most recent use of ANSURR is to follow the improvement in structure quality resulting from adding hydrogen bond restraints in a systematic way.

We are investigating the molecular basis for the Hofmeister effect, following on from earlier work. The Hofmeister series is an empirical list of anions that range from charge-dense ions like sulfate, which stabilises proteins but makes them less soluble (salting out), to more diffuse ions like thiocyanate, which destabilises proteins but makes them more soluble (salting in). There has been much debate about how this works. It is clearly an important question, because an answer would hopefully lead to a more rational choice of cosolutes (excipients) for stabilising proteins. We have studied Hofmeister ions as well as osmolytes, and our conclusions are that the unifying mechanism is the effect of solutes on solvent fluctuations, though in addition, binding to the protein affects both charge and protein dynamics.

We have ongoing collaborations (see below) on a range of exciting topics.

We are also studying the signalling adaptor protein SH2B1.

We recently published a paper showing that you should fit affinities from HSQC titrations using all the data simultaneously; and interestingly, that the variation in binding affinities that you see from different residues is real - there is not a single affinity but many. We also published another paper exploring how proteins bind to their ligands, emphasising intrinsically disordered regions and cooperative binding.
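To make "using all the data simultaneously" concrete, here is a minimal sketch of a simultaneous fit of a 1:1 binding model to shift changes from several residues. It is an illustration under stated assumptions, not the procedure from the paper: the function names, the grid search over Kd, and the synthetic numbers are all hypothetical. One Kd is shared across residues, and each residue has its own maximum shift change, which is solved by linear least squares at each trial Kd.

```python
import math

def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein bound for 1:1 binding (all concentrations in the same units)."""
    b = p_tot + l_tot + kd
    return (b - math.sqrt(b * b - 4.0 * p_tot * l_tot)) / (2.0 * p_tot)

def global_fit(p_tot, l_tots, shifts_by_residue, kd_grid):
    """Fit one shared Kd to all residues at once (hypothetical helper).

    shifts_by_residue maps a residue label to its observed shift changes,
    one value per ligand concentration in l_tots.
    Returns (best_kd, {residue: fitted maximum shift change}).
    """
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_tot, l, kd) for l in l_tots]
        sff = sum(x * x for x in f)
        if sff == 0.0:
            raise ValueError("need at least one non-zero ligand concentration")
        sse = 0.0
        dmax = {}
        for res, obs in shifts_by_residue.items():
            # For a fixed Kd the model is linear in the maximum shift change.
            d = sum(x * y for x, y in zip(f, obs)) / sff
            dmax[res] = d
            sse += sum((y - d * x) ** 2 for x, y in zip(f, obs))
        if best is None or sse < best[0]:
            best = (sse, kd, dmax)
    return best[1], best[2]

# Synthetic, noise-free example: 100 uM protein, true Kd = 50 uM (made-up values).
ligand = [0.0, 25.0, 50.0, 100.0, 200.0, 400.0, 800.0]
data = {
    "res12": [0.20 * fraction_bound(100.0, l, 50.0) for l in ligand],
    "res47": [-0.08 * fraction_bound(100.0, l, 50.0) for l in ligand],
}
kd_grid = [10 ** (k / 20.0) for k in range(0, 81)]  # 1 uM to 10 mM, log-spaced
kd, dmax = global_fit(100.0, ligand, data, kd_grid)
print(kd, dmax)
```

Running the same function on one residue at a time is a simple way to look at the residue-to-residue variation in apparent affinity mentioned above.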

I recently published a paper on a completely new topic for me, the origin of life on earth. This proposes that Darwinian natural selection cannot operate before life starts, and so the buildup of concentrations of biomolecules was due to autocatalytic selection, in which molecules catalyse reactions that lead to increases in their own concentrations. This led to new proposals for how amino acids, sugars, lipids and nucleic acids built up. 

Collaborations

Tetsuo Asakura (Tokyo University of Agriculture and Technology) who introduced me to the wonders of silk, and first got me interested in chemical shifts.

Stéphane Mesnage (University of Sheffield) who works on the bacterial cell wall (particularly in Enterococcus) and the proteins that bind to it. We have worked on LysM and a fascinating two-site recognition in lysostaphin SH3b, and are currently working on the capsule (extracellular polysaccharide) from E. faecalis, and on WxL domains. We also characterised a novel crossbridge in C. difficile.

Kathryn Ayscough (University of Sheffield) who has a common interest in proline-rich proteins, via actin. We have a joint student (Lewis Hancock) developing analytical tools for understanding the complex interactions that occur in actin polymerisation.

Sarah Staniland (University of Sheffield), with whom we are working (with a joint student, Fraser Beaumont) on how proteins catalyse formation of magnetite in magnetotactic bacteria.

Poul Erik Hansen (Roskilde University, Denmark) who we are collaborating with on detection of salt bridges in proteins.

Robert Poole (University of Sheffield). We studied the ruthenium-based (and misleadingly named!) carbon monoxide-releasing molecules (CORMs), and how they really work. The toxic agent is not the carbon monoxide but the ruthenium, which binds not only to thiols but also to DNA, leading to double strand breaks.

Previous collaborations have been with Jim Thomas (University of Sheffield) on ruthenium-based DNA ligands; Albert Ong (University of Sheffield) on proteins involved in polycystic kidney disease; Neil Hunter (University of Sheffield) on light harvesting proteins; and Kazu Akasaka and Ryo Kitahara (Ritsumeikan University) on high pressure NMR.

NMR Methodology

The NOE

The NOE is a very widely used measurement in structure calculation by NMR, because the intensity of the NOE is related to internuclear distance.
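As a minimal sketch of how intensity relates to distance, the simplest treatment assumes an isolated pair of spins, for which the NOE intensity is proportional to r^-6, and calibrates against a pair of known separation. The function name and the numbers below are hypothetical, and the isolated-pair assumption ignores spin diffusion and internal motion.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate a distance from an NOE intensity, assuming intensity ~ r**-6.

    ref_intensity and ref_distance describe a calibration pair of known
    separation; the result is in the same units as ref_distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

# Hypothetical calibration: a fixed proton pair at 1.8 angstrom with intensity 100.
for intensity in (100.0, 10.0, 1.0):
    print(intensity, round(noe_distance(intensity, 100.0, 1.8), 2))
```

Because of the sixth root, a hundredfold drop in intensity corresponds to only about a twofold increase in distance, which is why NOEs are usually converted to loose distance ranges rather than exact values.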

I used 1D and 2D NOE methods to determine the structure of vancomycin and investigate its mode of action (by binding to bacterial cell walls and preventing their cross-linking), in the process publishing what must be one of the worst 2D NOE spectra ever published.

While working in Kurt Wüthrich's lab, I then used the NOE to determine the first NMR structure of a globular protein, BUSI IIA in 1985. I have subsequently determined many other structures.

I developed NOE theory and NOE methods, including a program for simulating NOEs in large systems, and treatment of flexible molecules. I have used a range of NOE methods, including using the time-averaged NOE (2002) and ensemble averaged NOE (2010).
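For a flexible molecule, the distance reported by an NOE is an average over conformations, weighted heavily towards the short distances. A minimal sketch, assuming simple r^-6 averaging over equally weighted conformers (other averaging schemes apply in other motional regimes, and the function name is hypothetical):

```python
def effective_noe_distance(distances):
    """Effective distance <r**-6>**(-1/6) over equally weighted conformers."""
    if not distances or min(distances) <= 0:
        raise ValueError("need at least one positive distance")
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

# Two made-up conformers: the pair is close in one and far apart in the other.
conformers = [2.5, 6.0]
print(effective_noe_distance(conformers))   # NOE-derived effective distance
print(sum(conformers) / len(conformers))    # plain arithmetic mean, for comparison
```

The effective distance stays close to the shortest distance sampled, so a single rigid structure fitted to such data can be pulled towards contacts that are only populated part of the time.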

I am a co-author on what remains the only current book on the NOE: do go and buy a copy if you don't have your own personal and well-thumbed copy - The NOE in structural and conformational analysis, D Neuhaus and M P Williamson, published in 2000 and still in print.

Chemical shifts in proteins

Protein chemical shifts can be calculated in two fundamentally different ways. The first (which I have not used) is the 'proper' way, ie using a quantum mechanics method such as Density Functional Theory (DFT). This is a time-consuming calculation and quite unfeasible for whole proteins, but very possible for capped amino acids for example. A number of groups have used DFT to calculate chemical shifts ab initio, including those of David Case. It is the only useful method for calculating shifts for C' (carbonyl) and 15N, and gets carbon shifts accurate to within about 1 ppm and proton shifts within about 0.3 ppm.

The second method, which is the one I have used, uses empirical fitting of experimentally derived shifts against known protein structures, to obtain parameters for formulae derived from theoretical methods. This method gets Calpha and Cbeta shifts accurate to about 1 ppm RMSD, Halpha to about 0.25 ppm, and HN to about 0.5 ppm. It therefore performs similarly to DFT, but is a lot quicker. The main reason why I prefer this method however is that it is easy to see where different effects come from, and therefore to understand the molecular factors governing shifts. There are now several websites that use a combination of DFT and empirical calculations to calculate shifts, eg those of David Case and David Wishart. Compared with, eg, Wishart's SHIFTX2 method, our method is less accurate (though not by a large factor) but is much simpler, and makes it much easier to see where the effects come from.

We have written programs to calculate both 1H and 13C shifts in proteins. The 13C work is described in more detail in papers from 1997 and 1999. We have used 13C shifts to calculate the structure of silk [Biopolymers 1997 41, 193-203; Int J Biol Macromol. 1999 24, 167-171], and we have shown that 1H shifts can be used as a measure of structure quality (1995) as well as to characterise ligands intercalated into DNA, for example to distinguish between two possible intercalated structures. The CRAMPS method allows high-resolution proton spectra to be obtained in the solid state, making it useful to use 1H NMR shift calculations to interpret solid-state NMR spectra. 1H chemical shifts can also be used to interpret ligand docking in a more quantitative manner than is normally done.

This work led to a review on the use of chemical shift perturbation to characterise protein binding, which has gone on to be a Highly Cited Paper (over 1200 citations to date), and is now the fifth most highly cited paper from Progr NMR Spectroscopy.

We then used changes in chemical shift as a result of temperature or pressure to calculate changes in structure (melittin, BPTI, lysozyme, protein G, DNA double helix, and barnase). We have shown that we can calculate the structure of proteins at high pressure using chemical shifts as restraints, and summarised the work in a review. The results show how proteins fluctuate at ambient pressure, and how they start to denature with pressure. In particular, our work on barnase shows that volume fluctuations in the free enzyme occur on the microsecond timescale and are quenched on addition of inhibitor, thereby providing insight into how rapid thermal fluctuations get channelled into the much less frequent but functionally important 'induced fit' type motions. We subsequently used this result to describe how two different motions in a protein can be described as hierarchical, and what this actually means.

Importantly, we have also shown that 13C shift changes can be used to follow structure changes with pressure. This work also showed that bond compression causes measurable shift changes, and that the best explanation for the classic gamma-gauche effect is probably bond angle compression and consequent re-hybridization of orbitals.

Chemical shifts can also be used (together with other parameters) to analyse the conformational changes in proteins resulting from pH change. We have shown that chemical shifts can be used to characterise concerted structural changes across a beta sheet.

We have also shown that HN temperature coefficients are good measures of hydrogen bonding (in many cases, better than exchange rates), though they mainly reflect loss of structure with increasing temperature, rather than hydrogen bonding itself. They can also be used to look at the lowest energy excited state structures of proteins, such as cytochrome c, and proteins G and L.
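A temperature coefficient is simply the slope of amide proton shift against temperature, usually quoted in ppb/K. The sketch below fits that slope by least squares. The function names and data are hypothetical, and the cutoff of -4.5 ppb/K is an assumption (a commonly quoted value for likely hydrogen bonding, not stated on this page), so it is exposed as a parameter.

```python
def temperature_coefficient(temps_k, shifts_ppm):
    """Least-squares slope of chemical shift against temperature, in ppb/K."""
    n = len(temps_k)
    if n < 2 or n != len(shifts_ppm):
        raise ValueError("need matching lists with at least two points")
    mean_t = sum(temps_k) / n
    mean_s = sum(shifts_ppm) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_k)
    if sxx == 0.0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(temps_k, shifts_ppm))
    return 1000.0 * sxy / sxx  # ppm/K converted to ppb/K

def likely_hydrogen_bonded(coefficient_ppb_per_k, cutoff=-4.5):
    """True if the coefficient is more positive than the (assumed) cutoff."""
    return coefficient_ppb_per_k > cutoff

# Made-up amide proton shifts measured at three temperatures.
temps = [288.0, 298.0, 308.0]
shifts = [8.50, 8.47, 8.44]
coeff = temperature_coefficient(temps, shifts)
print(coeff, likely_hydrogen_bonded(coeff))
```

Curvature in the shift against temperature, rather than the slope alone, is the kind of deviation that points to a low-lying excited state, so in practice the residuals of this straight-line fit are worth inspecting as well.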

More recently, we have shown that the changes in chemical shift with pressure can be used to describe conformational change resulting from an increase in pressure. The additional energy input from pressure is very small, so the perturbation to the structure is small. By fitting the shifts we obtained a complete set of chemical shift assignments for the high pressure form of the R3 domain of talin, showing the conformational change of talin that normally occurs on mechanical pulling and allows it to bind to vinculin. We went on to use similar methods to look at a conformational change in the potassium channel blocker ShK, which hopefully will allow design of drugs targeted at the active state. In a 2018 review we show that the high-pressure structures of proteins are (so far) always functionally important.

Study of proteins using high pressure NMR

My interest in high pressure was initially in showing that the structural change of proteins from low to high pressure can be calculated using the changes in 1H chemical shift as structural restraints. This method has been applied to BPTI, lysozyme, protein G and barnase, as well as B-DNA, and we have reviewed this work. We also investigated whether 13C shifts could be used in the same way. The answer is yes, though not as successfully, because the changes in 13C shift due to bond compression are of similar magnitude to those due to structural change.

The work with barnase showed that pressure dependence provides important information on protein fluctuations at ambient pressure. Pressure stabilises partially unfolded states: using ubiquitin we provided the best evidence yet that these partially unfolded states are functionally important, because the high-pressure alternative state of ubiquitin ('N2') closely resembles the E2-bound form (see also two further papers).

The work described above showed that the conformation stabilised by high pressure was functionally important, being a binding-ready conformation for BPTI and protein G, as well as for lysozyme, and the closed form of barnase. This suggested that high pressure can be used to stabilise and characterise otherwise inaccessible conformations of proteins. This is clearly true for ubiquitin. We therefore developed a method for fitting chemical shift changes with pressure to obtain the chemical shifts of the "high-pressure" form, which allows us to describe the structure of the excited state. We applied it to look at the R3 domain of talin, and showed that it is thermodynamically poised to act as a mechanosensitive switch, detecting pulling on the cell [Structure, 2017, 25, 1856-1866]. We have reviewed this topic, showing that so far every protein studied under pressure is moved to a functional alternative state [Biochim. Biophys. Acta Proteins Proteomics, 2019, 1867, 350-358].
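As an illustration of what such a fit involves, a generic two-state model (a textbook form given here as an assumption, not necessarily the exact equation used in the papers) writes the observed shift of each nucleus as a population-weighted average of the ground-state and high-pressure-state shifts, with the equilibrium shifted by pressure through the volume difference between the states:

```latex
\delta_{\mathrm{obs}}(p) = \frac{\delta_{1} + \delta_{2}\,K(p)}{1 + K(p)},
\qquad
K(p) = \exp\!\left(-\frac{\Delta G^{0} + \Delta V\,(p - p_{0})}{RT}\right)
```

Here \delta_{1} and \delta_{2} are the shifts of the ground state and the high-pressure state, \Delta G^{0} is the free energy difference at the reference pressure p_{0}, and \Delta V is the volume difference between the two states. This simple form assumes fast exchange and neglects both the intrinsic pressure dependence of the shifts within each state and any difference in compressibility. Because \Delta G^{0} and \Delta V are shared by all nuclei reporting on the same transition, fitting many nuclei together is what makes the per-nucleus \delta_{2} values recoverable.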

What does a single spin look like?

I have always felt that it is important to be able to explain things in words and pictures, not just equations. NMR experiments can be explained remarkably well using vectors (with the occasional help of product operators); but one aspect that has always bugged me is the very initial explanations of single spins, which are generally explained either (a) as something that is either up (alpha) or down (beta), or else (b) as spins rotating on two cones. Neither of these is helpful when you start to think about pulses. A much more useful representation is to draw spins as more or less randomly distributed in space (c). In a magnetic field there is a slight tendency to point upwards, but it is very slight. This representation is equally compatible with the quantum mechanics, but is much more logically consistent. In particular, it makes much more sense of relaxation, because it shows that relaxation of one spin due to the effect of another (ie dipolar relaxation) is effectively a very short on-resonant pulse - it does not cause a 180 degree flip, merely a small change in orientation. Many of these then add up, in the same way that a large number of random walks lead to diffusion. These ideas are explained more fully here.

Biological targets

Carbohydrate binding modules (CBMs)

Most enzymes that digest polysaccharides have a catalytic domain and also a separate carbohydrate binding module (CBM). Since 1994 we have been studying the structure and function of these modules.

Starch binding domain

Although the main function of CBMs is to attach the enzymes to their substrate, they can sometimes have other functions. Our first CBM was a starch-binding domain from the glucoamylase of Aspergillus niger. This was published in 1996, 1997a and 1997b, and showed that there are two binding sites for starch helices, which twist the helices apart [FEBS Letts 1999 447, 58-60] and thereby increase the rate of hydrolysis. This latter result was published here - a paper that I am particularly proud of because it is pretty much my only contribution to enzymology.

Family II CBMs

Family II CBMs can be divided into Family IIa, which bind crystalline cellulose (and have no other apparent function), and Family IIb, which have a similar sequence, but bind to xylan. This is because a crucial Trp, which stacks against a sugar ring, is twisted roughly 90°, a result neatly demonstrated by mutating a residue adjacent to the Trp from Arg to Gly, which rotates the ring back 90° and converts it to a cellulose binding domain.

We have studied the thermodynamics of binding between CBMs and polysaccharides. There is a clear difference between this interaction and other more familiar interactions, eg between lectins and their ligands, in that the polysaccharide chain is held much more loosely on the surface of the protein. This allows it more motional freedom on the surface, and means that deletion of hydrogen bonding sidechains on the protein has almost no effect on the free energy of binding, but a large effect on enthalpy and entropy. The more buried the ligand is, the more effect hydrogen bonding has on the free energy (although water-mediated hydrogen bonding seems to be unimportant for the free energy).

CBMs from other families

We have also solved the structure of a family 10 CBM, which binds crystalline cellulose via its three coplanar rings.

We have also studied a family 4 CBM with a novel ligand specificity. It binds both to amorphous cellulose and to xylan, though with preferred binding to xylan. Specificity comes largely from the orientation of the aromatic rings in the typical family 4 cleft, and from the bottom of the cleft, which is quite polar (and therefore matches xylan better than cellulose). The CBM has two calciums bound, which contribute to its thermostability. The structure has helped to engineer mutants with different xylan-binding properties.

We have also determined the structure of a new family, family 35. This is an interesting family because it recognises mannan (and amorphous cellulose to some extent). Our structure has shown that the recognition is a combination of structure and flexibility. We have also looked at a family 29 CBM, and studied ligand binding to a family 22 CBM.

Dockerin domain

Anaerobic organisms digest cellulose using a quite different system to the CBM paradigm. They contain a 'scaffoldin' protein, which has a cellulose-binding module plus a series of homologous 'cohesin' domains, whose function is to bind to 'dockerin' domains. The dockerin domains are part of the polypeptide chains of the catalytic modules. Therefore, the scaffoldin protein assembles an array of catalytic domains into one place, using the cohesin/dockerin interactions.

So far, it appears that bacteria have several such types of protein pairs, but fungi have only one. We have determined the first structure of a fungal dockerin domain. More recently, we have determined the structure of a tandem double domain, and shown that it has a flexible linker and most likely binds to carbohydrate rather than to protein.

The LysM and SH3b domains

LysM domains are used by bacteria to attach catalytic domains to their own (and other) cell walls to remodel them. Their mode of action is not well understood, and they bind not only to bacterial peptidoglycan but also to fungal and arthropod chitin and to the plant nodulation factors NodF. We determined the structure of free and bound LysM from AtlA, showed that it binds GlcNAc-X-GlcNAc largely using pockets that accommodate the N-acetyl groups, and also that the six LysM domains act independently and additively. We then looked at the SH3b domain from lysostaphin, which binds to both the pentaglycine bridge and the peptide stem from S. aureus - but cannot bind properly to both at the same time, with fascinating consequences, as described here.

The WxL domain

We are currently working (with my recently graduated PhD student Mahreen Ul Hassan) on a poorly characterised domain that functions to bind to peptidoglycan, namely WxL. It interacts with a second protein previously characterised as DUF916, which we have renamed as WxLIP.

Salivary proline-rich proteins and plant polyphenols

Salivary proline-rich proteins (PRPs) are the major proteins in parotid saliva. A major function appears to be to bind to plant polyphenols (tannins), which we consume in tea, coffee, wine and many fruits and cereals. The binding is primarily a hydrophobic stacking. Multiple weak interactions produce precipitation [J. Chem. Soc. Perkin Trans. 2 2000, 317-322], also known as tea cream, which is probably responsible for the astringent taste ('body') of tea and red wine [Recent Adv. Phytochem. 1999 33, 289-318]. We have shown how the precipitation occurs, which involves binding of polyphenol to protein, followed by dimerisation of a protein cross-linked by polyphenol, after which the aggregate precipitates and subsequently increases in size by binding to more polyphenol/protein complexes. The binding is not a fixed geometry, but averages between many conformations. We have identified the main interactions (mostly ring stacking) using time-averaged NOEs.

We have used a range of biophysical techniques (eg light scattering, X-ray scattering, analytical ultracentrifugation, viscometry, electron microscopy and NMR) to look at astringency, and have shown that astringency is basically a loss of wettability of the mucous layer on the palate, caused by multivalent binding to polyphenols. This work has produced a model of how astringency develops. Subsequently we studied astringency using single molecule force microscopy, which has shown that single protein molecules wrap around multivalent polyphenols, and therefore compress the mucous layer. This work also shows that the binding force of polyphenol to protein is almost entirely entropic, ie hydrophobic not hydrogen bonding.

Our work on proline-rich proteins has led to two major reviews of these biologically very important classes of proteins, in 1994, and more recently in 2000 (among my most cited papers, the second one with over 1500 citations). And we have recently used STD techniques to demonstrate that EGCG, the main polyphenol in green tea, binds to the T-cell receptor CD4 at physiological concentrations, and therefore could prevent the binding of the HIV coat protein gp120. We also used NMR titrations and ITC to show that EGCG binds to human albumin: tightly at two locations and weakly at about 9.

Silk

The structure of silk is surprisingly poorly understood. In a longstanding collaboration with Prof Tetsuo Asakura at the Tokyo University of Agriculture and Technology, we have worked on protein chemical shifts and also on methods for characterising silk structure, by solution state and solid state NMR. In the silk worm gland, silk is present in a form called Silk I; the better known fibrous form is called Silk II. We showed that the crystalline form of silk consists of two different packing arrangements in close proximity. The silk work led to a publication in Angewandte Chemie in which we show that antiparallel polyalanine crystallises in two different forms depending on its length. This is interesting first because it suggests that spider dragline silk may derive some of its remarkable strength from its polycrystalline form, and second because the completely linear strands have a rather different position in the Ramachandran plot compared to most beta strands: they are distinctly to the left of standard beta-sheet conformation. This suggests that (a) it takes very little energy to distort a beta-strand conformation, and (b) the phi/psi combination (particularly phi) looks to be strongly related to the twistedness of the strand.

In 2014/15 we published three significant papers. Two of these in 2014 and 2015 show definitively that although silk II is indeed an antiparallel beta sheet, it has two different packings, both different from the classic model proposed by Pauling in 1955. The third is a review of what is now understood about the structures of silk I and silk II, particularly as understood from NMR. More recently we have shown that the crystalline form of silk consists of lamellar layers held together by hydrogen bonds to serine hydroxyls, rather like Velcro (see also our 2023 review).