Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning

Doug Townsend & W. Chase Sanders

Introduction

Membrane fusion is an essential process for viral entry into cells. Viral fusion with the host cell membrane is mediated by viral surface glycoproteins. These proteins bring the viral membrane into close proximity with the cell membrane and allow for virus-cell fusion. Viral fusion proteins often exist as metastable structures on the viral envelope surface. When triggered, the viral fusion protein undergoes a large conformational transition to adopt its postfusion state, in which it embeds in the host membrane to facilitate membrane fusion.


Figure 1. (Top) Structural differences between the prefusion and postfusion conformations of viral fusion proteins.

Because they mediate viral entry and the subsequent pathological infection, viral fusion proteins make for attractive vaccine candidates. Furthermore, because viral fusion proteins decorate the surface of viral envelopes, they are major targets of the humoral immune response. Viral fusion proteins in their prefusion states have been shown to be more effective at eliciting broadly neutralizing antibodies and to result in greater recombinant expression. However, prefusion conformations are often unstable and thus require significant engineering. Common strategies to improve protein stability include the addition of cavity-filling mutations, salt bridges, hydrogen bonds, proline residues, and disulfide bonds. Disulfide bonds are unique among these in that the cysteine thiol group (-SH) is capable of forming a reversible covalent S-S bond. The cross-link between two cysteines can stabilize monomeric or multi-subunit proteins, making disulfide bonds an attractive tool for protein engineering. When applied to structure-based vaccine design, disulfide bonds can be used to “lock” two flexible regions together, thus preventing a conformational change from prefusion to postfusion.


Objective

In the proposed study, we aim to take advantage of the large repository of structural data captured in the Protein Data Bank (PDB) to generate a machine learning model (using Python’s scikit-learn package) capable of predicting the likelihood that a pair of residues will form a disulfide bond when mutated to cysteine. We will mine the PDB to derive structural features relating to protein geometry and the chemical environment surrounding naturally occurring disulfide bonds, and use these features to train a machine learning algorithm. In doing so, we plan to identify how the geometry of residues surrounding the target residue affects whether a successful disulfide bond will form (i.e., whether some sequence and geometric motifs are more likely to contain a disulfide bond than others).
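
As a rough illustration of what "structural features relating to protein geometry" could look like for a single residue pair, the sketch below computes three candidate descriptors from backbone and side-chain atom coordinates: the Cα-Cα distance, the Cβ-Cβ distance, and the Cα-Cβ-Cβ-Cα pseudo-dihedral angle. It is a minimal sketch under stated assumptions, not the study's feature set. The input layout (a dictionary of atom coordinates per residue), the function names, the toy coordinates, and the 5.0 Å Cβ-Cβ cutoff are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def pair_features(res_i, res_j):
    """Geometric descriptors for one residue pair (distances in angstroms)."""
    return {
        "ca_ca": math.dist(res_i["CA"], res_j["CA"]),
        "cb_cb": math.dist(res_i["CB"], res_j["CB"]),
        "ca_cb_cb_ca": dihedral_deg(res_i["CA"], res_i["CB"],
                                    res_j["CB"], res_j["CA"]),
    }


def candidate_pairs(residues, max_cb_cb=5.0):
    """Return ((id_i, id_j), features) for pairs within the Cb-Cb cutoff.

    The cutoff is an illustrative assumption, not a validated threshold.
    Residues without a CB atom (e.g. glycine) are skipped.
    """
    hits = []
    for (id_i, res_i), (id_j, res_j) in combinations(residues.items(), 2):
        if "CB" not in res_i or "CB" not in res_j:
            continue
        feats = pair_features(res_i, res_j)
        if feats["cb_cb"] <= max_cb_cb:
            hits.append(((id_i, id_j), feats))
    return hits


# Toy coordinates (hypothetical, not taken from any PDB entry).
residues = {
    "A10": {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0)},
    "A11": {"CA": (3.0, 3.0, 3.0)},  # glycine-like: no CB atom
    "A50": {"CA": (1.5, 1.5, 3.8), "CB": (1.5, 0.0, 3.8)},
    "A90": {"CA": (20.0, 0.0, 0.0), "CB": (21.5, 0.0, 0.0)},
}
hits = candidate_pairs(residues)
for pair, feats in hits:
    print(pair, feats)
```

Feature dictionaries of this kind, computed for naturally occurring disulfides and for non-bonded residue pairs, could then be assembled into a feature matrix and passed to a scikit-learn classifier. The choice of descriptors, negative examples, and classifier is left open here.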