A Monte Carlo Simulation of Protein Oligomerization in Budding Yeast

 Peggy Wang and Jeremy O'Connell
Dept of Biomedical Engineering, Dept of Biochemistry. Marcotte Lab, University of Texas at Austin, 2800 Speedway Austin TX 78706


Aggregation and Disease
    Proper folding of a protein is essential for catalytic activity, ligand binding, and interacting with other proteins. Misfolded proteins often lack not only the basic functional capacities of their native structures, but can form aggregates that actively contribute to a wide range of diseases. For instance, several well studied neural degenerative diseases are in part caused by protein aggregation. In Parkinson's disease, dopamine producing cells die which diminishes movement control. The leading hypothesis for the cell death is the aggregation of either alpha-synuclein or parkin proteins. Similarly, Alzheimer's and Huntingtin's are motor and cognitive degenerative diseases caused by the aggregation of beta-amyloid and huntingtin proteins, respectively. Less well known are the cases of heart disease, type 2 diabetes, Sickle Cell Anemia, and kidney disease caused by protein aggregation. The risk of developing any of the above diseases increases with age and collectively pose an ever growing threat our aging populace.
    Prion diseases are a recently discovered extreme example of protein aggregation causing fatal neurodegeneration. Unlike the neurodegenerative diseases which require misfolding of one's own proteins, prions are proteins that are transmissible from other people or even other species that transform native proteins into misfolded proteins. Once in a prion state the proteins aggregate causing neurodegenerative diseases called transmissible spongiform encephalopathy. Moreover, prions have been shown to be infectious through body fluids and food. Some prions are stable enough to resist autoclaving, UV and acid and to remain dangerous outside of a host for long periods (8). Some common examples of prion based diseases are Creutzfeldt-Jacob Disease (CVD when the prion source is human and vCVD when contracted from other species), Mad Cow Disease, Chronic Wasting Disease in deer, and Scrapie in sheep. Less well known are the prion diseases in cats, ferrets, and even brewing yeast. Much research has been directed at discovering the mechanism of prion conversion and protein aggregation in hopes of finding a treatment for a lethal and incurable class of diseases.

Aggregation Formation
    There are two main factors that affect the formation of protein aggregates. First is a protein's capacity to change its exposed residues, either through misfolding or covalent modification, and second is the stability of its inter-protein interactions. Proteins can misfold under a variety of circumstances. The first chance a protein has to misfold is during translation as the extending polypeptide chain is extruded from a ribosome and begins to fold. Short proteins are usually capable of folding into their native state on their own, where as longer proteins and structural proteins often require a class of helper proteins called chaperones to fold properly. Translational errors, a lack of chaperones, and chance can all affect protein misfolding during translation. Once translated and properly folded, a protein must unfold before misfolding is possible. Unfolding can be caused by thermal fluctuations at sufficiently high temperatures or by denaturing chemicals.  The conditions in which unfolded proteins attempt to refold are much different than during initial folding which contributes to a higher misfolding rate.
    Once misfolded, proteins may expose previously buried amino acids that allow inter-protein binding. If the concentration of misfolded proteins is high enough for productive collisions to occur on a biological time scale, then misfolded proteins have a chance of sticking together. If these protein-protein interactions are stable enough, over time more misfolded proteins will join thus forming aggregates. The interaction can be through hydrophobic patches as in Sickle Cell Anemia or more commonly through inter-protein beta-sheets as in the neurodegenerative diseases. Interestingly, formation of the inter-protein beta-sheet are largely dependent on the polypeptide back bone. Since all proteins by definition contain this structure, some hypothesize that all proteins posses to varying degrees the potential to aggregate.
Plague of the Proteome
   Evidence of protein aggregation as a wide-spread phenomena across the proteome comes from over-expression studies, computational assessment of proteomes, modeling of peptide interactions, and in vivo screens from our lab. A well known problem in biotech industry is the formation of insoluble aggregates when expressing proteins at unnaturally high concentrations, especially in a non-native host cells which lacks otherwise naturally present chaperones (5).  A computational study looking at the aggregation propensity of several species' proteomes found that while nearly all proteins contain some potential to aggregate the tendancy is much less for essential proteins, structural proteins, and other natural oligomers. A comparative approach showed that the more complex and longer lived a species is the less aggregation prone its proteome is (7). In a following work Monsellier et al observed that the regions surrounding aggregation prone sequences across the human proteome are enriched for highly soluble amino acids, and propose this adaptation is a mechanism for the evolution of general aggregation resistance. Recently a Monte Carlo Simulation demonstrated a sequence independent mechanism for peptides to form oligomers. Their mechanism proposes a meta-stable hydrophobic nucleation followed by inter-chain beta sheet formation and stabilization (6). The sequence independent nature of aggregation predicted by these computational works is supported by in vivo screens done in our laboratory. In short, a GFP-library of S. cervisaea (a collection of strains which each contain a different protein linked to GFP) was screened for protein localization changes over time. Of the roughly 800 cytosolic proteins monitored a large fraction showed strong singular punctates of GFP fluorescence as the cells aged. One hypothesis to explain the congregation of the normally diffuse cytosollic proteins is they formed aggregates.

Our Model
    We present a stochastic model which simulates punctate formation in yeast using a Monte Carlo algorithm.  A Monte Carlo simulation makes use of random numbers and probability statistics to examine a complex problem or system. Computational methods such as Monte Carlo have recently been successfully utilized to describe the basic mechanisms of various biological behavior.  Cancer cell proliferation and migration (9) and immunological receptor-ligand clustering (10) are just two examples of such studies.
    Here we attempt to describe the manner in which protein punctates are formed and connect our predictions to experimental results.  We model how numerous copies of a particular protein diffuse about in a single yeast cell and how the proteins bind and unbind to each other.  We observe the behavior of the proteins across a series of discrete time steps, particularly tracking the relative timing of cluster formation and cluster sizes.  We incorporate various parameters predicted to affect punctate formation: protein size, weight, copy number, binding and unbinding constants.  By varying these factors in our simulations, we illustrate how each affects punctate formation and their relative importance.  Our model provides a framework for predicting how punctates of a particular protein are formed using basic known properties of the protein. We also attempt to use our simulation results to compare with, if not recapitulate, previously gathered observations of punctates in yeast.  Furthermore, our simulations can provide insightful information for the design of future empirical experiments studying punctate behavior.


    To find the parameters that most affect aggregation in our model we ran a battery of simple experiments in which each parameter (e.g. protein size, binding or unbinding, or protein concentration) was individually varied over several orders of magnitude. The starting parameters are based on a an idealized GFP-tagged protein in a yeast cell sized space. Namely a protein with a 5nm radius, ~350 000 copies per cell, and kd of 0.0013 per time step in a two dimensional version of 2.5 micrometer wide yeast cell. Clusters smaller than 10 are considered diffuse and clusters over 25 are thresholded as visible punctates. An example of the simulation over a short period is shown in Figure 1.

Diffusion Rate   
    First we varied the size of the idealized protein's hydrodynamic across biologically relevant sizes plus an order of magnitude larger (Figure 2). Our model uses a slightly modified Einstein-Stokes Diffusion Equation in which the hydrodynamic radius of a diffusing particle is inversely proportional to its diffusion rate. So by increasing the protein's size we slow its diffusion rate. At the end of the simulation, all but largest proteins formed visible punctates (Figure 2 slide 1,4). The kinetics of cluster formation show that the large proteins quickly made initial connections, but the collective mass of the aggregate prevented it from finding other aggregates leading to a high number of small clusters (Figure 2 slide 2,3). Interestingly only the smallest and most freely diffusible proteins were able to form very large clusters (Figure 2 slide1) . 

Protein Size
    Next we varied the size of our proteins in the approximated 2D cell. Because a grid unit is defined relative to protein size (i.e. one grid unit is the protein's size) the cell volume can only stay constant by inversely varying the size of the grid. Since the protein concentration stays constant as well the larger protein and smaller grid size simulations have a smaller number of proteins. For the purposes of testing our model the protein's size and hydrodynamic radius are varied independently, but in nature they would be the same. Strikingly, the model shows that protein and grid size with the biological range do not affect the relative amount of clusters formed (Figure 3 slide 2). However the number of proteins in the simulation strongly effects the total number of clusters and the size of the clusters even at the same concentration (Figure 3 slide1,3). Not surprisingly, the number of visible punctates also depends on the total number of proteins as well (Figure 3 slide 4). Smaller proteins showed more total clusters and a wider distribution of cluster sizes.

Binding Rate Constant
    We test the affect of the protein binding constant, kb, by varying it across three orders of magnitude(Figure 4).  We find that when kb is greater than a threshold of about 0.5, the average cluster size steadily increases with time.  A higher kb leads results in a steeper incline, increasing to about 20 proteins after 5000 timesteps.  Fr kb<0.5, the average cluster size stays steady around 2 proteins.  The number of clusters formed for kb>0.5 first spikes to about 500, then gradually decreases with time.  A higher kb results in a steeper decline, decreasing to about 125 proteins after 5000 timesteps.  For kb < 0.5, the number of clusters increases quickly for the first 200 iterations, then stays steady for the rest of the simulation.  In this scenario, a greater kb forms a greater number of clusters (around 275 and 40 clusters for kb=0.1 and kb=0.01, respectively).  We only see visible clusters for kb>0.5, with a greater kb resulting in more visible clusters.

Unbinding Rate Constant
    Similarly, we test the affect of the protein unbinding constant, ku, by varying it across three orders of magnitude(Figure 5).  We find that the average cluster size grows past 3 only when ku is on the order of 0.001.  For ku=0.001, the average cluster size grows to about 22 after 5000 timesteps.  All other ku's result in cluster size steady around 2.  The number of clusters formed for ku=0.001 first spikes to around 500, then gradually decreases to about 120  For all larger ku's tested, the number of clusters quickly approaches a steady state number.  In this scenario, a smaller ku results in more clusters being formed.  It follows that we only see visible clusters for ku=0.001.

Protein Concentration
    We vary the protein concentration from Cp=0.1 to 0.00001 (~35k proteins per cell to 3.5 proteins per cell) .  We find that for Cp>=0.01, the average cluster size increases as a function of time (Figure 6 slide 2).  The average cluster size remains steady at about 2 for Cp=0.001, and 0 for any smaller Cp.  We also see that for Cp>=0.01, the number of clusters peaks to around 500 in the first 250 timesteps, then steadily decreases for the remainder of the simulation.  For Cp=0.001, the number of clusters remains level around 25. For even smaller Cp, we hardly observe any clusters form.  Visible clusters are only observed for Cp>=0.01.

GLN1 vs GLT1
    Finally we tested our model using the parameters for two real, but very different GFP-tagged proteins that both show punctates in our in vivo experiments (Figure 7 slide 4). The first is glutamine synthetase, GLN1, which is slightly larger idealized protein but otherwise similar. We compared GLN1 to glutimate synthetase, GLT1,  which at 238 kDaltons is much larger but has less copies per cell (4). The GLT1 proteins quickly collapses to a single large cluster of ~700 proteins. In contrast the GLN1 proteins show a slow steady increase in average cluster size toward an equilibrium that is not reached in the time scale of this simulation.F

Long Simulation of GLN1
    To find if the equillibrium cluster profile of GLN1 was the same as GLT1 we ran the simulation 5 fold longer (Figure 8 slide 2). As predicted from the parameter variation experiments, the end profile of the smaller GLN1 shows a higher total number of clusters and broader size distribution than GLT1 though its single largest only reached ~500 proteins (Figure 8 slide 1).


    Our parameter tests show that our aggregation model is only slightly perturbed by changes in either  protein size and diffusion within biologically relevant ranges. However, when both parameters were varied together in the GLN1 vs GLT1 simulation, a drastic effect was observed. The simulation of the larger GLT1 proteins reached equilibrium ten fold faster than the simulation of GLN1 proteins. The magnitude of difference is likely less in real cells where the "infinitely dilute" assumption of the Einstein-Stokes equation is not quite accurate (Goodsell's Image). The results correlate with our in vivo findings that GLT1 proteins form long lasting, single punctates even in log phase growth. The duration of punctate existence is may represent the time needed for chaperones to dissaggregate the single large cluster of GLT1 proteins. 
    Binding, and especially unbinding rate also show large effects on the number and size of clusters. When either the binding rate is too low or the unbinding rate is too high many small clusters form instead of collapsing into a single highly stable cluster. This suggests that a protein whose oligomerization is highly dynamic should not form single punctates without an organizing force from the cell. An indirect line of evidence from our punctate experiments with the highly dynamic oligomers of ADE4 adds support. Our group and others have found that while ADE4 punctates are stable while the cell is alive, they quickly disperse upon cell lysis. Finding the existence and mechanism of an aggregation centralizing force would be a novel discovering in yeast.
        The existence of the force could be tested by characterizing the stability of various protein's punctates through photobleaching and recovery experiments followed by lysis for the cells that show dynamic punctates. If the hypothesis is correct the puntates should disperse upon lysis. If the force does exist the mechanism may be similar the one in higher eukaryotes which uses microtubule based dynein motors to gather cytosollic aggregates to the microtubule organizing center. Conservation of this mechanism in yeast could be tested by showing co-localization of punctates with the microtubule organizing center proteins or by inhibiting microtubule assembly and observing punctate formation.
    Another avenue of future work could aim to refine our model to be closer approximate the conditions of the cell and expand the fraction of cytosolic proteins tested.  The diffusion rate could be adjusted to be more realistic by either making the size based decline exponential or by adding space filling cellular components to scale as diffusion obstacles.  The cell lattice could be modified to be three dimensional sphere and to represent real spacial units (e.g. nm) as its base unit rather than scaling to protein size. This would free up proteins to scale to three dimensions as well take approximations of their real shapes. The model should also take into account the misfolding or covalent modification needed to nucleate aggregation.
Another possible addition to our model would be to apply more careful consideration to the detailed molecular interactions between two binding proteins.  Auer et al recently performed a Monte Carlo analysis of the hydrophobic and hydrogen bonding interactions between proteins (11).  While their amyloid formation model operates on the scale of a single cluster, it provides details of the kinetics of cluster assembly which could be applied to our whole cell model. Perhaps most importantly, determining the kb and ku rates through empirical studies would refine the kinetics of cluster formation and the appropriate time scale to reach equilibrium. With these improvements in place, the model could be tested using the parameters for all ~800 yeast cytosolic protein to correlate with the yeast GFP-library punctate findings. The comparison could provide evidence for or against thermodynamically favorable aggregation as a sufficient explanation for punctate formation yeast.


Methods and Materials

Monte Carlo Simulation Details

In our simulation, proteins are represented as objects on a two-dimensional square lattice. We make the following assumptions to simplify our model:  1.Only a single protein can occupy a site on the lattice.  2.In a single timestep, a protein or cluster may move one step on the lattice, a binding between two adjacents may form, or an existing binding may break. 

A. Calculating dimensions and concentrations

In order to represent a three-dimensional space on a two-dimensional lattice, we perform the following calculations.  Assuming our cell is spherical in shape, we use the average radius of a yeast cell(2500nm) to calculate its volume:

(4/3) * pi * (2500^3) = 6.54498469 × 10^10 nm^3

Similarly, we calculate the volume of protein.  We assume that the protein is also spherical in shape, with a radius of length 2.5 nm for each 300 amino acids.  For example, Gln1 which is 370 amino acids long is approximated to have a radius of 5 nm, and a volume of 523.598776 nm^3.

Next we need to simulate a projection of our cell from 3-d space into 2-d space.  Because we choose to represent our cell as a square "slice", we calculate the area of our lattice to be the square of the cubic root of the volume:

(6.54498469 × (10^10))^(2/3) = 16240737.9 nm^2

Similarly, we calculate the area of a single Gln1 protein and its copy number:

523.598776^(2/3) = 64.9629515 nm^2

346 000^(2/3) = 4928.52994 copies

Finally, the ratio of the cell area to a single protein area is calculated:

16240737.9 / 64.9629515 = 250000

We choose our lattice site to be the size of our protein and calculate our lattice size accordingly: 

500 x 500 = 250000 lattice sites

B. Calculating diffusion rates

According to the Stokes-Einstein relation, the diffusivity of a particle is indirectly proportional to the particle's radius:

D = (k T)/(6 pi n  r_c) (nm^2/s)

where k is Boltzmann's constant, T is the absolute temperature (303.15 K), n is the viscosity of the cytosol (4.008 cP), and r_c is our estimated radius of the protein or cluster.  For example, a single Gln1 protein will have a radius of 5nm, while a cluster of 5 Gln1 proteins will have a radius of 25 nm.  Recall that we scale our lattice sites to the size of a particular protein, so we need to multiply the diffusivity by a factor of 1/r_p, where r_p is the radius of a single protein, to reflect the distance that a protein can travel in a single time step:

D' = D (1/r_p)

Note that as a cluster size(and radius) grows, the associated diffusivity decreases and the cluster becomes less likely to move.

C. Calculating binding and unbinding rates

The binding and unbinding rates of a protein are randomly chosen based on the test parameters that generally reach equilibrium within the time frame of the experiment.

D. Simulation Details

We use a lattice Monte Carlo method and run our simulation over 5000 timesteps.  In a single timestep, a protein or cluster is allowed to move to an adjacent square and/or bind or unbind to an adjacent protein or cluster.  If an object has one or more empty adjacent spaces, one of the spaces will be chosen with equal probability and the move will be made with probability D'.  If the object is adjacent to another object it is not already bound to, it will bind with probability kb.  Similarly, if an object is already bound to another object, it will unbind with probability ku.  We repeat our simulation with varying input parameters.  We observe the behavior of our system with various protein sizes, concentrations, binding constants and unbinding constants.  We also simulate the behavior of two proteins of interest in punctate studies: Gln1 and Glt1.
Plotting Results
     Our Monte Carlo simulation was carried out using Perl (available here).  The resultant data from our simulations were plotted in Matlab.  All figures were generated using predefined plot, hist and scatter functions.  A download of our Matlab code is available here.

  1.  Protein Particulates: Another Generic Form of Protein Aggregation? pdf
  2. Life on the edge: a link between gene expression levels and aggregation rates of human proteins. pdf
  3. AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides. pdf
  4. Global analysis of protein expression in yeast. Nature 425(6959):737-41
  5. Protein aggregation during overexpression limited by peptide extensions with large net negative charge pdf
  6. Aggregation Propensity of the Human Proteome pdf

  7. Organism complexity anti-correlates with proteomic B-aggregation propensity pdf
  8. Heat stability of prion rods and recombinant prion protein in water, lipid and lipid–water mixtures

  9. Zaman et al. (2005) Computational Model fo Cell Migration in Three-Dimensional Matrices. Biophysical Journal 89:1389-1397
  10. Lee et al. (2003) The Immunological Synapse Balances T Cell Receptor Signaling and Degradation. Science 302:1218-1222
  11. Auer et al. (2008) A Generic Mechanism of Emergence of Amyloid Protofilaments from Disordered Oligomeric Aggregates. PLoS Computational Biology 4(11):e1000222

Aggregation Background

 David Goodsell's Artistic Rendering of A Eukaryotic Cell. Cellular Components are to Scale and Concentration. 

Monte Carlo Simulation of Diffusion in Budding Yeast

Figure 1 Graphic Representation of Our Monte Carlo Simulation

Diffusion Rate

Figure 2 The effects of Diffusion Rate on cluster formation.

Protein Size

Figure 3 Effects of Protein Size on Cluster Formation

Binding Rate Constant

Figure 4 Effects of the Binding Rate Constant on Cluster Formation

Unbinding Rate Constant

Figure 5 Effects of the Unbinding Rate Constant on Cluster Formation

Protein Concentration

Figure 6 Effects of Protein Concentration on Cluster Formation

GLN1 vs GLT1

Figure 7 Comparison of GLN1 and GLT1 Cluster Formation

Long GLN1

Figure 8 GLN1 Cluster Formation Equilibrium on a Long time Scale

Subpages (1): PERL Code