Methods

Figure 2. Overview of the scripts used to weave together the stochastic translation simulator with our model of accelerated evolution.

2.0 How did we build the model?

The codon evolution scripts (available at https://github.com/ajlukasiewicz/codon_evolution) were designed to be modular. Five core components (transcripts.py, evaluate.py, fit_eval.py, pooled_transcripts.py, and pooled_evolution.py) work in tandem to model translation and to move transcripts through the evolutionary process.

First, a transcript class object is created from either a random or a user-defined distribution of two codon types, fast and slow. This transcript is mutated, and the two resulting lists (original and mutant) are passed to Pinetree for stochastic simulation of translation. The tabular output of Pinetree is then imported into the evaluation script (section 2.1), which calculates a fitness value for each transcript. Finally, the mutant is accepted or rejected based on these fitness values using the origin fixation model described in section 2.2.
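The first two steps of this workflow can be sketched as follows. This is a minimal illustration only, not the code in transcripts.py: the function names, the "F"/"S" labels, and the single-codon mutation rule are all assumptions made for the example. The rate defaults are taken from the Usage section.

```python
import random

# Hypothetical labels for the two codon types; the real scripts may encode them differently.
FAST, SLOW = "F", "S"

def make_transcript(length, fast_fraction=0.5, rng=random):
    """Build a transcript as a list of codon labels.

    Each position is FAST with probability fast_fraction, otherwise SLOW.
    """
    return [FAST if rng.random() < fast_fraction else SLOW for _ in range(length)]

def mutate(transcript, rng=random):
    """Return a copy of the transcript with one randomly chosen codon flipped.

    Assumption: a mutation switches a single codon between the two types.
    """
    mutant = list(transcript)
    pos = rng.randrange(len(mutant))
    mutant[pos] = SLOW if mutant[pos] == FAST else FAST
    return mutant

def codon_rates(transcript, fast_rate=1.0, slow_rate=0.5):
    """Convert codon labels into per-codon rate weights for the simulator."""
    return [fast_rate if codon == FAST else slow_rate for codon in transcript]
```

Under these assumptions, the original and mutant rate lists would be the two inputs handed to the translation simulation.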

Figure 3. Fitness for each transcript was calculated from the protein production rate after the simulated ribosomes had reached steady state. A) Steady state was determined by counting the number of free ribosomes at each time point; a constant count indicates that the numbers of ribosomes binding to and leaving the transcript had reached equilibrium.

2.1 Pinetree simulation

Pinetree (Jack and Wilke, 2018) is a recently developed Python tool for simulating transcription and translation using the Gillespie algorithm, a stochastic simulation method commonly used to describe chemical and biological processes. The program supports both Genome and Transcriptome objects; for our purposes, only transcript-level objects were simulated. Pinetree produces a tab-separated table of counts for each feature in the simulation (i.e. free ribosomes, transcripts, and proteins produced), which we used to 1) determine the steady state of the system, and 2) calculate the protein production rate as a proxy for fitness (Figure 3). Once the slope of the free ribosome count reached 0, the slopes of the protein count curves for proteins A and B were calculated. These values are then passed to the main evolution script, which decides whether the mutant is accepted or rejected for the next generation.
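The steady-state check and the rate calculation can be illustrated with a short sketch. It is not the code in evaluate.py: the sliding-window approach, the window size, the tolerance, and the function names are assumptions chosen for the example, and the inputs are plain lists of times and counts rather than the Pinetree output table itself.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def steady_state_index(time, free_ribosomes, window=20, tol=1e-3):
    """Return the first index where the free-ribosome count is approximately flat.

    The slope is fitted over a sliding window (window and tol are hypothetical
    settings); None is returned if the count never flattens.
    """
    for start in range(len(time) - window + 1):
        end = start + window
        if abs(slope(time[start:end], free_ribosomes[start:end])) < tol:
            return start
    return None

def production_rate(time, protein_counts, start):
    """Protein production rate: slope of protein count versus time after steady state."""
    return slope(time[start:], protein_counts[start:])
```

With real stochastic counts, the tolerance would likely need to be tuned, since the fitted slope fluctuates around zero rather than reaching it exactly.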

Figure 4. Origin fixation calculation used to evaluate fitness from the original (xi) and mutant (xj) transcript simulations. N = population size. Image from Teufel and Wilke, 2017.

2.2 Origin Fixation Model

To simulate evolution efficiently in our script, we used the computationally efficient accelerated evolution model proposed by Teufel and Wilke (2017). Within each generation there are two possible populations, the origin (xi) and the mutant (xj). The model accepts or rejects the mutant as the progenitor of the next generation based on the fixation probability calculated from the two fitness values (Figure 4) and an element of chance: a random number p between 0 and 1 is drawn, and the mutant is accepted if p falls below the fixation probability.
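The accept/reject step can be sketched as below. The expression used here is a commonly used form of the fixation probability in origin-fixation models and is an assumption on our part; the calculation shown in Figure 4 is authoritative and may differ in detail. The function names are hypothetical.

```python
import math
import random

def fixation_probability(x_i, x_j, n):
    """Probability that a mutant with fitness x_j replaces an origin with fitness x_i.

    Assumed form: (1 - exp(-2 (x_j - x_i))) / (1 - exp(-2 n (x_j - x_i))),
    where n is the population size.
    """
    delta = x_j - x_i
    if delta == 0:
        return 1.0 / n  # neutral limit of the expression above
    try:
        # expm1(z) = exp(z) - 1; the two minus signs cancel in the ratio.
        return math.expm1(-2.0 * delta) / math.expm1(-2.0 * n * delta)
    except OverflowError:
        # Strongly deleterious mutant: the probability is effectively zero.
        return 0.0

def accept_mutant(x_i, x_j, n, rng=random):
    """Draw p uniformly from [0, 1) and accept the mutant if p is below the fixation probability."""
    p = rng.random()
    return p < fixation_probability(x_i, x_j, n)
```

In this form, a beneficial mutant is accepted with appreciable probability, a neutral one with probability 1/n, and a deleterious one only rarely.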

2.4 Evolutionary Model Validation

To validate that our model was producing biologically relevant output rather than artifacts, we performed a series of tests to determine whether evolution was occurring as expected. First, we tested an array of codon speeds to see whether production rates increased as expected. Then, instead of initializing the simulation with a randomized transcript, we tested five skewed distributions of founder transcripts to observe whether starting composition affects the evolutionary process. Finally, we tracked changes in protein production rates over multiple simulations to see whether the model selects for increasing fitness.

2.3 Usage

The codon evolution scripts are run from the command line and accept the following arguments:

-g     number of generations to evolve for (required)
-o     name of output folder (required)
-s     slow codon rate (default = 0.5)
-f     fast codon rate (default = 1.0)
-r     number of ribosomes in simulation (default = 5)
-sp    baseline speed of translation (default = 30)
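For readers who want to see how such an interface might be declared, the following is a hypothetical argparse sketch that mirrors the flags and defaults listed above. It is not taken from the repository: the destination names, types, and help strings are assumptions, and the actual scripts may parse their arguments differently.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Codon evolution simulation (illustrative sketch)")
    parser.add_argument("-g", dest="generations", type=int, required=True,
                        help="number of generations to evolve for")
    parser.add_argument("-o", dest="output", required=True,
                        help="name of output folder")
    parser.add_argument("-s", dest="slow_rate", type=float, default=0.5,
                        help="slow codon rate")
    parser.add_argument("-f", dest="fast_rate", type=float, default=1.0,
                        help="fast codon rate")
    parser.add_argument("-r", dest="ribosomes", type=int, default=5,
                        help="number of ribosomes in simulation")
    parser.add_argument("-sp", dest="speed", type=float, default=30,
                        help="baseline speed of translation")
    return parser
```

With this sketch, passing only the two required flags leaves every other setting at its documented default.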