Optimizing and Modeling Protein Expressions in a Cell-Free System

Alandra McDowell & Aset Khakimzhan

University of Minnesota - Twin Cities, School of Physics and Astronomy

Noireaux Lab

Introduction:

Cell-free transcription-translation (TXTL) is of great interest in a wide range of biotechnological, biophysical, and medical research [1]. TXTL is useful because of the speed it offers for designing and interrogating complex biological systems. TXTL applications are networks of protein-producing biochemical reactions [1]. Larger networks need more protein to operate as intended; thus, optimizing a system’s protein expression is of great importance.

Cell-free, or in vitro, protein expression refers to protein synthesis that occurs in a cell lysate rather than inside a living cell. To prepare a lysate, cells are broken open by rupturing their membranes, a process known as lysis, which releases their contents into a fluid. This fluid, the cell lysate, can be combined with plasmid DNA and a reaction mixture to produce recombinant proteins, as described below. This type of protein expression is preferable to in vivo techniques, which take place inside living cells, because it allows researchers to express and manufacture proteins at a faster pace [4].

In the past, TXTL reactions were assembled “by hand”, meaning that researchers used pipettes to dispense the components of each reaction. This required making 5-10 µL reactions and spending approximately 5 minutes on each one. We programmed a high-throughput liquid dispensing robot to construct 96 reactions of 1 µL each, a full set of which could be made every 30 minutes. We took advantage of the efficiency of the liquid handler to make many reactions that spanned significant portions of the component space.
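The gain in throughput follows directly from these numbers. The arithmetic below uses only the figures quoted above and ignores setup time such as filling the source plate.

```python
# Throughput and reagent comparison using only the figures quoted above.
REACTIONS_PER_PLATE = 96

manual_min_per_reaction = 5.0   # ~5 min per hand-pipetted reaction
robot_min_per_plate = 30.0      # one 96-reaction plate every ~30 min

manual_min_per_plate = manual_min_per_reaction * REACTIONS_PER_PLATE  # 480 min
speedup = manual_min_per_plate / robot_min_per_plate                  # 16x

manual_volume_ul = (5.0, 10.0)  # hand-made reaction volumes (uL)
robot_volume_ul = 1.0           # robot-made reaction volume (uL)
reagent_savings = tuple(v / robot_volume_ul for v in manual_volume_ul)

print(f"manual: {manual_min_per_plate / 60:.0f} h per plate, robot: {robot_min_per_plate:.0f} min")
print(f"time speedup: {speedup:.0f}x, reagent savings: {reagent_savings[0]:.0f}-{reagent_savings[1]:.0f}x")
# manual: 8 h per plate, robot: 30 min
# time speedup: 16x, reagent savings: 5-10x
```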

The robot dispensed the components into 96-well plates, so each plate contained 96 reactions. The plasmid DNA added to each reaction encoded Green Fluorescent Protein (GFP), so the protein yield could be quantified by the intensity of the green fluorescence emitted by each reaction. With the collected data, we constructed models of the final intensities and of the reaction kinetics.

Theory:

TXTL systems isolate the process of protein expression from living systems. In an E. coli-based TXTL system, early transcription and translation can be modelled with a coarse-grained model of protein synthesis [14].

Transcription, the first step of a TX-TL reaction, converts DNA into messenger RNA (mRNA). The concentration of mRNA in the reaction is given by Equation 1, plotted in red in Figure 1. The second step, translation, converts mRNA into protein, in this case Green Fluorescent Protein (GFP). Equation 2, which gives the concentration of protein in the reaction over time, is plotted in green in Figure 1.

Figure 1: The concentrations of mRNA and expressed proteins present in a TX-TL reaction over time.
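Equations 1 and 2 are not reproduced in this text. As a hedged sketch, we assume the coarse-grained form commonly used for early TXTL dynamics, in the spirit of [14]: mRNA is produced at a constant rate k_tx·D and degraded with lifetime τ_m, so dm/dt = k_tx·D - m/τ_m, while protein accumulates in proportion to mRNA, dP/dt = k_tl·m (resource depletion at later times is ignored). The code evaluates the resulting closed-form solutions and checks them against a simple Euler integration; all parameter values are hypothetical.

```python
import math

# Hypothetical parameters in arbitrary units (not fitted to our data)
K_TX = 1.0    # transcription rate per unit DNA
K_TL = 0.5    # translation rate per unit mRNA
DNA = 5.0     # plasmid concentration
TAU_M = 12.0  # mRNA lifetime (min)

def mrna(t):
    """Assumed Eq. 1: m(t) = k_tx*D*tau*(1 - exp(-t/tau)); saturates at k_tx*D*tau."""
    return K_TX * DNA * TAU_M * (1.0 - math.exp(-t / TAU_M))

def protein(t):
    """Assumed Eq. 2: integral of k_tl*m from 0 to t (quadratic early, then linear)."""
    return K_TL * K_TX * DNA * TAU_M * (t - TAU_M * (1.0 - math.exp(-t / TAU_M)))

def euler(t_end, dt=0.001):
    """Explicit Euler integration of dm/dt = k_tx*D - m/tau, dP/dt = k_tl*m."""
    m = p = 0.0
    for _ in range(int(round(t_end / dt))):
        m, p = m + dt * (K_TX * DNA - m / TAU_M), p + dt * K_TL * m
    return m, p

for t in (5, 30, 120):
    m_num, p_num = euler(t)
    print(f"t={t:>3} min  m={mrna(t):7.2f} (Euler {m_num:7.2f})  "
          f"P={protein(t):9.1f} (Euler {p_num:9.1f})")
```

The closed forms reproduce the shapes described for Figure 1: mRNA rises and plateaus, while protein grows roughly quadratically at first and then linearly.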

Experimental Apparatus and Procedure:

A TX-TL reaction requires cell extract, amino acids, magnesium glutamate, potassium glutamate, polyethylene glycol (PEG), maltodextrin, plasmid DNA, an energy mix, and water. The extract, or lysate, provides the “machinery” of transcription and translation. Amino acids are the building blocks of the expressed protein, here GFP, whose green fluorescence allows protein expression to be quantified. The energy mix is a complex mixture of components that keeps the reaction continuously supplied with energy so that it functions properly. Maltodextrin is the reaction’s source of carbon. Potassium glutamate and magnesium glutamate provide ions that screen the charges of the DNA and mRNA molecules. PEG acts as a crowding agent, concentrating the reacting molecules much as the crowded interior of a cell does. The plasmid carries the genetic instructions for the protein to be expressed. Water keeps the total reaction volume constant while the volumes of the other components are varied throughout the experiment.

Extract, amino acids, and energy mix were prepared, along with stocks of magnesium glutamate, potassium glutamate, polyethylene glycol, maltodextrin, and plasmid. In this project the concentrations of extract, amino acids, energy mix, and magnesium glutamate were varied. The initial generation had 96 reactions per plate, with magnesium glutamate ranging from 0 to 32.5 nL, amino acids from 197.5 to 255 nL, and energy mix from 0 to 65 nL. Initial ranges were chosen based on the results of past experiments [1]. After the analysis of each generation, the ranges of magnesium glutamate, amino acids, and energy mix were shifted toward the values that gave the highest protein yield. Plates were tested with extract levels of 50, 70, 90, 100, and 110 percent of the maximum extract level used in previously published experiments [1].
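One way to lay out a 96-reaction generation over the initial ranges above is a full factorial grid, with water filling each reaction to a constant volume. This is a sketch under stated assumptions, not the plate map used in this work: the 4 × 4 × 6 grid (magnesium glutamate × amino acids × energy mix), the extract volume, and the combined volume of the other fixed components are all hypothetical, and volumes are snapped to 2.5 nL, which we assume is the Echo's transfer increment.

```python
from itertools import product

DROPLET_NL = 2.5        # assumed Echo transfer increment
TOTAL_NL = 1000.0       # 1 uL reaction
EXTRACT_NL = 200.0      # hypothetical extract volume for one plate
OTHER_FIXED_NL = 150.0  # hypothetical: K-glutamate, PEG, maltodextrin, plasmid

def levels(lo, hi, n):
    """n evenly spaced volumes from lo to hi, snapped to the droplet size."""
    step = (hi - lo) / (n - 1)
    return [round((lo + i * step) / DROPLET_NL) * DROPLET_NL for i in range(n)]

mg_levels = levels(0.0, 32.5, 4)        # magnesium glutamate (nL)
aa_levels = levels(197.5, 255.0, 4)     # amino acids (nL)
energy_levels = levels(0.0, 65.0, 6)    # energy mix (nL)

plate = []
for mg, aa, en in product(mg_levels, aa_levels, energy_levels):
    # Water makes up the difference so every well has the same total volume
    water = TOTAL_NL - (EXTRACT_NL + OTHER_FIXED_NL + mg + aa + en)
    if water < 0:
        raise ValueError("components exceed the reaction volume")
    plate.append({"mg": mg, "aa": aa, "energy": en, "water": water})

print(len(plate), mg_levels, aa_levels, energy_levels)
```

Snapping to the droplet size makes the levels slightly uneven (for example, 12.5 and 25 nL of energy mix instead of 13 and 26 nL), a trade-off of dispensing in fixed increments.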

An Echo 550 Liquid Handler, an acoustic liquid dispensing machine, was programmed to transfer the desired volume of each component from a source plate to a destination plate, as shown in Figure 2. The chemical components were first pipetted manually into a source plate (Figure 3), from which the liquid handler drew each component for the destination plate (Figure 4). One plate was prepared for each of the five extract levels. The plates were then incubated for 10-12 hours at 29 °C.

Figure 2: The Echo 550 Liquid Handler uses sound waves to dispense the desired volume of a particular component from the source plate to the destination plate. Various volumes of each component are dispensed into each well of the destination plate, producing 96 reactions per generation.

Figure 3: A typical source plate that experimenters would manually pipet components into.

Figure 4: A typical destination plate, containing 96 reactions.

When the reactions were complete, their protein yield was measured using a spectrophotometer. All of the expressed proteins were GFPs, and the intensity of the emitted green fluorescence was proportional to the protein concentration [1]. The data collected from each plate were analyzed to guide the design of the next generation of plates. Figure 5 shows an example of this analysis for one plate. The darkness of each square is proportional to the protein yield for that combination of components. For example, one of the highest yields on the fifty percent extract plate shown in Figure 5 came from 4 mM amino acids, 4 mM magnesium glutamate, and 50x 3PGA. On seeing that the highest yields occurred in the higher magnesium glutamate regions, the experimenters would increase the magnesium glutamate concentrations used in the next generation of the experiment.

Figure 5: A visual, single-generation representation of reactions with various combinations of component volumes for a given extract level. The highest quantities of expressed proteins are represented by the darkest regions of the graph.
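In this work, the decision to shift a component's range between generations was made by inspecting plots such as Figure 5. As a hedged sketch of how that step could be automated, the hypothetical rule below slides a component's range toward the best-performing value when that value sits near an edge of the range, and narrows the range around it otherwise. The plate results are invented for illustration.

```python
def best_condition(results):
    """Component combination with the highest measured intensity."""
    return max(results, key=results.get)

def shift_range(lo, hi, best, edge_fraction=0.25):
    """Propose the next generation's (lo, hi) range for one component.

    If the best value sits near an edge of the current range, slide the
    window by half its width in that direction; otherwise halve the window
    around the best value. Negative volumes are not allowed.
    """
    width = hi - lo
    pos = (best - lo) / width
    if pos >= 1.0 - edge_fraction:
        lo, hi = lo + width / 2, hi + width / 2
    elif pos <= edge_fraction:
        lo, hi = lo - width / 2, hi - width / 2
    else:
        lo, hi = best - width / 4, best + width / 4
    if lo < 0.0:  # slide back up so the window starts at zero
        lo, hi = 0.0, hi - lo
    return lo, hi

# Hypothetical plate results: (Mg nL, amino acids nL, energy nL) -> fluorescence (a.u.)
results = {
    (0.0, 197.5, 0.0): 120.0,
    (10.0, 217.5, 25.0): 480.0,
    (32.5, 235.0, 40.0): 910.0,
    (22.5, 255.0, 65.0): 700.0,
}
mg, aa, en = best_condition(results)
print(shift_range(0.0, 32.5, mg))      # (16.25, 48.75): best Mg at top edge, shift up
print(shift_range(197.5, 255.0, aa))   # (220.625, 249.375): narrow around best
print(shift_range(0.0, 65.0, en))      # (23.75, 56.25): narrow around best
```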

Modeling and Analysis:

We used ordinary least squares (OLS) linear regression to fit a second-order polynomial model of protein expression. This model explained about 87% of the variance in protein expression among reactions. A region of this model is graphed in Figure 6, with extract and amino acid levels held constant for display purposes. From the model, we extracted the dependencies of greatest interest: the maximum protein expression as a function of extract volume (Figure 7), the magnesium glutamate volume at these maxima as a function of extract volume (Figure 8), and the energy mix volume at these maxima as a function of extract volume (Figure 9).

Figure 6: A three-dimensional representation of a particular region of the five-dimensional quadratic regression model. Here, extract and amino acid volumes are held constant, although the model accounts for variations in these component volumes.

Figure 7: The maximum protein expression possible as a function of extract levels. It is of interest to note that protein expression does not increase linearly with the addition of more cell extract to the reaction. Here, the blue region represents uncertainty.

Figure 8: The optimal volume of Magnesium-glutamate required to achieve the maximum levels of protein expression for all possible volumes of extract, as determined in Figure 7.

Figure 9: The optimal volume of energy mix required to achieve the maximum levels of protein expression for all possible volumes of extract, as determined in Figure 7.
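The second-order OLS fit described in this section can be sketched with NumPy. This is a hedged illustration, not the code used in this work: the four inputs (extract, magnesium glutamate, energy mix, amino acids) are assumed to be scaled to [0, 1], the full quadratic basis (intercept, linear, squared, and pairwise interaction terms) is an assumption about the model's exact form, and the data are synthetic.

```python
from itertools import combinations

import numpy as np

def quadratic_design(X):
    """Columns: 1, x_i, x_i^2, and x_i*x_j (i < j) for each input column."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)

def fit_ols(X, y):
    """Least-squares coefficients and R^2 of a full second-order model."""
    A = quadratic_design(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean)
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

# Synthetic data: columns are scaled extract, Mg, energy mix, amino acids
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y_true = 1 + 2 * X[:, 0] - 3 * (X[:, 1] - 0.4) ** 2 - 2 * (X[:, 2] - 0.6) ** 2 + 0.1 * X[:, 3]
y = y_true + rng.normal(0.0, 0.05, size=200)

beta, r2 = fit_ols(X, y)
print(f"{beta.size} coefficients, R^2 = {r2:.3f}")
```

With four inputs the model has 15 coefficients; the reported R² plays the same role as the 87% of variance quoted above.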

Optimization:

Because protein expression appeared to be described sufficiently well by a smooth function with a single maximum, it was a good candidate for iterative optimization. To test such an optimizer, the final intensities of all the reactions were modeled with a gradient boosting regression algorithm, which served as a simulated experiment. First, replicate reactions were averaged, both in final intensity and, where available, in kinetics. Extract, magnesium glutamate, energy mix, and amino acids were set as the independent variables, and intensity was set as the dependent variable.

Gradient boosting builds a strong predictive model from an ensemble of weak predictors. The algorithm defines a loss function, which measures the quality of a given regression. At each step, it adds the single weak predictor that most reduces the loss, fitting it to the errors (the negative gradient of the loss) of the current model. We limited the number of such predictors to 225. Each weak predictor is scaled by a step size chosen to minimize the loss, and the final model is the sum of an initial estimate and all 225 scaled predictors.
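A minimal from-scratch version of gradient boosting with squared-error loss is sketched below. It assumes single-split regression trees (stumps) as the weak predictors and a fixed learning rate of 0.1; neither choice is stated in the text, and the software used in this work is not specified. The 225 stages match the limit described above, and the data are synthetic.

```python
import numpy as np

def fit_stump(X, r, n_thresholds=16):
    """Best single-split regression stump for residuals r (squared error)."""
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresholds)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # split must leave samples on both sides
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]

class GradientBoosting:
    def __init__(self, n_stages=225, learning_rate=0.1):
        self.n_stages, self.lr = n_stages, learning_rate

    def fit(self, X, y):
        self.f0 = y.mean()                    # initial constant estimate
        pred = np.full(len(y), self.f0)
        self.stumps = []
        for _ in range(self.n_stages):
            resid = y - pred                  # negative gradient of (1/2)(y - f)^2
            j, t, lv, rv = fit_stump(X, resid)
            pred += self.lr * np.where(X[:, j] <= t, lv, rv)
            self.stumps.append((j, t, lv, rv))
        return self

    def predict(self, X):
        pred = np.full(len(X), self.f0)
        for j, t, lv, rv in self.stumps:      # sum of all scaled weak predictors
            pred += self.lr * np.where(X[:, j] <= t, lv, rv)
        return pred

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = np.sin(3 * X[:, 0]) - (X[:, 1] - 0.5) ** 2 + rng.normal(0.0, 0.05, 200)

model = GradientBoosting().fit(X, y)
r2 = 1 - np.sum((y - model.predict(X)) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"training R^2 = {r2:.2f}")
```

For squared-error loss the mean of the residuals in each leaf is already the loss-minimizing step, so no separate line search is needed; the learning rate then shrinks each step.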

Using this technique, we constructed a stronger, but less interpretable, model of the collected final intensities. The model had R² = 0.94, meaning it accounted for 94% of the variance about the mean. On average, with the largest generation size, the optimization reached 95% of the global optimum in fewer than 8 generations.

Figure 10: Each color represents a different generation size: red, 96 individuals per generation; black, 48; green, 24; blue, 12. Each curve is an average of 400 optimization trials.
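The optimization loop itself can be sketched as a simple genetic algorithm. Everything here is an assumption for illustration: the landscape is a hypothetical smooth, single-peaked function standing in for the gradient boosting model, and truncation selection, uniform crossover, Gaussian mutation (sigma = 0.1), and elitism are generic choices, not necessarily those used in this work. Each individual is one reaction, so pop_size plays the role of the generation size (96, 48, 24, or 12).

```python
import random

OPT = (0.7, 0.3, 0.5, 0.6)  # hypothetical optimum, scaled units in [0, 1]

def landscape(x):
    """Hypothetical single-peaked stand-in for the trained model.

    Its maximum is 1.0 at OPT, so a score is also the fraction of the
    global optimum reached.
    """
    d2 = sum((xi - oi) ** 2 for xi, oi in zip(x, OPT))
    return 1.0 / (1.0 + 10.0 * d2)

def run_ga(pop_size, n_generations, sigma=0.1, seed=0):
    """Return the best score found in each generation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in OPT] for _ in range(pop_size)]
    history = []
    for _ in range(n_generations):
        ranked = sorted(pop, key=landscape, reverse=True)
        history.append(landscape(ranked[0]))
        parents = ranked[: max(2, pop_size // 4)]   # truncation selection: top quarter
        children = [list(p) for p in parents]       # elitism: parents carried over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation clipped to [0, 1]
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [min(1.0, max(0.0, c + rng.gauss(0.0, sigma))) for c in child]
            children.append(child)
        pop = children
    return history

for size in (96, 48, 24, 12):
    print(size, [round(s, 2) for s in run_ga(size, 8)])
```

Because the best parents are carried over unchanged, the best score never decreases from one generation to the next; averaging many seeded runs per generation size would give curves analogous to the figure above.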

Conclusion:

Given the results of this analysis, the method employed here appears to be a viable strategy for predicting the final intensities of reactions from their initial kinetics and, based on these predictions, performing the initial iterations of a genetic algorithm. The predictors and algorithms in this work were minimalist, written with as few lines of code as possible, and there is certainly room to improve the software’s performance.

Here, the optimization algorithm was run on a gradient boosting model, so its performance was limited by the discrete, step-like nature of the weak predictions. In real experiments, where intensity varies continuously with composition, the algorithm should perform better, since it could detect changes down to roughly the level of the measurement noise. Also, a single deterministic maximum cannot be detected in a real TXTL reaction, since many of the reaction’s internal processes are stochastic; the optimum is therefore a region rather than a single point.

Overall, for a simple optimization over 4 parameters, it appears possible to converge toward the maximum within one to two working days. However, we expect the true power of such techniques to emerge in the optimization of systems with far more parameters. Since the model that maps kinetics to final intensity is independent of the components added, its performance should remain stable as long as there is enough training data. The optimization process should quickly converge to a narrow neighborhood of the optimal values, since the parameter space of these reactions appears to be virtually convex, owing to the additive hormesis effect the chemicals have on protein expression.

The models constructed through this analysis should help determine the combination of component volumes that maximizes the protein expression of TX-TL reactions. The results of this analysis estimate that optimal protein expression is achieved at an extract volume of (400.00 ± 32.00) nL, a magnesium glutamate volume of (0.50 ± 2.63) nL, and an energy mix volume of (24.79 ± 4.63) nL.

An improvement on the methods employed here could reduce the errors associated with these optimal values. As described in the analysis section, the protein expression function was maximized with respect to only two parameters, magnesium glutamate and energy mix, while extract was treated as a known value. The optimal magnesium glutamate and energy mix values for only five extract levels then formed the basis of models that expressed these optima as functions of extract. This technique is responsible for the large uncertainties in optimal component values at high extract volumes, such as the optimal value of 400 nL. Future research could reduce these uncertainties by maximizing a model of protein expression jointly over all three variables instead of two, obtaining exact optimal values for all three parameters rather than two optimal values that depend on the third.
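For a quadratic model, maximizing over magnesium glutamate and energy mix at a fixed extract volume (with amino acids held constant) reduces to solving a small linear system. A sketch in NumPy, assuming the model is written as f(x) = c + b·x + xᵀQx with symmetric Q; the coefficients below are hypothetical, in scaled units, and chosen so that the optima shift with extract.

```python
import numpy as np

def optimum_given_fixed(b, Q, free_idx, fixed_idx, fixed_vals):
    """Maximize f(x) = c + b.x + x.Q.x over the free variables.

    Q must be symmetric. Setting the gradient with respect to the free
    variables to zero gives  2 Q_ff x_f = -(b_f + 2 Q_fx x_fixed).
    """
    Qff = Q[np.ix_(free_idx, free_idx)]
    Qfx = Q[np.ix_(free_idx, fixed_idx)]
    # A unique maximum exists only if f is strictly concave in the free variables
    if np.any(np.linalg.eigvalsh(Qff) >= 0):
        raise ValueError("model is not concave in the free variables")
    rhs = -(b[free_idx] + 2.0 * Qfx @ np.asarray(fixed_vals, dtype=float))
    return np.linalg.solve(2.0 * Qff, rhs)

# Hypothetical coefficients; variable order:
# 0 = extract, 1 = Mg-glutamate, 2 = energy mix, 3 = amino acids.
# They encode f = ext - (mg - 0.5*ext)^2 - (en - 0.3*ext - 0.1)^2 - 0.01*aa^2 (+ const).
b = np.array([0.94, 0.0, 0.2, 0.0])
Q = np.array([
    [-0.34, 0.5, 0.3, 0.0],
    [0.5, -1.0, 0.0, 0.0],
    [0.3, 0.0, -1.0, 0.0],
    [0.0, 0.0, 0.0, -0.01],
])

AA_FIXED = 0.5  # amino acids held constant, as in the text
for ext in (0.5, 0.7, 0.9):
    mg, en = optimum_given_fixed(b, Q, [1, 2], [0, 3], [ext, AA_FIXED])
    print(f"extract={ext:.1f}  optimal Mg={mg:.3f}  optimal energy={en:.3f}")
```

Tracing this solution over a grid of extract values gives curves of the kind shown in Figures 8 and 9. Solving the same stationarity condition over extract as well, as suggested above, only enlarges the free set; the uncertainty bands would additionally require propagating the coefficient uncertainties, which this sketch omits.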

Amino acid volume was held constant at 382.5 nL, or 5 mM, which the data presented in Figure 6 identified as an optimal value. Because the quadratic regression model of protein expression showed a small coefficient and low statistical significance for amino acids, the exact optimization of this parameter was excluded from the analysis to simplify computation. Future research could produce a more accurate result by including this parameter in the optimization, along with several other components of the reaction that were not of interest in this work. In addition, future research might include optimization with the separated components of the energy and amino acid mixes, the decoding of cytoplasm chemistry, and the development of plant and mammalian TXTL systems.

Acknowledgements:

We would like to thank Professor Vincent Noireaux for making this experiment possible by allowing us the use of his lab facilities and equipment, as well as the guidance and recommendations he provided throughout the project. We also owe thanks to Professor E. Dan Dahlberg for his mentorship, along with the entire Methods of Experimental Physics II faculty.

References:

    1. Garamella, J., Marshall, R., Rustad, M., Noireaux, V. The all E. coli cell-free TX-TL toolbox 2.0: a platform for cell-free synthetic biology. ACS Synthetic Biology 5(4), 344–355 (2016)

    2. Pardee, K., et al. Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell 165(5), 1255–1266 (2016)

    3. Marshall, Ryan. Rapid and Scalable Characterization of CRISPR Technologies Using an E. coli Cell-Free Transcription-Translation System. Molecular Cell 69(1), 146-157 (2018)

    4. Sun, Z.Z., Yeung, E., Hayes, C.A., Noireaux, V., Murray, R.M. Linear DNA for rapid prototyping of synthetic biological circuits in an Escherichia coli based TX-TL cell-free system. ACS Synthetic Biology 3(6), 387–397 (2014)

    5. Whitley, D. A genetic algorithm tutorial. Statistics and Computing 4(2), 65–85 (1994). doi:10.1007/BF00175354

    6. Alberts, Bruce, et al. Essential Cell Biology. Garland Science, 2014.

    7. Stögbauer, T., Windhager, L., Zimmer, R., Rädler, J.O. Experiment and Mathematical Modeling of Gene Expression Dynamics in a Cell-Free System. Integrative Biology 4, 494–501 (2012)

    8. Brown, Tom, and Tom Brown. Nucleic Acids Book. https://www.atdbio.com/nucleic-acids-book.

    9. Mitchell, Melanie (1996). An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press. ISBN 9780585030944

    10. Watson, James; Tania A. Baker; Stephen P. Bell; Alexander Gann; Michael Levine; Richard Losik; Stephen C. Harrison. Molecular Biology of the Gene (7th ed.). Benjamin-Cummings Publishing Company. ISBN 978-0-321-76243-6.

    11. Alberts, Bruce, et al. (2002) Molecular Biology of the cell (4th ed.). New York: Garland Science.

    12. Othmer, H. Analysis of Complex Reaction Networks in Signal Transduction, Gene Control and Metabolism. University of Minnesota, Minneapolis, MN (2018)

    13. Shieh, Jean, et al. “Precision Nanoliter Aqueous Transfer.” Genetic Engineering & Biotechnology News, vol. 27, no. 9, 1 May 2007.

    14. Karzbrun, E., Shin, J., Bar-Ziv, R.H., Noireaux, V. Coarse-Grained Dynamics of Protein Synthesis in a Cell-Free System. Physical Review Letters 106, 048104 (2011)
