Central Dogma of Biology


RNA to proteins

The genetic code is frequently referred to as a "blueprint" because it contains the instructions a cell requires in order to sustain itself. We now know that there is more to these instructions than simply the sequence of letters in the nucleotide code, however. For example, vast amounts of evidence demonstrate that this code is the basis for the production of various molecules, including RNA and protein. Research has also shown that the instructions stored within DNA are "read" in two steps: transcription and translation. In transcription, a portion of the double-stranded DNA template gives rise to a single-stranded RNA molecule. In some cases, the RNA molecule itself is a "finished product" that serves some important function within the cell. Often, however, transcription of an RNA molecule is followed by a translation step, which ultimately results in the production of a protein molecule. 

Visualizing Transcription

The process of transcription can be visualized by electron microscopy (Figure 1); in fact, it was first observed using this method in 1970. In these early electron micrographs, the DNA molecules appear as "trunks," with many RNA "branches" extending out from them. When DNAse and RNAse (enzymes that degrade DNA and RNA, respectively) were added to the molecules, DNAse eliminated the trunk structures, while RNAse wiped out the branches.

DNA is double-stranded, but only one strand serves as a template for transcription at any given time. This template strand is called the noncoding strand; the other, nontemplate strand is called the coding strand because its sequence matches that of the new RNA molecule (with uracil in place of thymine). In most organisms, the strand of DNA that serves as the template for one gene may be the nontemplate strand for other genes within the same chromosome.

Once it was determined that messenger RNA (mRNA) serves as a copy of chromosomal DNA and specifies the sequence of amino acids in proteins, the question of how this process is actually carried out naturally followed. It had long been known that only 20 amino acids occur in naturally derived proteins. It was also known that there are only four nucleotides in mRNA: adenine (A), uracil (U), guanine (G), and cytosine (C). Thus, 20 amino acids are coded by only four unique bases in mRNA, but just how is this coding achieved?

The Transcription Process

The process of transcription begins when an enzyme called RNA polymerase (RNA pol) attaches to the template DNA strand and begins to catalyze production of complementary RNA. Polymerases are large enzymes composed of approximately a dozen subunits, and when active on DNA, they are also typically complexed with other factors. In many cases, these factors signal which gene is to be transcribed. Three different types of RNA polymerase exist in eukaryotic cells, whereas bacteria have only one.
In eukaryotes, RNA pol I transcribes the genes that encode most of the ribosomal RNAs (rRNAs), and RNA pol III transcribes the genes for one small rRNA, plus the transfer RNAs that play a key role in the translation process, as well as other small regulatory RNA molecules. Thus, it is RNA pol II that transcribes the messenger RNAs, which serve as the templates for production of protein molecules.



Transcription Initiation

The first step in transcription is initiation, when the RNA pol binds to the DNA upstream (5′) of the gene at a specialized sequence called a promoter. In bacteria, promoters are usually composed of three sequence elements, whereas in eukaryotes, there are as many as seven elements.

In prokaryotes, most genes have a sequence called the Pribnow box, with the consensus sequence TATAAT positioned about ten base pairs upstream of the site that serves as the location of transcription initiation. Not all Pribnow boxes have this exact nucleotide sequence; these nucleotides are simply the most common ones found at each site. Although substitutions do occur, each box nonetheless resembles this consensus fairly closely. Many genes also have the consensus sequence TTGACA at a position 35 bases upstream of the start site, and some have what is called an upstream element, which is an A-T rich region 40 to 60 nucleotides upstream that enhances the rate of transcription (Figure 2). In any case, upon binding, the RNA pol "core enzyme" binds to another subunit called the sigma subunit to form a holoenzyme capable of unwinding the DNA double helix in order to facilitate access to the gene. The sigma subunit conveys promoter specificity to RNA polymerase; that is, it is responsible for telling RNA polymerase where to bind. There are a number of different sigma subunits that bind to different promoters and therefore assist in turning genes on and off as conditions change.
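
The idea that a promoter element "resembles the consensus fairly closely" can be made concrete by counting mismatches. The following is a minimal sketch, not a real promoter-finding method: the upstream sequence is invented for illustration, and the function names are hypothetical.

```python
# Consensus for the Pribnow box, as quoted in the text.
PRIBNOW_CONSENSUS = "TATAAT"

def mismatches(site: str, consensus: str = PRIBNOW_CONSENSUS) -> int:
    """Count positions where a candidate site differs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus))

def best_match(upstream: str, consensus: str = PRIBNOW_CONSENSUS):
    """Slide a window along the sequence and return (offset, site, mismatches)
    for the window closest to the consensus (first one wins on ties)."""
    k = len(consensus)
    windows = [(i, upstream[i:i + k]) for i in range(len(upstream) - k + 1)]
    offset, site = min(windows, key=lambda w: mismatches(w[1], consensus))
    return offset, site, mismatches(site, consensus)

# Hypothetical upstream region, made up for this example only.
example = "GGCTTTACGTATGATGCCA"
print(best_match(example))
```

Real promoter prediction weighs each position differently and also considers spacing between elements; a plain mismatch count only illustrates what "consensus" means.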

Eukaryotic promoters are more complex than their prokaryotic counterparts, in part because eukaryotes have the aforementioned three classes of RNA polymerase that transcribe different sets of genes. Many eukaryotic genes also possess enhancer sequences, which can be found at considerable distances from the genes they affect. Enhancer sequences control gene activation by binding with activator proteins and altering the 3-D structure of the DNA to help "attract" RNA pol II, thus regulating transcription. Because eukaryotic DNA is tightly packaged as chromatin, transcription also requires a number of specialized proteins that help make the coding strand accessible.

In eukaryotes, the "core" promoter for a gene transcribed by pol II is most often found immediately upstream (5′) of the start site of the gene. Most pol II genes have a TATA box (consensus sequence TATAAA) 25 to 35 bases upstream of the initiation site, which affects the transcription rate and determines location of the start site. Eukaryotic RNA polymerases use a number of essential cofactors (collectively called general transcription factors), and one of these, TFIID, recognizes the TATA box and ensures that the correct start site is used. Another cofactor, TFIIB, recognizes a different common consensus sequence, G/C G/C G/C G C C C, approximately 38 to 32 bases upstream.

The terms "strong" and "weak" are often used to describe promoters and enhancers, according to their effects on transcription rates and thereby on gene expression. Alteration of promoter strength can have deleterious effects upon a cell, often resulting in disease. For example, some tumor-promoting viruses transform healthy cells by inserting strong promoters in the vicinity of growth-stimulating genes, while translocations in some cancer cells place genes that should be "turned off" in the proximity of strong promoters or enhancers.

Enhancer sequences do what their name suggests: They act to enhance the rate at which genes are transcribed, and their effects can be quite powerful. Enhancers can be thousands of nucleotides away from the promoters with which they interact, but they are brought into proximity by the looping of DNA. This looping is the result of interactions between the proteins bound to the enhancer and those bound to the promoter. The proteins that facilitate this looping are called activators, while those that inhibit it are called repressors.

Transcription of eukaryotic genes by polymerases I and III is initiated in a similar manner, but the promoter sequences and transcriptional activator proteins vary.

Strand Elongation

Once transcription is initiated, the DNA double helix unwinds and RNA polymerase reads the template strand, adding nucleotides to the 3′ end of the growing chain. At a temperature of 37 degrees Celsius, the chain grows at a rate equivalent to about 15-20 amino acids (codons) per second in bacteria (Dennis & Bremer, 1974), while eukaryotes proceed at a much slower pace equivalent to approximately five to eight amino acids per second (Izban & Luse, 1992). At three nucleotides per codon, these figures correspond to roughly 45-60 and 15-24 nucleotides per second, respectively.
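
Elongation amounts to base-pairing each template nucleotide with its RNA complement. The sketch below assumes the template is written 3′→5′, so that the RNA comes out 5′→3′ when read left to right; the example sequence and function name are illustrative only.

```python
# DNA template base -> RNA base added by the polymerase.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5: str) -> str:
    """Return the RNA (written 5'→3') templated by a DNA strand written 3'→5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5.upper())

# Hypothetical six-base template, written 3'→5'.
print(transcribe("TACGGT"))  # AUGCCA
```

Because each new nucleotide joins the 3′ end, the first template base read (leftmost here) corresponds to the 5′ end of the transcript.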

Transcription Termination

Terminator sequences are found close to the ends of coding sequences. Bacteria possess two types of these sequences. In rho-independent terminators, inverted repeat sequences are transcribed; they can then fold back on themselves in hairpin loops, causing RNA pol to pause and resulting in release of the transcript. On the other hand, rho-dependent terminators make use of a factor called rho, which actively unwinds the DNA-RNA hybrid formed during transcription, thereby releasing the newly synthesized RNA.
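
An inverted repeat can fold into a hairpin because the second half is the reverse complement of the first. As a toy check (not a terminator predictor), the sketch below scans an RNA for a stem followed, after a short loop, by its reverse complement. The stem and loop lengths and the example transcript are assumptions chosen for illustration, and only Watson-Crick pairs are considered.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with `rna` when folded back on it."""
    return "".join(RNA_PAIR[b] for b in reversed(rna))

def find_hairpins(rna: str, stem: int = 6, min_loop: int = 3, max_loop: int = 8):
    """Return (start, loop_length) for every perfect stem-loop of the given stem length."""
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # where the right arm would start
            if j + stem > len(rna):
                break
            if rna[j:j + stem] == reverse_complement(left):
                hits.append((i, loop))
    return hits

# Hypothetical transcript end: stem, 4-base loop, matching stem, then a run of U.
print(find_hairpins("GCCGCCUUCGGGCGGCUUUUUU"))  # [(0, 4)]
```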

In eukaryotes, termination of transcription occurs by different processes, depending upon the exact polymerase utilized. For pol I genes, transcription is stopped using a termination factor, through a mechanism similar to rho-dependent termination in bacteria. Transcription of pol III genes ends after transcribing a termination sequence that includes a polyuracil stretch, by a mechanism resembling rho-independent prokaryotic termination. Termination of pol II transcripts, however, is more complex.

Transcription of pol II genes can continue for hundreds or even thousands of nucleotides beyond the end of a coding sequence. The RNA strand is then cleaved by a complex that appears to associate with the polymerase. Cleavage seems to be coupled with termination of transcription and occurs at a consensus sequence. Mature pol II mRNAs are polyadenylated at the 3′-end, resulting in a poly(A) tail; this process follows cleavage and is also coordinated with termination.

Both polyadenylation and termination make use of the same consensus sequence, and the interdependence of the processes was demonstrated in the late 1980s by work from several groups. One group of scientists working with mouse globin genes showed that introducing mutations into the consensus sequence AATAAA, known to be necessary for poly(A) addition, inhibited both polyadenylation and transcription termination. They measured the extent of termination by hybridizing transcripts from the different poly(A) consensus sequence mutants with wild-type transcripts, and they were able to see a decrease in the hybridization signal, suggesting that proper termination was inhibited. They therefore concluded that polyadenylation was necessary for termination (Logan et al., 1987). Another group obtained similar results using a monkey viral system, SV40 (simian virus 40). They introduced mutations into a poly(A) site, which caused mRNAs to accumulate to levels far above wild type (Connelly & Manley, 1988).

The exact relationship between cleavage and termination remains to be determined. One model supposes that cleavage itself triggers termination; another proposes that polymerase activity is affected when passing through the consensus sequence at the cleavage site, perhaps through changes in associated transcriptional activation factors. Thus, research in the area of prokaryotic and eukaryotic transcription is still focused on unraveling the molecular details of this complex process, data that will allow us to better understand how genes are transcribed and silenced.

The Codon

The discordance between the number of nucleic acid bases and the number of amino acids immediately eliminates the possibility of a code of one base per amino acid. In fact, even two nucleotides per amino acid (a doublet code) could not account for 20 amino acids (with four bases and a doublet code, there would only be 16 possible combinations [4² = 16]). Thus, the smallest combination of four bases that could encode all 20 amino acids would be a triplet code. However, a triplet code produces 64 (4³ = 64) possible combinations, or codons. Thus, a triplet code introduces the problem of there being more than three times as many codons as amino acids. Either these "extra" codons produce redundancy, with multiple codons encoding the same amino acid, or there must instead be numerous dead-end codons that are not linked to any amino acid.
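
The counting argument above can be checked directly by enumeration. This is a small sketch; the function names are chosen here for illustration.

```python
from itertools import product

BASES = "ACGU"  # the four mRNA nucleotides

def count_words(length: int) -> int:
    """Number of distinct base sequences of the given length."""
    return len(BASES) ** length

def min_codon_length(n_amino_acids: int = 20) -> int:
    """Smallest word length whose combinations cover all amino acids."""
    n = 1
    while count_words(n) < n_amino_acids:
        n += 1
    return n

codons = ["".join(p) for p in product(BASES, repeat=3)]

for n in (1, 2, 3):
    print(n, "base(s) per word:", count_words(n), "combinations")
print("shortest sufficient code:", min_codon_length(), "bases")
print("triplets enumerated:", len(codons))
```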

Preliminary evidence indicating that the genetic code was indeed a triplet code came from an experiment by Francis Crick and Sydney Brenner (1961). This experiment examined the effect of frameshift mutations on protein synthesis. Frameshift mutations are much more disruptive to the genetic code than simple base substitutions, because they involve a base insertion or deletion, thus changing the number of bases and their positions in a gene. For example, the mutagen proflavine causes frameshift mutations by inserting itself between DNA bases. The presence of proflavine in a DNA molecule thus interferes with the molecule's replication such that the resultant DNA copy has a base inserted or deleted.

Crick and Brenner showed that proflavine-mutated bacteriophages (viruses that infect bacteria) with single-base insertion or deletion mutations did not produce functional copies of the protein encoded by the mutated gene. The production of defective proteins under these circumstances can be attributed to misdirected translation. Mutants with two- or four-nucleotide insertions or deletions also produced nonfunctional proteins. However, some mutant strains became functional again when they accumulated a total of three extra nucleotides or when they were missing three nucleotides. This rescue effect provided compelling evidence that the genetic code for one amino acid is indeed a three-base, or triplet, code.

Decoding the Genetic Code

Once the budding molecular biology community was convinced about the triplet code, the race to decode which triplets specified which amino acids began. The simplest way to decipher the code would be to start with an mRNA molecule of known sequence, use it to direct the synthesis of a protein, and then determine the amino acid sequence of the synthesized protein. Then, comparison of the original mRNA sequence with the amino acid sequence of the synthesized protein could provide a means for directly decoding the genetic code.

However, at the time when this decoding project was conducted, researchers did not yet have the benefit of modern sequencing techniques. To circumvent this challenge, Marshall W. Nirenberg and Heinrich J. Matthaei (1962) made their own simple, artificial mRNA and identified the polypeptide product that was encoded by it. To do this, they used the enzyme polynucleotide phosphorylase, which randomly joins together any RNA nucleotides that it finds. Nirenberg and Matthaei began with the simplest codes possible. Specifically, they added polynucleotide phosphorylase to a solution of pure uracil (U), such that the enzyme would generate RNA molecules consisting entirely of a sequence of U's; these molecules were known as poly(U) RNAs. Each poly(U) RNA thus contained a pure series of UUU codons, assuming a triplet code. These poly(U) RNAs were added to 20 tubes containing components for protein synthesis (ribosomes, activating enzymes, tRNAs, and other factors). Each tube contained one of the 20 amino acids, which were radioactively labeled. Of the 20 tubes, 19 failed to yield a radioactive polypeptide product. Only one tube, the one that had been loaded with the labeled amino acid phenylalanine, yielded a product. Nirenberg and Matthaei had therefore found that the UUU codon could be translated into the amino acid phenylalanine. Similar experiments using poly(C) and poly(A) RNAs showed that proline was encoded by the CCC codon, and lysine by the AAA codon.

In further experiments to decode the other codons, Nirenberg and his colleagues made artificial RNAs containing defined proportions of two or three different bases. As previously mentioned, polynucleotide phosphorylase joins nucleotides randomly; as a result, these artificial RNAs contained random mixtures of the bases in proportion to the amounts of bases mixed. Hence, the resulting products provided clues that the researchers could use to deduce potential codon–amino acid relationships.

For example, when A and C were mixed with polynucleotide phosphorylase, the resulting RNA molecules contained eight different triplet codons: AAA, AAC, ACC, ACA, CAA, CCA, CAC, and CCC. These eight random poly(AC) RNAs produced proteins containing only six amino acids: asparagine, glutamine, histidine, lysine, proline, and threonine. Remember that previous experiments had already revealed that CCC and AAA code for proline and lysine, respectively. Thus, the four newly incorporated amino acids could only be encoded by AAC, ACC, ACA, CAA, CCA, and/or CAC. With the random sequence approach, the decoding endeavor was almost completed, but some work remained to be done.
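
The poly(AC) result can be cross-checked against the standard genetic code as it is known today (which, of course, Nirenberg's group did not yet have). The sketch below builds the standard codon table in a compact form, with "*" marking stop codons and one-letter amino acid symbols, and lists what the eight A/C triplets encode.

```python
from itertools import product

# Standard genetic code, compact form: bases ordered U, C, A, G with the
# third position varying fastest; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons_from(bases: str) -> list[str]:
    """All triplets that can occur in a random copolymer of the given bases."""
    return ["".join(c) for c in product(bases, repeat=3)]

poly_ac = codons_from("AC")
encoded = sorted({CODON_TABLE[c] for c in poly_ac})

print(len(poly_ac), "codons:", poly_ac)
print(len(encoded), "amino acids:", encoded)
```

The six one-letter symbols printed (H, K, N, P, Q, T) correspond to histidine, lysine, asparagine, proline, glutamine, and threonine, matching the six amino acids named in the text.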

Thus, in 1965, H. Gobind Khorana and his colleagues used another method to further crack the genetic code. These researchers had the insight to employ chemically synthesized RNA molecules of known repeating sequences rather than random sequences. For example, an artificial mRNA of alternating guanine and uracil nucleotides (GUGUGUGUGUGU) should be read in translation as two alternating codons, GUG and UGU, thus encoding a protein of two alternating amino acids. Translation of the artificial GUGU mRNA yielded a protein of alternating cysteine and valine residues. However, this technique alone could not determine whether GUG or UGU encoded cysteine, for example.
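
Why a repeating GU message yields exactly two alternating amino acids, whatever the reading frame, can be seen with a short sketch. It uses the modern standard codon table (an anachronism relative to the experiment) and a hypothetical `translate` helper that ignores start and stop signals.

```python
from itertools import product

# Standard genetic code, compact form ("*" = stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(rna: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame`; leftover bases are ignored."""
    codons = (rna[i:i + 3] for i in range(frame, len(rna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)

message = "GUGUGUGUGUGU"
for frame in range(3):
    print("frame", frame, "->", translate(message, frame))
```

Every frame alternates V (valine) and C (cysteine); only the starting residue changes. That is why the product alone could not reveal which of the two codons specifies which amino acid.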

Next, Nirenberg and Philip Leder developed a technique using ribosome-bound transfer RNAs (tRNAs). They showed that a short mRNA sequence—even a single codon (three bases)—could still bind to a ribosome, even if this short sequence was incapable of directing protein synthesis. The ribosome-bound codon could then base pair with a particular tRNA that carried the amino acid specified by the codon.

Nirenberg and Leder thus synthesized many short mRNAs with known codons. They then added the mRNAs one by one to a mix of ribosomes and aminoacyl-tRNAs with one amino acid radioactively labeled. Each mixture was passed through a filter that retained the ribosomes and anything bound to them, while unbound aminoacyl-tRNAs passed through. By checking whether the labeled aminoacyl-tRNA was retained on the filter, bound to the short mRNA-like sequence and ribosome, they obtained conclusive demonstrations of the particular aminoacyl-tRNA that bound to each mRNA codon.

Degeneracy of the Amino Acid Code

Examination of the full table of codons enables one to immediately determine whether the "extra" codons are associated with redundancy or dead-end codes (Figure 3). Note that both possibilities occur in the code. There are only a few instances in which one codon codes for one amino acid, such as the codon for tryptophan. Note also that the codon for the amino acid methionine (AUG) acts as the start signal for protein synthesis in an mRNA. Moreover, the genetic code also includes stop codons, which do not code for any amino acid. The stop codons serve as termination signals for translation. When a ribosome reaches a stop codon, translation stops, and the polypeptide is released.
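
Both possibilities, redundancy and codons that specify no amino acid, can be tallied from the standard codon table. The sketch below uses a compact encoding of that table with one-letter amino acid symbols and "*" for stop codons.

```python
from collections import Counter
from itertools import product

# Standard genetic code, compact form ("*" = stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# How many codons map to each amino acid (or to "stop").
codons_per_residue = Counter(CODON_TABLE.values())

single_codon = sorted(aa for aa, n in codons_per_residue.items() if n == 1)
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

print("amino acids encoded:", len(codons_per_residue) - 1)  # minus the stop symbol
print("one-codon amino acids:", single_codon)               # M and W
print("stop codons:", stop_codons)
print("most redundant:", codons_per_residue.most_common(3))
```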

Once scientists determined that messenger RNA (mRNA) served as a copy of each gene's DNA and specified the sequence of amino acids in proteins, they immediately had many more questions about the process of protein formation. Specifically, these researchers knew that proteins are made from 20 different amino acids. Moreover, they also knew that there were only four nucleotides in mRNA: adenine (A), cytosine (C), guanine (G), and uracil (U). But how exactly could these four nucleotides code for all 20 amino acids? The answer to this question turned out to be simpler than one might expect.

Determining the Number of Nucleotides Per Amino Acid

Right away, researchers knew that the genetic code was more complex than one nucleotide per amino acid. After all, if this were the case, a person's DNA could only code for four different amino acids. In fact, even two nucleotides per amino acid (i.e., a doublet code) could not account for 20 amino acids, because such a code provides only 16 permutations (four bases at each of two positions = 4 × 4 = 16 combinations).

Thus, early researchers quickly determined that the smallest combination of As, Cs, Gs, and Us that could encode all 20 amino acids in RNA would be a triplet (three-base) code. A triplet combination, or codon, would allow for 64 possible combinations (four bases at each of three positions = 4 × 4 × 4 = 64). However, with only 20 amino acids, a triplet code would also suggest redundancy; in other words, more than one codon might correspond to the same amino acid, or there might even be "spare" or unused codons. If such "spare" codons were present, what was their purpose? Did they serve to "break up" the code, much like commas in a sentence? Furthermore, how could a three-nucleotide code be "read" by the protein-forming machinery of the ribosome? Was it an overlapping or non-overlapping code? Was it a continuous code, or were there "commas" (spare nucleotides) between codons that served as signals for the next amino acid? These questions were answered by way of several elegant experiments.

Ruling Out Overlaps

In their investigation of the exact nature of the genetic code, scientists first turned to the question of possible overlaps. Specifically, researchers Akira Tsugita and Heinz Fraenkel-Conrat (1960) proposed that if the code were overlapping, a mutation (or change) in one nucleotide would cause changes in more than one amino acid in the resulting protein. Fortunately, recent technological advancements had made it possible for Tsugita and Fraenkel-Conrat to determine the amino acid sequence in short proteins. Thus, by comparing protein sequences made from both nonmutated and mutated genetic material, they were able to resolve this issue. First, the research team treated tobacco mosaic virus RNA (the genetic material of this virus) with nitrous acid, leading to a point mutation in the RNA sequence. Then, they compared the protein produced by the mutated RNA with that produced by the "normal" viral RNA. Strikingly, the amino acid sequence of the "mutant" protein contained a change in only one amino acid, strongly suggesting use of a non-overlapping code.
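
The logic of this test can be sketched by counting how many codons contain a given nucleotide under each reading scheme. The function below is a hypothetical illustration: `step=3` models a non-overlapping triplet code, and `step=1` models a fully overlapping one.

```python
def codons_containing(position: int, seq_len: int, step: int, width: int = 3):
    """Start indices of all codons (of the given width, read every `step`
    bases) that include the nucleotide at `position`."""
    starts = range(0, seq_len - width + 1, step)
    return [s for s in starts if s <= position < s + width]

SEQ_LEN = 12   # arbitrary message length for the example
MUTATED = 5    # index of the single changed nucleotide

print("non-overlapping:", codons_containing(MUTATED, SEQ_LEN, step=3))
print("fully overlapping:", codons_containing(MUTATED, SEQ_LEN, step=1))
```

With a non-overlapping code, the mutated base sits in one codon and can alter at most one amino acid; with a fully overlapping code, it sits in three codons and could alter up to three adjacent amino acids. Observing a single amino acid change is what the non-overlapping model predicts.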

Determining Codon Length

However, Tsugita and Fraenkel-Conrat's findings alone did not resolve whether the genetic code was read in sets of three nucleotides or perhaps more. This issue was addressed by a separate research team consisting of Francis Crick, Leslie Barnett, Sydney Brenner, and Richard Watts-Tobin. In 1961, this group provided the first evidence for a triplet code by way of experiments using the T4 bacteriophage (a bacteria-specific virus).

In particular, these researchers devised a clever assay that enabled them to deduce the properties of the genetic code following introduction of a special kind of mutation, known as a frameshift mutation. A frameshift mutation is caused by either the addition or the deletion of a base in the original DNA sequence, which in turn causes the protein-forming machinery to shift positions (or reading frames) on the RNA. Such a frameshift alters codon groupings, and thus the corresponding protein is made with incorrect amino acids from the point of the mutation onward.

In their work, the research team first introduced a single frameshift mutation into a viral protein involved in the infection of E. coli bacteria. (Bacterial infection was the readout in this experiment.) This addition of a lone frameshift mutation rendered the resulting protein ineffective. The researchers then introduced additional frameshift mutations in the hope that doing so would restore the correct reading frame (and, in turn, allow the protein to once again play a role in the infection of E. coli). The experiment worked! For example, when the first mutation added a base (+), a later suppressor mutation (-), which deleted a base, was able to put the code back on track.

Interestingly, the team noted that the introduction of three separate frameshift mutations that each added a base (+ + +) to the same DNA was also sometimes (when the mutations were close together) able to put the code back on track. Similarly, three mutations that each deleted a base (- - -) could also rescue protein function and infectivity. Therefore, the code was only thrown off by nontriplet changes. This finding strongly supported the existence of a triplet code, or at least a code written in multiples of three bases. Thus, when Crick and his colleagues analyzed their results, they were the first people to see that the genetic code was based on multiples of three bases!
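
The suppression pattern (+ then -, or + + +) can be illustrated with a string read three letters at a time. In this sketch a sentence of three-letter words stands in for a gene, and "X" stands for an inserted base; both are illustrative choices, not data from the experiment.

```python
def read_frame(seq: str) -> list[str]:
    """Split a sequence into complete triplets from the first base; drop leftovers."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def insert(seq: str, pos: int, base: str = "X") -> str:
    """A (+) mutation: add one base at `pos`."""
    return seq[:pos] + base + seq[pos:]

def delete(seq: str, pos: int) -> str:
    """A (-) mutation: remove the base at `pos`."""
    return seq[:pos] + seq[pos + 1:]

original = "THEFATCATATETHERAT"                     # stand-in for a wild-type gene
plus = insert(original, 3)                          # single (+): frame lost
plus_minus = delete(plus, 7)                        # nearby (-): frame restored
triple_plus = insert(insert(insert(original, 3), 5), 7)  # (+ + +): frame restored

for label, seq in [("wild type", original), ("+", plus),
                   ("+ -", plus_minus), ("+ + +", triple_plus)]:
    print(f"{label:10s}", " ".join(read_frame(seq)))
```

In the single (+) mutant every triplet after the insertion is scrambled. In the (+ -) and (+ + +) mutants only the short stretch between the mutations is garbled, and the downstream words read correctly again, which mirrors why closely spaced mutations could restore function.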