DNA is split into different genes. The genes are little sections of the DNA that each code for a particular protein. The process of using the code found in genes to make useful proteins is known as gene expression. However, between these crucial genes are HUGE sections of bases that do not code for proteins and are sometimes useless. Other times, they are regulatory sections, which we will explore more later.
"If the genome is thought of as a recipe, then genes code for the ingredients and the switches contain the instructions about when and where to add each ingredient. If 2 percent of the genome is made up of genes that make proteins, then part of that other 98 percent contains the information that tells genes when and where to be active." (Some Assembly Required: Decoding Four Billion Years of Life, from Ancient Fossils to DNA by Neil Shubin)
Gene expression is the process by which DNA is actually made useful. Up until now, we have known that DNA is crucial to life, but we have not explored in what way. Gene expression is when the instructions held within DNA are used to actually do something, to create something. Those genes, those instructions, are going to be expressed and made real.
This process of gene expression is summarized by the central dogma of biology. Basically, this process is how every living thing becomes more than just a recipe (DNA). A recipe book is useless if you don't actually cook anything. Thus, DNA is the recipe for the phenotype that makes you... you. The central dogma can be simplified with three simple words and two arrows:
DNA --> RNA --> Protein
Now in reality, of course, there are vocabulary terms and processes to understand within those arrows. The first arrow, which takes the instructions from DNA and transfers them into the form of RNA, is the process known as transcription. The second arrow, which takes the instructions from RNA and uses them to build useful proteins, is known as translation. The above diagram also includes replication just so that you do not forget that DNA must be copied throughout a cell's life cycle!
This diagram represents a clean overview of gene expression from DNA to RNA to protein and includes the relative location of each process. Please note that the arrows do not represent changing the DNA into RNA. The information found therein is simply transferred.
The DNA is still intact, and the mRNA is not destroyed when the protein is made. This will make more sense as we delve into the individual processes of transcription and translation. I just wanted to highlight the fact that DNA is not destroyed, as that can often be misinterpreted in the language we use and the diagrams showing these processes.
Transcription is represented by arrow #1. The instructions in the DNA (the blue double helix, of course) are transferred into RNA (the red strand). This occurs in the nucleus, of course. We do not want to remove our DNA from our nucleus and make it vulnerable.
Arrow #2 represents translation, the conversion of the RNA instructions into a useful protein. By step 3, we have a fully formed polypeptide chain.
Basic Principles of Transcription and Translation
RNA is the bridge between genes and the proteins for which they code
Transcription is the synthesis of RNA using the information in DNA
Transcription produces messenger RNA (mRNA)
Translation is the synthesis of a polypeptide, using the information in the mRNA
Overview of Transcription
Remember that transcription is the process by which DNA's information is transferred into the form of mRNA, one of the many types of RNA. In this process, one of the two DNA strands is used as a template strand (much like we saw in DNA replication).
The mRNA strand that is synthesized by RNA polymerase is complementary to the template strand. So if the template strand had an A, the mRNA strand will have a U in that spot. Recall that RNA cannot have thymine, T, and instead has uracil, U.
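The pairing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the example template sequence are made up for this sketch, and strand direction is ignored here.

```python
# Base-pairing rules for building mRNA from a DNA template strand.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_strand: str) -> str:
    """Return the mRNA base that pairs with each base of the template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand.upper())

# Hypothetical template strand, chosen only for illustration.
print(transcribe("TACGGATTC"))  # AUGCCUAAG
```

Notice that every A in the template becomes a U in the mRNA, never a T.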
Codons: Triplets of Nucleotides
The flow of information from gene to protein is based on a triplet code: a series of non-overlapping, three-nucleotide words
The words of a gene are transcribed into complementary non-overlapping three-nucleotide words of mRNA
These words are then translated into a chain of amino acids, forming a polypeptide
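To see what "non-overlapping, three-nucleotide words" means in practice, here is a small Python sketch that chops an mRNA string into triplets. The helper name `split_into_codons` and the sample sequence are assumptions made for this example.

```python
def split_into_codons(mrna: str) -> list[str]:
    """Split an mRNA string into consecutive, non-overlapping triplets."""
    # Ignore any leftover bases at the end that do not fill a whole triplet.
    usable = len(mrna) - len(mrna) % 3
    # Step through the strand three bases at a time, so no base is reused.
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical mRNA fragment (the final "UA" is an incomplete triplet).
print(split_into_codons("AUGCCUAAGUA"))  # ['AUG', 'CCU', 'AAG']
```

Each base belongs to exactly one word, which is what "non-overlapping" means.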
Synthesis of an RNA Transcript
The three stages of transcription
Initiation
Elongation
Termination
RNA Polymerase Binding and Initiation of Transcription
Promoters signal the transcriptional start point and usually extend several dozen nucleotide pairs upstream of the start point.
Transcription factors mediate the binding of RNA polymerase and the initiation of transcription
The completed assembly of transcription factors and RNA polymerase II bound to a promoter is called a transcription initiation complex
A promoter called a TATA box is crucial in forming the initiation complex in eukaryotes
As mentioned above, the mRNA strand is complementary to the template strand. So, if given a DNA template strand, you need to be able to name the mRNA sequence (including the 5' --> 3' direction) that will be its complement. Test yourself using this image of the process.
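If you want to check your answers, the following Python sketch handles the direction labels as well. It assumes the two strands are antiparallel, so a template written 3' --> 5' pairs base by base with an mRNA written 5' --> 3'. The function name `mrna_5_to_3`, its flag, and the sequences are hypothetical.

```python
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def mrna_5_to_3(template: str, template_written_5_to_3: bool = False) -> str:
    """Return the complementary mRNA, always written in the 5' -> 3' direction."""
    bases = template.upper()
    if template_written_5_to_3:
        # Flip the template so we read it 3' -> 5', the direction that
        # lines up with an mRNA written 5' -> 3'.
        bases = bases[::-1]
    return "5'-" + "".join(PAIR[b] for b in bases) + "-3'"

# The same hypothetical template strand, written in each direction.
print(mrna_5_to_3("TACGGATTC"))                                # 5'-AUGCCUAAG-3'
print(mrna_5_to_3("CTTAGGCAT", template_written_5_to_3=True))  # 5'-AUGCCUAAG-3'
```

Both calls describe the same physical strand, so they give the same mRNA.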
This new mRNA strand will undergo some changes before it is truly ready for translation. Thus, it is sometimes referred to as pre-mRNA.
Part 1: Get a Cap and Tail
Each end of a pre-mRNA molecule is modified in a particular way
The 5′ end receives a modified nucleotide 5′ cap
The 3′ end gets a poly-A tail
These modifications share several functions:
They seem to facilitate the export of mRNA to the cytoplasm
They protect mRNA from hydrolytic enzymes
They help ribosomes attach to the 5′ end
Part 2: RNA Splicing
Most eukaryotic genes and their RNA transcripts have long noncoding stretches of nucleotides that lie between coding regions.
These non-coding regions are called intervening sequences, or introns.
The other regions are called exons because they are eventually expressed, usually translated into amino acid sequences
RNA splicing removes introns and joins exons, creating an mRNA molecule with a continuous coding sequence
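Splicing can be pictured as cutting out the introns and gluing the exons together. In this rough Python sketch, the toy transcript, the exon positions, and the function name `splice` are all invented for illustration. In a real cell the splice sites are recognized by the splicing machinery, not supplied by hand.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments in order; everything outside them is dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Toy pre-mRNA: UPPERCASE = exon, lowercase = intron (for readability only).
pre_mrna = "AUGGCC" + "guaaguuuuag" + "UUUGAA" + "guccccag" + "CACUAA"

# Hypothetical (start, end) positions of the three exons in the string above.
exons = [(0, 6), (17, 23), (31, 37)]

print(splice(pre_mrna, exons))  # AUGGCCUUUGAACACUAA
```

The result is one continuous coding sequence with no intron bases left in it.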
SPLICEOSOMES
In some cases, RNA splicing is carried out by spliceosomes
Spliceosomes consist of a variety of proteins and several small nuclear ribonucleoproteins (snRNPs) that recognize the splice sites. The RNAs of the spliceosome also catalyze the splicing reaction
ALTERNATIVE RNA SPLICING
Some introns contain sequences that may regulate gene expression
Some genes can encode more than one kind of polypeptide, depending on which segments are treated as exons during splicing. This is called alternative RNA splicing.
Consequently, the number of different proteins an organism can produce is much greater than its number of genes
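A quick way to picture alternative splicing: the same gene, with different segments kept as exons, gives different mature mRNAs. The exon names and sequences in this Python sketch are made up, and `mature_mrna` is a hypothetical helper.

```python
# Hypothetical segments from one gene's transcript.
segments = {"E1": "AUGGCC", "E2": "UUUGAA", "E3": "GGGCAC", "E4": "UGCUAA"}

def mature_mrna(included: list[str]) -> str:
    """Join only the segments that are treated as exons in this splicing pattern."""
    return "".join(segments[name] for name in included)

# Two splicing patterns applied to the same gene.
isoform_a = mature_mrna(["E1", "E2", "E4"])  # E3 is removed like an intron
isoform_b = mature_mrna(["E1", "E3", "E4"])  # E2 is removed like an intron

print(isoform_a)  # AUGGCCUUUGAAUGCUAA
print(isoform_b)  # AUGGCCGGGCACUGCUAA
```

One gene, two different mRNAs, and therefore two different polypeptides.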
TRANSLATION
Now that we have a complete mRNA strand, that information can be used to finally synthesize the proteins for which it codes! The process by which mRNA's information is used to form a polypeptide (protein) is known as translation.
Recall that ribosomes are the organelles responsible for protein synthesis. It should come as no surprise, then, that ribosomes will play a central role in translation. The ribosome is made up of two major subunits: a large ribosomal subunit and a small ribosomal subunit.
These subunits are made up primarily of rRNA, another type of RNA to join mRNA and the soon-to-be-described tRNA.
Just like with the DNA during transcription, the mRNA is not destroyed during translation. Its information is simply read. Only once the mRNA strand is 'finished' with its job is it broken down by enzymes in the cell.
In fact, sometimes mRNA strands are used multiple times by multiple ribosomes to make more than just the one copy of the polypeptide for which it codes.
The mRNA strand is a string of nucleotides, of course. However, these nucleotides are not read one-by-one. The ribosome 'reads' the mRNA strand in groups of 3 nucleotides known as codons. Almost every possible codon codes for an amino acid. The exceptions are the three stop codons, which signal the end of the polypeptide instead.
Start is Always AUG (Methionine)
Stop UAA, UGA or UAG
In this diagram, you can see mRNA represented by the pink string of nucleotides. You can also see the ribosome reading the mRNA strand. The ribosome has three 'windows' through which a new molecule, tRNA, can enter. The blue structure is a tRNA molecule, or transfer RNA molecule. This is the taxi that takes amino acids to the ribosome. How does the tRNA 'know' where to go, then? Well, tRNA has an anti-codon at the bottom that will bind to its complementary codon on the mRNA strand. Thus, tRNA will bring the correct amino acid to be added to the polypeptide.
The codon near the middle of the strand labeled AUG is an important one because that is what is known as the start codon. Every polypeptide made in a ribosome begins with methionine, the amino acid coded for by the AUG start codon.
There are three stop codons that, when reached, cause the ribosome to disassemble, releasing the mRNA and the finished polypeptide. These stop codons follow the codon for the last amino acid added to a polypeptide during translation. If a stop codon ever 'accidentally' appears too early (via a mutation), then the protein will not be synthesized properly.
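The start and stop rules can be put together in a simplified Python sketch of translation. It uses the standard genetic code with one-letter amino acid abbreviations (M = Met) and `*` for stop. Treating the first AUG in the string as the start codon is a simplification, and the function name and sequences are assumptions for this example.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon; return one-letter amino acids."""
    start = mrna.find("AUG")  # simplification: the first AUG is the start codon
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA: stop, no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGCCUAAGUAAGC"))  # MPK
print(translate("AUGCCUUAGAAG"))      # MP (the stop codon UAG ends the chain early)
```

In the second example, the AAG after the stop codon is never read, which is why an early stop codon shortens the polypeptide.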
Some proteins need to be modified after translation. These post-translational modifications allow the polypeptide to attain the proper three-dimensional structure for the protein to do its job properly.
If given an mRNA strand (or given the DNA template strand), you will need to be able to use a codon chart. Using it is quite simple, luckily! Simply plug in the codon to get the amino acid that corresponds to that codon.
For instance, using this codon chart (the most common kind you'll encounter) let's plug in our start codon, AUG:
A is the first base, so we are going to be in the third row.
U is the second base, so we are in the first column.
G is the third base, so we can see that the amino acid represented by AUG is methionine, or 'Met'.
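The same three-step lookup can be written as a short Python sketch. It assumes the usual chart layout, with rows, columns, and the entries inside each cell all ordered U, C, A, G. The function name `chart_lookup` is made up, and amino acids are given as one-letter abbreviations (M = Met).

```python
# Assumed chart layout: first base picks the row, second base picks the
# column, third base picks the entry within the cell, each in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def chart_lookup(codon: str) -> tuple[int, int, str]:
    """Return (row, column, amino acid) for a codon, counting rows and columns from 1."""
    first, second, third = (BASES.index(base) for base in codon.upper())
    # 16 codons per row, 4 per cell, then the position inside the cell.
    amino_acid = AMINO_ACIDS[16 * first + 4 * second + third]
    return first + 1, second + 1, amino_acid

print(chart_lookup("AUG"))  # (3, 1, 'M')
```

The output matches the walk-through: third row, first column, methionine.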
You don't need to memorize the long forms of any amino acids - the shortened versions will do unless the full amino acid name is provided.
Recall that a mutation is any change in the DNA. There are many kinds of mutations both at the nucleotide and chromosomal levels, but we will focus on the distinctions between point and frameshift mutations.
In a silent mutation, the DNA changed but there was no discernible effect on the protein (see image). This occurs because the original codon and the mutated codon both code for the same amino acid.
In a nonsense mutation, the polypeptide is shortened abruptly by the new stop codon due to the mutation. This almost always results in a non-functioning protein.
In a missense mutation, there is a change to the amino acid that is added to the polypeptide chain. This will generally affect the protein's structure, but the protein may still function depending on the chemical properties and placement of that amino acid. You do NOT need to know the differences between conservative and non-conservative missense mutations. The classic example we will explore in class of this mutation is sickle cell anemia.
"Sickle cell anemia, in its most extreme cases, can be fatal by the age of three in almost 70 percent of sufferers. And what is the difference between a healthy red blood protein and a sickle cell one? Only a single amino acid in the string: the amino acid glutamate is replaced by one called valine in the sixth position in the sequence. A tiny difference in the amino acid sequence can have massive ramifications on the protein, the cells in which the protein is found, and the lives of the individuals who have those cells." (Some Assembly Required: Decoding Four Billion Years of Life, from Ancient Fossils to DNA by Neil Shubin)
As you can see, a single base pair substitution can have a variety of effects. Often it will do nothing. Sometimes it will change the protein. Usually that change is not good for the organism or the cell, but occasionally it is VERY good. This will become more commonly discussed in the next unit on natural selection.
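The three outcomes of a substitution can be told apart by comparing the amino acid before and after the change. This Python sketch uses the standard genetic code with one-letter abbreviations and `*` for stop. The function name is hypothetical, and the GAG to GUG example is the glutamate-to-valine codon change usually given for sickle cell.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_substitution(original_codon: str, mutated_codon: str) -> str:
    """Label a substitution as silent, nonsense, or missense."""
    before = CODON_TABLE[original_codon]
    after = CODON_TABLE[mutated_codon]
    if before == after:
        return "silent"    # same amino acid, so the protein is unchanged
    if after == "*":
        return "nonsense"  # new stop codon, so the polypeptide is cut short
    # Any other change is reported as missense in this simplified sketch.
    return "missense"

print(classify_substitution("GAA", "GAG"))  # silent   (Glu -> Glu)
print(classify_substitution("UAC", "UAA"))  # nonsense (Tyr -> stop)
print(classify_substitution("GAG", "GUG"))  # missense (Glu -> Val)
```

Note that all three examples change only one base of the codon.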
Frameshift mutations are usually much more severe than point mutations. Rather than swapping a single nucleotide, these mutations shift the entire reading frame. This sounds pretty complicated, but it makes sense if you think of the mRNA as being read in sets of codons.
If you mess up one codon by removing a nucleotide, for instance, it will mess up every subsequent codon that is 'downstream' of that mutation. So if a frameshift mutation occurs at the beginning of a gene, it will affect nearly the entire protein!
Let's explore the two ways by which frameshift mutations may occur: deletions and insertions.
In an insertion mutation, one or more bases are added to the DNA sequence where there were none before. This causes all subsequent codons to be shifted because the ribosome will simply read the mRNA 3 bases at a time after the start codon.
So if a base got in that 'shouldn't' be there, it will affect every subsequent codon. This 'frame shift' is a result of the fact that every (or nearly every) subsequent codon consists of a different set of 3 nucleotides than it did before the DNA alteration.
A deletion mutation works in basically the same way but in reverse. So instead of bases being added, they are removed. This also causes the 'frame shift' because every subsequent codon now contains different nucleotides than it did before the DNA sequence changed.
Ironically, it is often less disruptive to the protein if 3 nucleotides are removed (or added) rather than just one. The reading frame is preserved, so the subsequent codons are still correct, but there is one fewer (or one extra, if it is an insertion) amino acid in the polypeptide.
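To see the reading frame shift for yourself, compare the codons before and after each kind of change. This Python sketch uses an invented mRNA sequence and a hypothetical helper `codons`. The mutations are applied directly to the mRNA string for simplicity, even though the actual change happens in the DNA.

```python
def codons(mrna: str) -> list[str]:
    """Split an mRNA string into non-overlapping triplets, ignoring leftover bases."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

original = "AUGGCCUUUGAACACUAA"                      # hypothetical mRNA
one_deleted = original[:4] + original[5:]            # remove a single base
one_inserted = original[:4] + "A" + original[4:]     # add a single base
three_deleted = original[:3] + original[6:]          # remove one whole codon (GCC)

print(codons(original))       # ['AUG', 'GCC', 'UUU', 'GAA', 'CAC', 'UAA']
print(codons(one_deleted))    # ['AUG', 'GCU', 'UUG', 'AAC', 'ACU']
print(codons(one_inserted))   # ['AUG', 'GAC', 'CUU', 'UGA', 'ACA', 'CUA']
print(codons(three_deleted))  # ['AUG', 'UUU', 'GAA', 'CAC', 'UAA']
```

With one base removed or added, every codon after the change is different (the insertion even creates an early UGA stop codon). With three bases removed, the remaining codons are untouched and only one is missing.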