DNA is split into different genes. The genes are little sections of the DNA that each code for a particular protein. The process of using the code found in genes to make useful proteins is known as gene expression. However, between these crucial genes are HUGE sections of bases that do not code for proteins. Some of these sections have no known function. Others are regulatory sections, which we will explore more later.
"If the genome is thought of as a recipe, then genes code for the ingredients, and the switches contain the instructions about when and where to add each ingredient. If 2 percent of the genome is made up of genes that make proteins, then part of that other 98 percent contains the information that tells genes when and where to be active." (Some Assembly Required: Decoding Four Billion Years of Life, from Ancient Fossils to DNA by Neil Shubin)
Gene expression is the process by which DNA is actually made useful. Up until now, we have known that DNA is crucial to life, but we have not explored in what way. Gene expression is when the instructions held within DNA are used to actually do something, to create something. Those genes, those instructions, are going to be expressed and made real.
The flow of information during gene expression is described by the central dogma of biology. Basically, this process is how every living thing becomes more than just a recipe (DNA). A recipe book is useless if you don't actually cook anything. Thus, DNA is the recipe for the phenotype that makes you you. The central dogma can be simplified with three simple words and two arrows:
DNA --> RNA --> Proteins
Now in reality, of course, there are vocabulary terms and processes to understand within those arrows. The first arrow, which takes the instructions from DNA and transfers them into the form of RNA, is the process known as transcription. The second arrow, which takes the instructions from RNA and uses them to build useful proteins, is known as translation. The above diagram also includes replication just so that you do not forget that DNA must be copied throughout a cell's life cycle!
This diagram represents a clean overview of gene expression from DNA to RNA to protein and includes relative location of each process. Please note that the arrows do not represent changing the DNA into RNA. The information found therein is simply transferred.
The DNA is still intact, and the mRNA is not destroyed when the protein is made. This will make more sense as we delve into the individual processes of transcription and translation. I just wanted to highlight the fact that DNA is not destroyed, as that can often be misinterpreted in the language we use and the diagrams showing these processes.
Transcription is represented by arrow #1. The instructions in the DNA (the blue double helix) are transferred into RNA (the red strand). In eukaryotes, this occurs in the nucleus. We do not want to remove our DNA from our nucleus and make it vulnerable.
Arrow #2 represents translation, the conversion of the RNA instructions into a useful protein. By step 3, we have a fully formed polypeptide chain.
Remember that transcription is the process by which DNA's information is transferred into the form of mRNA, one of the many types of RNA. In this process, one of the two DNA strands is used as a template strand (much like we saw in DNA replication).
The mRNA strand that is synthesized by RNA polymerase is complementary to the template strand. So if the template strand had an A, the mRNA strand will have a U in that spot. Recall that RNA cannot have thymine, T, and instead has uracil, U.
As mentioned above, the mRNA strand is complementary to the template strand. So, if given a DNA template strand, you need to be able to name the mRNA sequence (including the 5' --> 3' direction) that will be its complement. Test yourself using this image of the process.
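If it helps to see the pairing rule written out mechanically, here is a minimal Python sketch. It is only an illustration: the function name `transcribe` and the example template strand are made up for this sketch, and it assumes the template is written 3' --> 5' so that the mRNA comes out reading 5' --> 3'.

```python
# Pairing rules for building mRNA from a DNA template strand.
# RNA uses uracil (U) where DNA would use thymine (T).
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (written 5' --> 3') complementary to a DNA
    template strand that is written 3' --> 5'."""
    return "".join(DNA_TO_MRNA[base] for base in template_3_to_5.upper())


# Hypothetical template strand, written 3' --> 5'.
template = "TACGGCATT"
mrna = transcribe(template)
print("template 3'-" + template + "-5'")
print("mRNA     5'-" + mrna + "-3'")
```

If you are handed a template strand written 5' --> 3' instead, reverse it first (for example with `template[::-1]`) so the two strands line up antiparallel.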
This new mRNA strand will undergo some changes before it is truly ready for translation. Thus, it is sometimes referred to as pre-mRNA.
The pre-mRNA, synthesized by RNA polymerase, consists of two different types of sequences: introns and exons. The introns are removed (remember it by thinking of incising, or cutting). The exons are kept on the strand, and so will be expressed after translation. That is another way to keep introns and exons straight - exons are expressed. The removal of introns is referred to as splicing.
A cap is added to the 5' end of the mRNA strand and a string of adenine (A) nucleotides is added to the 3' end, forming a poly-A tail, to finalize the sequence as a true mRNA strand.
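The three processing steps (splicing, capping, and adding the poly-A tail) can be sketched as a toy Python function. Everything here is an assumption made for illustration: the segment sequences are invented, the cap is shown as a plain text label rather than a real modified nucleotide, and the tail length of 10 is arbitrary.

```python
# A pre-mRNA modeled as an ordered list of (kind, sequence) segments.
# The sequences are invented for illustration.
pre_mrna = [
    ("exon", "AUGGCU"),
    ("intron", "GUAAGUCCUAG"),
    ("exon", "GAUCCA"),
    ("intron", "GUGAGUUUCAG"),
    ("exon", "UUGUAA"),
]


def process_pre_mrna(segments, tail_length=10):
    """Splice out introns, then label the 5' cap and add a poly-A tail."""
    # Splicing: keep only the exons, in their original order.
    spliced = "".join(seq for kind, seq in segments if kind == "exon")
    # The cap is represented by a text label; the tail is a run of A's.
    return "[5' cap]-" + spliced + "-" + "A" * tail_length


mature = process_pre_mrna(pre_mrna)
print(mature)
```

Notice that the exons keep their original order and the introns simply vanish from the final message.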
To better understand the importance of removing those introns in eukaryotes, read through the following excerpt from The Gene: An Intimate History. This excerpt may challenge you a little because it is written at a level somewhere between the rest of this text and your textbook, so give it your absolute best and read through it slowly. It will genuinely help you to understand the importance and utility of gene splicing.
"As an analogy, consider the word structure. In bacteria, the gene is embedded in the genome in precisely that format, structure, with no breaks, stuffers, interpositions, or interruptions. In the human genome, in contrast, the word is interrupted by intermediate stretches of DNA: s...tru...ct...ur...e.
The long stretches of DNA marked by the ellipses (...) do not contain any protein-encoding information. When such an interrupted gene is used to generate a message - i.e., when DNA is used to build RNA - the stuffer fragments are excised from the RNA message, and the RNA is stitched together again with the intervening pieces removed: s...tru...ct...ur...e became simplified to structure. Roberts and Sharp later coined a phrase for the process: gene splicing or RNA splicing (since the RNA message of the gene was 'spliced' to remove the stuffer fragments).
At first, this split structure of genes seemed puzzling: Why would an animal genome waste such long stretches of DNA splitting genes into bits and pieces, only to stitch them back into a continuous message? But the inner logic of split genes soon became evident: by splitting genes into modules, a cell could generate bewildering combinations of messages out of a single gene. The word s...tru...c...t...ur...e can be spliced to yield cure and true and so forth, thereby creating vast numbers of variant messages - called isoforms - out of a single gene. From g...e...n...om...e you can use splicing to generate gene, gnome, and om. And modular genes also had an evolutionary advantage: the individual modules from different genes could be mixed and matched to build entirely new kinds of genes (c...om...e...t). Wally Gilbert, the Harvard geneticist, created a new word for these modules; he called them exons. The in-between stuffer fragments were termed introns." (The Gene: An Intimate History by Siddhartha Mukherjee).
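The word analogy in the excerpt can be played with directly. The short Python sketch below treats each fragment of s...tru...c...t...ur...e as an exon and builds every message you get by keeping some exons in their original order. This is only a toy model of the analogy (the function name `isoforms` is made up here), and real cells do not produce every possible combination, since splicing is tightly regulated.

```python
from itertools import combinations

# Exons of the "word" gene from the analogy: s...tru...c...t...ur...e
exons = ["s", "tru", "c", "t", "ur", "e"]


def isoforms(exon_list):
    """Every message made by keeping some exons, in their original order."""
    messages = set()
    for count in range(1, len(exon_list) + 1):
        # combinations() preserves the left-to-right order of the exons.
        for kept in combinations(exon_list, count):
            messages.add("".join(kept))
    return messages


words = isoforms(exons)
for word in ["structure", "true", "cure"]:
    print(word, word in words)
```

Each of the three words prints `True`, because each one can be built by skipping some exons without ever reordering them.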
Now that we have a complete mRNA strand, that information can be used to finally synthesize the proteins for which it codes! The process by which mRNA's information is used to form a polypeptide (protein) is known as translation.
Recall that ribosomes are the organelles responsible for protein synthesis. It should come as no surprise, then, that ribosomes will play a central role in translation. The ribosome is made up of two major subunits: a large ribosomal subunit and a small ribosomal subunit.
These subunits are made up primarily of rRNA, another type of RNA to join mRNA and the soon-to-be-described tRNA.
Just like with the DNA during transcription, the mRNA is not destroyed during translation. Its information is simply read. Only once the mRNA strand is 'finished' with its job is it broken down by enzymes.
In fact, sometimes mRNA strands are used multiple times by multiple ribosomes to make more than just the one copy of the polypeptide for which it codes.
The mRNA strand is a string of nucleotides, of course. However, these nucleotides are not read one-by-one. The ribosome 'reads' the mRNA strand in groups of 3 nucleotides known as codons. Every possible codon either codes for an amino acid or acts as a stop signal. You do not need to memorize each codon and the amino acid for which it codes - you will have access to a codon chart to use during the AP exam. But know that there are 20 unique amino acids that can be coded for by these codons. Some codons code for the same amino acids, so there are multiple ways to add an alanine amino acid to a polypeptide, for example.
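To make 'reading in groups of 3' concrete, here is a small Python sketch. The helper name `split_into_codons` and the example sequence are made up for illustration. The second half simply counts how many different codons are possible with four bases in three positions, which shows why several codons must share an amino acid when there are only 20 amino acids to go around.

```python
from itertools import product


def split_into_codons(mrna: str) -> list:
    """Split an mRNA string into consecutive groups of three bases.
    Any leftover one or two bases at the end are ignored."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical mRNA sequence for illustration.
print(split_into_codons("AUGGCUGAUUAA"))

# Four bases in three positions give 4 * 4 * 4 possible codons.
all_codons = ["".join(p) for p in product("UCAG", repeat=3)]
print(len(all_codons))
```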
In this diagram, you can see mRNA represented by the pink string of nucleotides. You can also see the ribosome reading the mRNA strand. The ribosome has three 'windows' through which a new molecule, tRNA, can enter. The blue structure is a tRNA molecule, or transfer RNA molecule. This is the taxi that takes amino acids to the ribosome. How does the tRNA 'know' where to go, then? Well, tRNA has an anti-codon at the bottom that will bind to its complementary codon on the mRNA strand. Thus, tRNA will bring the correct amino acid to be added to the polypeptide.
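The codon and anti-codon matching can be sketched in Python as simple base pairing. This is a simplified illustration: the function name `anticodon_for` is made up, and it assumes strict one-to-one pairing between the three bases of the codon and the three bases of the anti-codon.

```python
# RNA-to-RNA base pairing between an mRNA codon and a tRNA anti-codon.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the tRNA anti-codon, written 3' --> 5', that pairs
    base-by-base with an mRNA codon written 5' --> 3'."""
    return "".join(RNA_PAIR[base] for base in codon.upper())


# The start codon AUG pairs with the anti-codon on the tRNA
# that carries methionine.
print(anticodon_for("AUG"))
```

For the start codon AUG (5' --> 3'), the matching anti-codon is UAC when written 3' --> 5'.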
The codon near the middle of the strand labeled AUG is an important one because that is what is known as the start codon. Every polypeptide made in a ribosome begins with methionine, the amino acid coded for by the AUG start codon.
There are three stop codons that, when reached, cause the ribosome to disassemble, releasing the mRNA and the finished polypeptide. These stop codons follow the codon for the last amino acid added to a polypeptide during translation. If a stop codon ever 'accidentally' appears too early (via a mutation), then the protein will not be synthesized properly.
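Putting the start codon, the codons in between, and the stop codon together, translation can be sketched as a short Python function. Treat this as a simplified model, not a real bioinformatics tool: the function name `translate` and the example mRNA are made up, amino acids are written with one-letter codes (M is methionine), and the sketch simply starts at the first AUG it finds.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon.
    Returns one-letter amino acid codes, or "" if there is no start codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the polypeptide is released
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA with a short leader before the start codon.
print(translate("GGAUGGCUGAUUGGUAAGC"))
```

Try changing the third codon after the leader (GAU) to UAA and running it again: the polypeptide is cut short, which is exactly the 'too early' stop codon problem described above.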
Some proteins need to be modified after translation. These post-translational modifications allow the polypeptide to attain the proper three-dimensional structure for the protein to do its job properly.
If given an mRNA strand (or given the DNA template strand), you will need to be able to use a codon chart. Using it is quite simple, luckily! Simply plug in the codon to get the amino acid that corresponds to that codon.
For instance, using this codon chart (the most common kind you'll encounter) let's plug in our start codon, AUG:
A is the first base, so we are going to be in the third row.
U is the second base, so we are in the first column.
G is the third base, so we can see that the amino acid represented by AUG is methionine, or 'Met'.
You don't need to memorize the long forms of any amino acids - the shortened versions will do unless the full amino acid name is provided.
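The row-and-column lookup described above can also be written as a small Python sketch. It assumes the common chart layout in which rows (first base), columns (second base), and the position inside each box (third base) all follow the order U, C, A, G. The function name `chart_lookup` is made up for this illustration.

```python
BASES = "UCAG"  # row, column, and within-box order on the common chart
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def chart_lookup(codon: str):
    """Return (row, column, position in box, amino acid) for a codon,
    counting rows, columns, and positions from 1."""
    row, col, pos = (BASES.index(base) for base in codon.upper())
    amino_acid = THREE_LETTER[ONE_LETTER[16 * row + 4 * col + pos]]
    return row + 1, col + 1, pos + 1, amino_acid


print(chart_lookup("AUG"))
```

For AUG this gives row 3, column 1, and the fourth entry in that box, which is Met, matching the walk-through above.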
Recall that a mutation is any change in the DNA. There are many kinds of mutations both at the nucleotide and chromosomal levels, but we will focus on the distinctions between point and frameshift mutations.
A point mutation is so called because it is a change in the DNA at just a single point - that is, just a single nucleotide. You will also see this called a substitution mutation in which one nucleotide is substituted for another.
In this example, you can see that the original sequence had a T where the mutated sequence (the result of a mistake during a round of DNA replication) has a C. Now it may seem like a single base pair substitution in a gigantic collection of billions of base pairs may not matter much... and you might be right. There are three possible consequences of a point mutation: a silent mutation, a nonsense mutation, and a missense mutation.
In a silent mutation, the DNA changed but there was no discernible effect on the protein (see image). This occurs because both mRNA codons that are transcribed code for the same amino acid.
In a nonsense mutation, the polypeptide is shortened abruptly by the new stop codon due to the mutation. This almost always results in a non-functioning protein.
In a missense mutation, there is a change to the amino acid that is added to the polypeptide chain. This will generally affect the protein's structure, but the protein may still function depending on the chemical properties and placement of that amino acid. You do NOT need to know the differences between conservative and non-conservative missense mutations. The classic example we will explore in class of this mutation is sickle cell anemia.
"Sickle cell anemia, in its most extreme cases, can be fatal by the age of three in almost 70 percent of sufferers. And what is the difference between a healthy red blood protein and a sickle cell one? Only a single amino acid in the string: the amino acid glutamate is replaced by one called valine in the sixth position in the sequence. A tiny difference in the amino acid sequence can have massive ramifications on the protein, the cells in which the protein is found, and the lives of the individuals who have those cells." (Some Assembly Required: Decoding Four Billion Years of Life, from Ancient Fossils to DNA by Neil Shubin)
As you can see, a single base pair substitution can have a variety of effects. Often it will do nothing. Sometimes it will change the protein. Usually that change is not good for the organism or the cell, but occasionally it is VERY good. This will become more commonly discussed in the next unit on natural selection.
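The three outcomes of a point mutation can be told apart with a simple comparison of the 'before' and 'after' codons. The Python sketch below is a minimal illustration: the function name `classify_point_mutation` is made up, it looks at one codon at a time, and it assumes the original codon is not itself a stop codon.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(original: str, mutated: str) -> str:
    """Classify a single-codon substitution by its effect on the protein.
    Assumes the original codon codes for an amino acid (not a stop)."""
    before, after = CODON_TABLE[original], CODON_TABLE[mutated]
    if before == after:
        return "silent"      # same amino acid, no change to the protein
    if after == "*":
        return "nonsense"    # new stop codon cuts the polypeptide short
    return "missense"        # a different amino acid is added


print(classify_point_mutation("GAA", "GAG"))  # Glu -> Glu
print(classify_point_mutation("UGG", "UGA"))  # Trp -> stop
print(classify_point_mutation("GAG", "GUG"))  # Glu -> Val, as in sickle cell
```

The last line is the sickle cell substitution from the quote: a single changed base swaps glutamate for valine.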
Frameshift mutations are usually much more severe than point mutations. Rather than changing a single nucleotide, these mutations shift the entire reading frame. This sounds pretty complicated, but it makes sense if you think of the mRNA as being in sets of codons.
If you mess up one codon by removing a nucleotide, for instance, it will mess up every subsequent codon that is 'downstream' of that mutation. So if a frameshift mutation occurs at the beginning of a gene, it will affect nearly the entire protein!
Let's explore the two ways by which frameshift mutations may occur: deletions and insertions.
In an insertion mutation, a base or multiple bases are added to the DNA sequence where before there were none. This causes all subsequent codons to be shifted because the ribosome will simply read the mRNA 3 bases at a time after the start codon.
So if a base got in that 'shouldn't' be there, it will affect every subsequent codon. This 'frame shift' is a result of the fact that every (or nearly every) subsequent codon consists of a different set of 3 nucleotides than it did before the DNA alteration.
A deletion mutation works in basically the same way but in reverse. So instead of bases being added, they are removed. This also causes the 'frame shift' because every subsequent codon now contains different nucleotides than it did before the DNA sequence changed.
Ironically, it is often less disruptive to the resulting protein if 3 nucleotides are removed (or added) rather than just one. The subsequent codons are still read correctly, but there is one fewer (or one extra, if it is an insertion) amino acid in the polypeptide.
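You can watch the reading frame shift by splitting a sequence into codons before and after each kind of change. The Python sketch below uses a made-up mRNA sequence and a made-up helper named `codons`. It compares a one-base insertion, a one-base deletion, and a three-base deletion.

```python
def codons(mrna: str) -> list:
    """Split mRNA into triplets, dropping any incomplete final group."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical mRNA, beginning at the start codon.
original = "AUGGCUGAUUGGCCAUAA"

inserted = original[:6] + "C" + original[6:]   # one base added
deleted_one = original[:6] + original[7:]      # one base removed
deleted_three = original[:6] + original[9:]    # one whole codon removed

for label, seq in [
    ("original", original),
    ("insert 1", inserted),
    ("delete 1", deleted_one),
    ("delete 3", deleted_three),
]:
    print(label.ljust(10), " ".join(codons(seq)))
```

With one base added or removed, every codon after the change is different. With three bases removed, only the GAU codon disappears and everything downstream reads exactly as it did before.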