Unit 1: Central Dogma

Overview

The central dogma of biology describes how a gene is expressed as a protein through two main steps, transcription and translation. Proteins, the building blocks of life, are made according to the DNA blueprint, while RNA conveys the genetic information stored in DNA to the machinery that synthesizes protein from amino acids. Note that information can flow from DNA to RNA or vice versa, but not from protein back to DNA or RNA. Although the central dogma is one of the pillars of biology, many of its details are still under research. In this section, we discuss its major steps and ideas.

For a gene to be expressed as a protein, the DNA is first transcribed into mRNA, which is then translated into a chain of amino acids. Finally, the amino acid chain coils and folds into a protein. Note that some flows of genetic information are not directly part of protein synthesis, such as reverse transcription, but they are not covered in this syllabus.

Overview of the central dogma of Biology

DNA

Deoxyribonucleic acid (DNA) is a very complex molecule that stores the genetic information of an organism. In eukaryotic cells it is usually found in the nucleus, but it is also found in mitochondria.

Nucleotides

DNA is made up of nucleotides. Each nucleotide consists of a nitrogenous base, a 5-carbon sugar and a phosphate group. Four types of nitrogenous base are found in DNA: adenine (A), thymine (T), cytosine (C) and guanine (G). Besides A, T, C and G, other nitrogenous bases exist, such as hypoxanthine, xanthine, uracil and orotic acid. Based on their structure, the nitrogenous bases can be separated into two groups, purines (2 rings, such as A and G) and pyrimidines (1 ring, such as C, T and U).

(A nucleotide with adenine base)



Complementary base pairing

The two strands of DNA are held together according to the complementary base pairing rule: A pairs with T by forming two hydrogen bonds, and C pairs with G by forming three hydrogen bonds.

Bonding between A-T and C-G; dotted lines represent the hydrogen bonds
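The pairing rule above can be written as a small lookup table. The following is a minimal sketch in Python; the names PAIRS, H_BONDS, complement and hydrogen_bonds are made up for this illustration and are not part of any library.

```python
# Watson-Crick pairs and the number of hydrogen bonds each base forms
# with its partner (A-T: two, C-G: three), as stated in the text.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def complement(strand: str) -> str:
    """Return the partner base for each position of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds that hold `strand` to its complementary strand."""
    return sum(H_BONDS[base] for base in strand)


print(complement("ATGC"))      # TACG
print(hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3 = 10
```

Because C-G pairs contribute three bonds each, a stretch rich in C and G is held together by more hydrogen bonds than an A-T rich stretch of the same length.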

Structure of DNA

In 1953, James Watson and Francis Crick proposed that DNA exists as a double-stranded structure coiled into a double helix. The two backbones are formed by the 5-carbon sugars and phosphate groups, so each is called a sugar-phosphate backbone.

Structure of DNA

(NHS National Genetics and Genomics Education Centre / CC-BY-2.0)

RNA

RNA (ribonucleic acid) is also a nucleic acid and is highly similar to DNA. The primary structural differences between DNA and RNA are the sugar in the backbone and the nitrogenous bases they carry. The table below gives a brief comparison between the two molecules. In most organisms, RNA conveys the genetic information stored in DNA so that protein can be made through translation. However, some viruses (which are not considered living organisms) use an RNA genome to store their genetic information.

The functions of RNA are also very diverse, from storing genetic information to forming biological machines. Here are a few examples. mRNA (messenger RNA) is the messenger that delivers the genetic information used to produce the peptide chain during translation. tRNA (transfer RNA) is the physical link between the mRNA and the amino acids: it carries the correct amino acid for each codon on the mRNA and helps place the amino acids in the correct sequence. rRNA (ribosomal RNA) is a key component of the ribosome, the biological machine that synthesizes protein according to the mRNA. This demonstrates that RNA is a very versatile biomolecule.

RNA vs DNA

(Sponk / CC-BY-SA-3.0)

DNA Replication

DNA replication occurs in nature when a cell needs to be duplicated. The following are the main steps of DNA replication:

  1. Hydrogen bonds break
  2. Free nucleotides attach to the template strands

First, the double helix needs to be uncoiled so that enzymes can work. As enzymes pull the strands apart in later steps, the DNA ahead of them becomes more and more overwound; if this is not relieved, the torsion becomes so large that the enzymes can no longer work.

To solve this problem, an enzyme called topoisomerase binds to the DNA. It relaxes the DNA by cutting the backbone of one or both strands, letting the DNA unwind, and then joining the backbone back together. There are two types of topoisomerase: type I cuts only one strand, while type II cuts both strands.

Next comes helicase, another enzyme, which separates the two DNA strands by breaking the hydrogen bonds between the paired bases. Breaking the hydrogen bonds requires energy, which helicase obtains from a molecule called ATP. After the DNA passes through the helicase, it becomes two separate strands, and the Y-shaped section is called a replication fork.

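After the two strands are split, each one acts as a template. The template that is copied continuously is called the leading strand template (often just the leading strand), and the template that is copied in pieces is called the lagging strand template (the lagging strand). Which is which depends on the direction of the strand: if its open end is 3', it is the leading strand template; if its open end is 5', it is the lagging strand template.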

Leading and lagging strand

The next step is the attachment of free nucleotides to replicate the DNA. Free nucleotides can only be added in one direction, so a new strand always grows from its 5' end to its 3' end. As the two template strands run in opposite directions, they use different mechanisms for nucleotide attachment.

DNA primase binds to the leading strand

The leading strand is the simpler case. An enzyme, DNA primase, binds near the 3' end of the leading strand template and produces an RNA primer about 10 to 12 nucleotides long, complementary to the start of the template.

Then, DNA polymerase binds to the end of the RNA primer and starts extending the new strand (in other words, replicating the leading strand). Free nucleotides pair with the nucleotides on the template according to complementary base pairing. When a nucleotide pairs successfully, forming hydrogen bonds with the template, DNA polymerase joins it to the previous nucleotide of the growing strand and moves forward.

DNA polymerase binds to the leading strand

The lagging strand is slightly more complex because DNA polymerase can only work in one direction. DNA primase is used here as well, but many times over. As the helicase keeps separating the two strands and exposing more of the lagging strand template, DNA primase binds again and again, each time producing a new RNA primer closer to the fork. Starting from each primer, DNA polymerase adds free nucleotides in the direction away from the fork until it reaches the primer of the previously made fragment. These short stretches of replicated DNA are called Okazaki fragments.

At this stage, both new strands consist of RNA primers as well as DNA nucleotides. An enzyme, DNA polymerase I, removes the RNA primers and replaces them with DNA nucleotides. Finally, DNA ligase joins the newly added DNA nucleotides to the neighbouring DNA, sealing the gaps in the backbone.
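To make the difference between continuous and fragment-wise copying concrete, here is a rough Python sketch. It is a deliberate simplification: RNA primers and all enzymes are left out, every strand is written 5' to 3', and the function names (reverse_complement, okazaki_fragments, ligate) and the fragment length are invented for this example.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(strand))


def okazaki_fragments(template: str, fragment_length: int) -> list[str]:
    """Copy a template piece by piece, in the order the fork exposes it.

    The template is written 5' to 3', and its 5' end is assumed to be the
    open end, so the first slice is the first stretch to be exposed.
    """
    fragments = []
    for start in range(0, len(template), fragment_length):
        exposed = template[start:start + fragment_length]
        fragments.append(reverse_complement(exposed))
    return fragments


def ligate(fragments: list[str]) -> str:
    """Join the fragments into one strand, as DNA ligase does.

    Fragments made later lie on the 5' side of fragments made earlier,
    so they are joined in reverse order of synthesis.
    """
    return "".join(reversed(fragments))


template = "ATGCCGTAA"
continuous = reverse_complement(template)          # copied in one go
pieces = okazaki_fragments(template, fragment_length=4)

print(continuous)      # TTACGGCAT
print(pieces)          # ['GCAT', 'TACG', 'T']
print(ligate(pieces))  # TTACGGCAT
```

Both routes give the same new strand; only the order in which its parts are made differs.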

Transcription

(Note that the central dogma is more complex for eukaryotes, so they are discussed in more depth in the additional information box below. The content below focuses only on prokaryotes.)

Transcription is the first step of gene expression leading to protein synthesis. In the broad sense it includes transcription itself and post-transcriptional modification. The key idea is to make an mRNA strand from a template DNA strand according to complementary base pairing. The DNA sequence where transcription begins is known as the promoter; it recruits RNA polymerase, which binds to it and starts the transcription process. Conversely, the DNA sequence that signals transcription to stop is known as the terminator. The key players in this process are RNA polymerase, RNA nucleotides and the DNA strands. The region being transcribed is known as the transcription bubble, where RNA polymerase unwinds and opens part of the DNA to allow transcription to happen. The strand that the growing mRNA pairs with is known as the template strand (green); because it is used for complementary base pairing, its sequence is complementary to that of the mRNA. The other strand is known as the coding strand (blue), as its sequence is the same as that of the mRNA (except that T is replaced with U).


In bacteria, transcription and translation both happen in the cytoplasm. There are 3 main phases in the transcription of DNA: initiation, elongation and termination.


https://chem.libretexts.org/Core/Biological_Chemistry/Nucleic_Acids/DNA/Transcription_of_DNA_Into_messenger_RNA

Initiation

In the first phase, RNA polymerase breaks the hydrogen bonds that hold the 2 strands together. RNA polymerase is recruited to the promoter, binds to the DNA there and unwinds it into a partially single-stranded open complex (the transcription bubble). It then selects a transcription start site within the bubble and binds an initiating nucleotide and an extending nucleotide (or a short RNA primer and an extending nucleotide) complementary to the sequence at the transcription start site.

Elongation

In this phase the RNA molecule is extended: free RNA nucleotides are recruited and pair with the template strand according to complementary base pairing. This means the mRNA produced has the same sequence as the other strand, the coding strand (with T substituted by U). Once an incoming RNA nucleotide is held in place by hydrogen bonds, RNA polymerase joins it to the backbone of the growing RNA molecule. As RNA nucleotides can only be added to the 3' end, the RNA grows in the 5' → 3' direction. As the mRNA strand is extended, proofreading also takes place, replacing wrongly incorporated nucleotides with the correct ones. Note that a DNA sequence is conventionally written as the coding strand (the same as the mRNA, except that T is switched for U).
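The relationship between template strand, coding strand and mRNA can be checked with a short Python sketch. The function names and the example sequence are made up for illustration; the template is written 3' to 5' so that it lines up position by position with the mRNA written 5' to 3'.

```python
# Pairing used when RNA is built on a DNA template: A on the template
# pairs with U in the RNA, because RNA uses uracil instead of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5' to 3') by pairing with the template (read 3' to 5')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def coding_to_mrna(coding_5_to_3: str) -> str:
    """Shortcut: the mRNA equals the coding strand with T replaced by U."""
    return coding_5_to_3.replace("T", "U")


coding = "ATGGCCTAA"    # coding strand, 5' to 3'
template = "TACCGGATT"  # template strand, 3' to 5'

print(transcribe(template))    # AUGGCCUAA
print(coding_to_mrna(coding))  # AUGGCCUAA
```

Both functions return the same mRNA, which is why sequences are usually reported as the coding strand.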

Termination

As RNA polymerase reaches the terminator of the DNA sequence, a GC-rich hairpin loop forms in the newly made RNA and stalls the RNA polymerase. Transcription then stops, and the newly transcribed mRNA and the RNA polymerase are released.

https://bit.ly/2Jv1GAv

ADDITIONAL INFORMATION ABOUT EUKARYOTIC TRANSCRIPTION

As mentioned before, transcription in eukaryotes is far more complex than in prokaryotes. The content below covers only a few of the most prominent differences.

LOCATION

One of the biggest differences between the two is the location of transcription. As the genetic information of eukaryotes is stored in the nucleus, transcription happens in the nucleus, and the finished mRNA is transported across the nuclear membrane to be translated. Therefore, unlike in prokaryotes, transcription and translation cannot take place simultaneously.


ENZYMES

The enzymes for transcription also differ. Eukaryotic RNA polymerases are classified into several types (RNA polymerase I, II and III, plus IV and V in plants), each of which produces different types of RNA (such as tRNA). Prokaryotes have only 1 type of RNA polymerase.


POST TRANSCRIPTIONAL MODIFICATION

Another key difference is the presence of post-transcriptional modification, which is characteristic of eukaryotes. A few key steps turn pre-mRNA into mature mRNA. First, 5' capping adds 7-methylguanosine (m7G) to the 5' end. This helps the mRNA to be exported from the nucleus and protects it from degradation by exonucleases. 3' processing also takes place, adding a poly-A tail to the 3' end, which likewise helps the mRNA to be transported out of the nucleus and stabilizes it. The last step is splicing, which removes the introns (non-coding regions of the RNA) and joins the exons (coding regions) together to form a mature mRNA. Exons are the sections of the mRNA that are "expressed", or translated into protein. The introns are "spliced" out of the pre-mRNA, which allows all the coding parts for one protein to be joined together. The splicing reaction is catalyzed by a large complex called the spliceosome, assembled from proteins and small nuclear RNA (snRNA) molecules that recognize splice sites in the pre-mRNA sequence.
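The three modifications described above can be mimicked on a string. The following Python sketch is only a toy model: exon positions are supplied by hand rather than recognized by a spliceosome, the cap is shown as the text label "m7G-", and the tail length of 10 is an arbitrary choice for readability. All names and sequences are hypothetical.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon regions, given as (start, end) index pairs, and join them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def process(pre_mrna: str, exons: list[tuple[int, int]], tail_length: int = 10) -> str:
    """Return a mature mRNA: 5' cap label + joined exons + poly-A tail."""
    return "m7G-" + splice(pre_mrna, exons) + "A" * tail_length


# A made-up pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

print(splice(pre_mrna, exons))   # AUGGCCUUUUAA
print(process(pre_mrna, exons))  # m7G-AUGGCCUUUUAAAAAAAAAAAA
```

Removing the intron brings the two coding parts next to each other, so the codons of exon 2 continue directly after those of exon 1.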


Translation

Translation is the process of forming a peptide chain from mRNA. The main steps of translation are as follows:

  1. A ribosome binds to the mRNA at the ribosome binding site (RBS), which lies about 8 bp upstream of the start codon
  2. A tRNA whose anticodon is complementary to the codon binds to the mRNA, carrying an amino acid; codons are read in a non-overlapping manner
  3. A 2nd tRNA binds to the mRNA, and a peptide bond forms between the two amino acids
  4. The first tRNA detaches, leaving its amino acid behind, while the ribosome moves downstream
  5. The process repeats until the ribosome meets a stop codon and is released

A ribosome has two parts, a small and a large subunit, and consists mainly of ribosomal RNA (rRNA) and protein. When translation starts, the small subunit of the ribosome, carrying a tRNA (anticodon ‘UAC’) with the amino acid ‘Met’, binds to the RBS (an mRNA sequence), which is recognized by the rRNA.

Transfer RNA (tRNA) is a molecule that helps translate mRNA into amino acids. Each tRNA has an anticodon, which recognizes and pairs with an mRNA codon, and carries the amino acid corresponding to that anticodon.


https://www.quora.com/How-many-amino-acids-can-be-carried-by-one-tRNA
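The role of tRNA as the link between codon and amino acid can be sketched as two lookups. In this Python illustration the anticodon is written position by position against the codon, which is the convention that gives ‘UAC’ for the start codon ‘AUG’. The TRNAS table holds only three hand-picked entries and, like the function names, is an assumption made for this example.

```python
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}

# Each tRNA is modelled as anticodon -> amino acid it carries.
# Only three of the many tRNAs are listed here.
TRNAS = {"UAC": "Met", "CGG": "Ala", "AAA": "Phe"}


def anticodon_for(codon: str) -> str:
    """Return the anticodon that pairs with `codon`, position by position."""
    return "".join(RNA_PAIRS[base] for base in codon)


def amino_acid_for(codon: str) -> str:
    """Find the tRNA whose anticodon matches the codon and return its amino acid."""
    return TRNAS[anticodon_for(codon)]


print(anticodon_for("AUG"))   # UAC
print(amino_acid_for("AUG"))  # Met
print(amino_acid_for("GCC"))  # Ala
```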


The small subunit then moves downstream along the mRNA until it meets the start codon ‘AUG’, where the large subunit of the ribosome joins the small subunit and the elongation process starts.



After the large subunit has attached, tRNAs from the surroundings match the mRNA codons with their anticodons, bringing amino acids into line. This step is called elongation. If a wrong tRNA enters, it is rejected. If the right tRNA enters, a peptide bond forms between the amino acid it carries and the previous amino acid. Note that the ester bond between the previous tRNA and its amino acid is exchanged for the peptide bond between the 2 amino acids, a reaction catalyzed by peptidyl transferase. The empty tRNA then leaves, its amino acid remaining bonded to the peptide chain, and the ribosome moves downstream as the process repeats.

The process ends when the ribosome reaches a stop codon in the mRNA (UAG, UAA or UGA). Translation is then terminated and the ribosome is released.
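The reading of codons from start codon to stop codon can be sketched in Python as follows. This is a simplified model, not a description of the ribosome: it ignores the RBS and simply starts at the first ‘AUG’ it finds. The table is the standard genetic code written compactly with one-letter amino acid codes ("*" marks a stop codon); the function name and the example mRNA are made up.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read non-overlapping codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG or UGA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("CCAUGGCCUUUUAAGG"))  # MAF (Met-Ala-Phe)
```

The bases before the start codon and after the stop codon are not translated, which is why only three amino acids come out of this 16-base mRNA.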

In eukaryotes, translation differs mainly in how it is initiated: the ribosome is recruited directly to the 5' capped end of the mature mRNA, and no RBS sequence is used. The type of ribosome also differs between the 2.

The translation process

(Kelvinsong / CC-BY-3.0)

Amino Acid

Structure

Amino acids (AAs) are organic compounds that are the precursors of proteins. Each contains amine (-NH2) and carboxyl (-COOH) functional groups, along with a side chain (R group) specific to that amino acid. The main difference between AAs is the side chain: some contain aromatic rings while others contain other functional groups. Twenty distinct amino acids are found in all organisms; some can be synthesized by the cell from precursors, while others must be absorbed directly from the environment.

Proteins are the final destination of the flow of genetic information and are thus the building blocks of life. Picture the genetic material (DNA) as the commander of life, which sends out messengers (mRNA) so that functions such as building organelles are carried out by proteins. Amino acids are therefore extremely important to life as the raw ingredients of protein; without them, the cell can't express its genetic information.

Peptide formation

Amino acids are the monomers of peptides (which form proteins). The presence of both COOH and NH2 groups lets them polymerize by condensation, forming peptide bonds. Note that the polymerization has a definite orientation: the carboxyl group (-COOH) at the C-terminus of the growing chain condenses with the amine group (-NH2) of the incoming AA, giving out water as a by-product.
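Since each condensation releases one molecule of water, the bookkeeping for a short peptide is easy to work through. The Python sketch below uses approximate average molar masses for three amino acids only; these values, and the names used, are included purely for illustration.

```python
# Approximate average molar masses in g/mol (illustrative values only).
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09}
WATER_MASS = 18.02


def peptide_bonds(chain: list[str]) -> int:
    """A linear chain of n amino acids contains n - 1 peptide bonds."""
    return max(len(chain) - 1, 0)


def peptide_mass(chain: list[str]) -> float:
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[amino_acid] for amino_acid in chain)
    return total - peptide_bonds(chain) * WATER_MASS


chain = ["Gly", "Ala", "Ser"]  # listed from N-terminus to C-terminus
print(peptide_bonds(chain))           # 2
print(round(peptide_mass(chain), 2))  # 233.21
```

Here three amino acids form two peptide bonds and release two water molecules, so the tripeptide is lighter than the three free amino acids combined.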

Protein structure

The structure of a peptide (polymerized amino acids) is divided into 4 levels: primary, secondary, tertiary and quaternary.

The primary structure refers to the sequence of the amino acid chain (peptide) right after translation, before any folding occurs or other modifications are made to the chain.

Secondary structure refers to the 3D structure of a local segment of a folded peptide chain, produced by hydrogen bonding. The most common secondary structures are the α-helix and the β-sheet. They form spontaneously from the primary structure after translation, before the tertiary structure is formed.

The tertiary structure refers to the overall 3D shape of the peptide chain. It has a single polypeptide chain "backbone" with one or more secondary structures arranged into units called protein domains. Interactions and bonds between the amino acid side chains within the protein determine its tertiary structure. These protein domains are often sub-units of an entire protein, or they can be a complete protein on their own.

The quaternary structure refers to the 3D arrangement of the subunits (tertiary structures) and the incorporation of other non-amino-acid components, such as the haem group (in haemoglobin). This allows the protein to form a more complicated complex capable of more functions. This is the end product of protein synthesis. It will be used in various cellular activities, exported to the extracellular space, or further modified by the Golgi apparatus. The conformation of the protein's active site is key to its function.

Protein folding

Protein folding is the physical process by which the peptide chain produced in translation folds into its specific conformation, determined by the chemical nature of its constituent amino acids. The specific conformation of the active site is what gives the protein its function.

Driving forces of protein folding

Folding is a spontaneous process that is mainly guided by hydrophobic interactions, the formation of intramolecular hydrogen bonds, van der Waals forces and other factors. Folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion is still being synthesized by the ribosome; however, a protein molecule may fold spontaneously during or after biosynthesis. While these macromolecules may be regarded as "folding themselves", the process also depends on the solvent (water or lipid bilayer), the concentration of salts, the pH, the temperature, and the possible presence of cofactors and of molecular chaperones, which are proteins that assist the folding of macromolecules (e.g. proteins). Although there is an astronomically large number of ways to fold a peptide chain, the most probable outcome is the one with the least free energy, since it is the most stable.

Hydrophobicity is also a key player in protein folding, as some amino acid side chains are hydrophobic while others are hydrophilic. Upon folding, the hydrophobic amino acids stay in the core of the folded peptide. This is known as the hydrophobic effect, in which the hydrophobic side chains of a protein collapse into the core of the protein (away from the aqueous environment).

Note that the black dots represent hydrophobic amino acids while the white dots represent hydrophilic amino acids.


https://en.wikipedia.org/wiki/Protein_folding
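The black-and-white dot picture of a peptide can be reproduced by labelling each amino acid as hydrophobic (H) or polar (P). The Python sketch below does this under a strong simplifying assumption: the set HYDROPHOBIC is a hand-picked list of one-letter codes chosen for this example, whereas real classifications use graded hydrophobicity scales and differ in the borderline cases.

```python
# Assumed, simplified set of hydrophobic residues (one-letter codes):
# Ala, Val, Leu, Ile, Met, Phe, Trp. Everything else is treated as polar.
HYDROPHOBIC = set("AVLIMFW")


def hp_pattern(peptide: str) -> str:
    """Label each residue H (hydrophobic) or P (polar/hydrophilic)."""
    return "".join("H" if residue in HYDROPHOBIC else "P" for residue in peptide)


def hydrophobic_fraction(peptide: str) -> float:
    """Fraction of residues labelled hydrophobic; 0.0 for an empty peptide."""
    if not peptide:
        return 0.0
    return hp_pattern(peptide).count("H") / len(peptide)


print(hp_pattern("MAFKDE"))            # HHHPPP
print(hydrophobic_fraction("MAFKDE"))  # 0.5
```

In a folded protein, the residues labelled H would be the ones expected to cluster in the core, away from the surrounding water.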

Steps of folding

Secondary structures

As mentioned in the previous section, protein structure can be divided into 4 levels, and folding is the major process that leads the peptide to its characteristic conformation. Folding of the primary structure into secondary structure is a spontaneous process, driven mainly by hydrogen bonds between groups of the peptide backbone. The images below show the 2 most common secondary structures: the α-helix and the β-sheet.

The α-helix is formed when hydrogen bonds within the backbone pull it into a spiral shape


http://academic.brooklyn.cuny.edu/biology/bio4fv/page/alpha_h.htm

The β-pleated sheet is formed when the backbone bends back over itself so that neighbouring stretches of it form hydrogen bonds with each other

http://cbm.msoe.edu/teachingResources/jmol/proteinStructure/secondary.html

Tertiary structure

As secondary structures can be amphipathic in nature (containing a hydrophilic portion and a hydrophobic portion), folding occurs so that the hydrophilic sides face the aqueous environment surrounding the protein and the hydrophobic sides face the hydrophobic core of the protein, giving rise to the tertiary structure. Once the structure is stabilized by these hydrophobic interactions, covalent interactions may also come into play, for example disulphide bonds.

The image below demonstrates a few of the intramolecular interactions found in tertiary structures.

Quaternary structure

For quaternary structure, the word folding becomes less appropriate, as already folded tertiary structures are assembled together to give rise to the quaternary structure. At this stage, the intermolecular interactions between the individual folded subunits become the dominant interactions. Note: each colour in the image represents an individual subunit.



http://chemistry.umeche.maine.edu/MAT500/Proteins12.html