Unit 7: Sequencing Analysis

Introduction

DNA sequencing is the determination of the exact order of bases (A, T, C, G) in a segment of DNA by an analytical method. It is one of the most important tools and technological advances in bioinformatics, as it offers a precise, rapid and affordable way to determine the sequence of a DNA sample of interest. However, sequencing technology is always evolving and its price is constantly dropping, so this unit only covers some common and significant sequencing methods.

Sanger sequencing

One of the most common of the older generation of sequencing methods is Sanger sequencing. Please watch this video (https://www.youtube.com/watch?v=ONGdehkB8jU) to get the basic idea first. Somewhat like PCR, it uses heat to separate the two strands of DNA and then cools down so that a primer can anneal.

Its basic operation requires a single-stranded DNA template (separated by heat), a DNA primer, DNA polymerase, normal nucleotides (dNTPs), and modified nucleotides (ddNTPs), the latter of which terminate DNA strand elongation. These chain-terminating nucleotides lack the 3'-OH group required to form a phosphodiester bond between two nucleotides, so DNA polymerase stops extending the strand once a ddNTP is incorporated. The ddNTPs may be radioactively or fluorescently labeled for detection in automated sequencing machines. Note that each dNTP is added in approximately 100-fold excess over the corresponding ddNTP, so that termination is rare enough for strands to extend across the complete sequence while still producing enough terminated fragments of every length.
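To see why the ratio of dNTPs to ddNTPs matters, the effect can be sketched with a little arithmetic. The snippet below is only an illustration under simplifying assumptions that are not part of the protocol above: it assumes 100 dNTPs for every ddNTP, that the polymerase picks between them purely by abundance, and that every position can terminate. The function names are made up for this example.

```python
def termination_probability(dntp_per_ddntp: float) -> float:
    """Chance that the next nucleotide added is a ddNTP rather than a dNTP.

    Assumes the polymerase has no preference between the two (a simplification).
    """
    return 1.0 / (dntp_per_ddntp + 1.0)


def fraction_terminated_at(position: int, p: float) -> float:
    """Fraction of strands that extend through position - 1 bases and stop at `position`."""
    return (1.0 - p) ** (position - 1) * p


def fraction_still_extending(position: int, p: float) -> float:
    """Fraction of strands that have not terminated after `position` bases."""
    return (1.0 - p) ** position


p = termination_probability(100)  # assumed ratio: 100 dNTP per ddNTP
for n in (1, 100, 500):
    print(
        f"position {n}: terminated here = {fraction_terminated_at(n, p):.5f}, "
        f"still extending = {fraction_still_extending(n, p):.5f}"
    )
```

Under these assumptions about 37% of strands are still extending after 100 bases. With far more ddNTP, almost every strand would stop within the first few bases; with far less, very few labeled fragments would be produced at all.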


The sequencing process is explained well in the video, but as a recap: the primer anneals to the template strand (the query sequence). DNA polymerase then adds dNTPs to the growing strand according to complementary base pairing. At random, however, a ddNTP is incorporated into the extending strand instead, which stops DNA polymerase because nothing can be added after a ddNTP. After many cycles, termination has occurred at every possible position along the sequence, so strands of all different lengths are produced, each ending in a labeled ddNTP. These strands are then separated by size using capillary electrophoresis. As mentioned before, the phosphate groups in the DNA backbone are negatively charged, so under an electric current the strands are pulled from the negative pole toward the positive pole, with smaller strands moving faster than larger ones. Thus smaller strands (terminated close to the primer) travel further than larger strands (terminated further downstream). Reading the bands in order from the positive end to the negative end, that is, from the shortest fragment to the longest, gives the exact sequence of the newly synthesized strand. As the fragments pass a laser beam, the fluorescent tags on the ddNTPs are excited and the emitted light is recorded by a camera.
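The recap above can be mimicked with a toy simulation. This is a minimal sketch, not a model of a real sequencing run: the template, the termination probability of 0.2 and the number of strands are arbitrary values chosen so that a short example works, and the template is assumed to be written 3' to 5' so that the new strand reads 5' to 3'.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def simulate_fragments(template_3to5, n_strands=5000, p_terminate=0.2, seed=0):
    """Extend many strands along the template; each stops at a random position.

    Returns (length, terminal_base) pairs, one per terminated strand.
    Strands that never pick up a ddNTP carry no label and are ignored.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_strands):
        for position, template_base in enumerate(template_3to5, start=1):
            if rng.random() < p_terminate:
                # A labeled ddNTP complementary to the template base ends this strand.
                fragments.append((position, COMPLEMENT[template_base]))
                break
    return fragments


def read_by_size(fragments):
    """Sort fragments from shortest to longest and read off their terminal labels."""
    label_by_length = {}
    for length, label in fragments:
        label_by_length[length] = label
    # If some length were never produced, that base would be missing from the read.
    return "".join(label_by_length[length] for length in sorted(label_by_length))


template = "TACGGATCCATG"  # hypothetical template, written 3' to 5'
fragments = simulate_fragments(template)
print(read_by_size(fragments))
```

Sorting by length plays the role of electrophoresis here, and reading the terminal label of each length in turn plays the role of the laser and camera.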

This is one of the most proven sequencing methods, and its basic principle has been in use for the past few decades. Different iterations and improvements have been made to increase its accuracy, speed and efficiency, and it eventually gave rise to a new generation of sequencing methods. Compared to other methods, the Sanger method is cheaper for small numbers of samples and easier to handle, but it lacks the high-throughput ability of Next-Gen methods.


Data analysis

The results of Sanger sequencing are fairly easy to interpret. When the fluorescence method is used, the colour codes are as follows: A is green, C is blue, G is yellow and T is red.
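As a rough illustration of how the colours turn into letters, the sketch below calls one base per peak from four colour intensities, using the colour code given above. The intensity values, the min_ratio threshold and the function names are all made up for this example; real base-calling software is considerably more sophisticated.

```python
BASE_BY_COLOUR = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}


def call_base(peak, min_ratio=2.0):
    """Call the base whose colour is strongest at this peak.

    `peak` maps each colour to a signal intensity. If the strongest colour is
    not at least `min_ratio` times the runner-up (an assumed threshold), the
    peak is treated as ambiguous and reported as "N".
    """
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    (top_colour, top_signal), (_, runner_up_signal) = ranked[0], ranked[1]
    if top_signal <= 0 or top_signal < min_ratio * runner_up_signal:
        return "N"
    return BASE_BY_COLOUR[top_colour]


def call_sequence(peaks, min_ratio=2.0):
    """Call every peak in order and join the calls into a sequence."""
    return "".join(call_base(peak, min_ratio) for peak in peaks)


# Hypothetical intensities for four consecutive peaks.
peaks = [
    {"green": 900, "blue": 40, "yellow": 30, "red": 20},   # clean green peak
    {"green": 35, "blue": 50, "yellow": 820, "red": 25},   # clean yellow peak
    {"green": 400, "blue": 30, "yellow": 20, "red": 380},  # two overlapping peaks
    {"green": 10, "blue": 15, "yellow": 20, "red": 700},   # clean red peak
]
print(call_sequence(peaks))  # prints AGNT
```

The third peak shows why mixed or overlapping signals deserve a closer look: two colours of similar height cannot be called with confidence.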

Please read the following document from the University of Michigan on interpreting sequencing results: https://seqcore.brcf.med.umich.edu/sites/default/files/html/interpret.html#general

Implications

This technology has led to major advances in multiple fields, such as evolutionary science and molecular biology. The Human Genome Project (HGP), an extremely large-scale international study that mapped the human genome by DNA sequencing, is a prime example of the impact of sequencing. The project has given scientists a much deeper understanding of human molecular genetics.

For iGEM and other everyday molecular biology work, sequencing is also an extremely important tool, as it allows you to verify whether a sequence is correct. For example, after cloning an entire plasmid, it is best to extract the plasmid and send it to BGI for sequencing to double-check the result. Even though the size in bp may be correct, there might be substitutions or other mutations that can still lead to a failed clone, so sequencing helps us identify hidden errors from cloning or mutation.
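Checking a returned read against the sequence you designed can be sketched as a simple position-by-position comparison. This is a minimal example that assumes the read has already been trimmed to the same region and length as the expected sequence; insertions or deletions shift everything downstream and need a proper alignment tool instead. The sequences and the function name are hypothetical.

```python
def find_substitutions(expected, observed):
    """Return (position, expected_base, observed_base) for every mismatch.

    Positions are 1-based. Only substitutions are detected; sequences of
    different length are rejected rather than guessed at.
    """
    if len(expected) != len(observed):
        raise ValueError("lengths differ; an insertion or deletion needs a real alignment")
    pairs = zip(expected.upper(), observed.upper())
    return [(i, e, o) for i, (e, o) in enumerate(pairs, start=1) if e != o]


expected_insert = "ATGGCTAGCAAA"  # hypothetical designed sequence
sequencing_read = "ATGGCTGGCAAA"  # hypothetical read with one substitution

for position, want, got in find_substitutions(expected_insert, sequencing_read):
    print(f"position {position}: expected {want}, read {got}")
```

Both sequences here are 12 bp long, so a size check alone would pass, yet the comparison still reports the substitution at position 7.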

As mentioned before, sequencing technology is constantly evolving. Although the Sanger method is very well established, Next-Generation (NextGen) methods have been developed that allow highly parallel, faster and cheaper sequencing. This makes sequencing more accessible and affordable to more scientists, facilitating progress in various fields of biology. One of the best-known applications of DNA sequencing is the Human Genome Project (HGP), which was launched in 1990 and took about 13 years to complete. This large-scale international project aimed to map and sequence the entire human genome, and it sparked many new technologies that are now commonly used in sequencing. It also gave rise to the NextGen sequencing used today.