Have you ever stumbled upon an interesting plant, animal, or insect and wondered what it was? Traditionally, we could consult a field guide or taxonomic key to try to identify the specimen. But that approach has its problems. Scientists estimate that there are over 8.7 million different species on Earth, yet only about 1.25 million have been described and catalogued. That field guide you’re referencing is only going to list the most common species. And even if your specimen of interest is a common species, multiple species can often look so similar that even experts have trouble telling them apart. For example, Astraptes fulgerator is a common type of skipper butterfly that was first described in 1775. Not until 2004, however, did scientists learn that what was thought to be a single species of butterfly was actually a species complex consisting of at least 10 different species. With this much complexity, how can non-experts ever hope to get a handle on species identification?
Enter DNA barcoding. Just as the barcode on a box of cereal allows point-of-sale systems to automatically identify a product, DNA barcodes use a short section of DNA to uniquely identify organisms to the species level. This technique was first proposed by a scientist named Paul Hebert in his landmark paper Biological Identifications through DNA Barcodes. In the paper, Hebert described the use of short, highly variable regions of DNA, which he called barcodes, as a means to identify species. Since its publication, the protocol for DNA barcoding has been robustly developed by a team of scientists at Cold Spring Harbor Laboratory (CSH).
DNA barcoding is a relatively straightforward process, and it makes a great activity because it teaches many widely used molecular biology techniques. The detailed protocol from CSH can be found here.
The first step in DNA barcoding is to find something to barcode. There are many ways to go about this, but whichever way you choose, make sure you stay consistent throughout the entire experiment. A good way to collect specimens is to follow the practices used by ecologists.
After you have collected your specimens, you will need to document them, including information on when and where they were found. It’s also a good idea to snap a picture. Lastly, you should make an attempt at identifying your specimen with classical methods, such as the field guide mentioned earlier. While our goal is identification with DNA, you can often make a pretty good qualitative identification and use the DNA barcode as confirmation.
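The documentation step above can be captured in a simple structured record. This is a minimal sketch, not part of the CSH protocol: the `SpecimenRecord` class, its field names, and the sample values are all hypothetical and only illustrate the kind of information worth recording.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SpecimenRecord:
    """Hypothetical record of one collected specimen (field names are assumptions)."""
    specimen_id: str
    collected_on: date            # when the specimen was found
    latitude: float               # where it was found
    longitude: float
    photo_path: Optional[str] = None      # picture taken in the field, if any
    field_guide_id: Optional[str] = None  # tentative identification from a field guide
    notes: str = ""

    def is_complete(self) -> bool:
        # A record is considered complete once it has a photo and a tentative ID.
        return self.photo_path is not None and self.field_guide_id is not None


# Made-up example values, for illustration only.
record = SpecimenRecord(
    specimen_id="S001",
    collected_on=date(2020, 6, 15),
    latitude=40.86,
    longitude=-73.47,
    photo_path="photos/S001.jpg",
    field_guide_id="skipper butterfly (tentative)",
)
print(record.is_complete())
```

Keeping the tentative field-guide identification alongside the collection data makes it easy to compare against the DNA result later.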
We want to analyze DNA, so we need to get it out of the cells of our specimen. There are several methods for DNA extraction, but the basic process involves breaking open the cells and then separating the nucleic acid from everything else.
Once the DNA is extracted, the barcode region needs to be isolated and amplified. To do this, we use a technique called PCR, or polymerase chain reaction. PCR is an important molecular biology technique used to create lots of copies of specific regions of DNA; think of it as a molecular photocopier. PCR uses primer sequences, short pieces of DNA that match the genomic regions ahead of and behind our barcode, to isolate just the region we’re interested in. After PCR, we should be left with lots of copies of our barcode that can then be sequenced.
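To make the role of the primers concrete, here is a toy "in-silico PCR" in Python. It is only a sketch under simplifying assumptions: primers must match exactly, the sequences are made up, and real PCR involves temperature cycling and imperfect efficiency. The function names (`in_silico_pcr`, `copies_after`) are hypothetical.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the region bounded by the two primers, or None if either is missing.

    Assumes exact primer matches. The reverse primer binds the opposite strand,
    so on the template strand we search for its reverse complement.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Idealized copy number: the target doubles every cycle."""
    return starting_copies * 2 ** cycles


# Made-up template and primers, for illustration only.
template = "GGGGATGCATTTTCCCCGATTCAAAA"
amplicon = in_silico_pcr(template, forward_primer="ATGCA", reverse_primer="GAATC")
print(amplicon)
print(copies_after(30))
```

The doubling in `copies_after` is the idealized case; it shows why a few dozen cycles are enough to turn a handful of template molecules into a very large number of copies.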
But before we send the samples off to a lab for sequencing, it’s a good idea to double-check that everything went according to plan. That’s where gel electrophoresis comes in. Gel electrophoresis, often referred to as running a gel, is a technique used to separate molecules by molecular weight. For DNA, this means that we can separate pieces of DNA based on how long they are (measured in base pairs, or bp). If PCR worked, the gel should show a band at the length expected for our barcode.
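One common way to read a gel is to compare a band against a ladder of fragments with known lengths, since the logarithm of fragment length is roughly linear in the distance migrated. The sketch below interpolates on that log scale; the ladder sizes and distances are invented for illustration, and `estimate_size_bp` is a hypothetical helper rather than part of any protocol.

```python
import math
from typing import List, Tuple


def estimate_size_bp(distance_mm: float, ladder: List[Tuple[int, float]]) -> float:
    """Estimate fragment length from migration distance.

    `ladder` is a list of (size_bp, distance_mm) pairs with distinct distances.
    Interpolates log10(size) linearly between the two neighbouring ladder bands.
    """
    bands = sorted(ladder, key=lambda band: band[1])  # nearest to the well first
    for (size_a, dist_a), (size_b, dist_b) in zip(bands, bands[1:]):
        if dist_a <= distance_mm <= dist_b:
            fraction = (distance_mm - dist_a) / (dist_b - dist_a)
            log_size = math.log10(size_a) + fraction * (math.log10(size_b) - math.log10(size_a))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")


# Invented ladder: (fragment length in bp, distance migrated in mm).
ladder = [(1000, 20.0), (500, 30.0), (100, 50.0)]
estimated = estimate_size_bp(25.0, ladder)
print(round(estimated))
```

Shorter fragments travel farther, which is why the ladder sizes decrease as the distances increase.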
We’ve successfully extracted DNA and confirmed that our barcode was amplified. Now it’s time to read the barcode. To do that, we will use a DNA sequencing technique called Sanger sequencing. This method, developed in 1977 by Frederick Sanger, is still widely used in research today. Sequencing will result in a computer file containing the As, Ts, Gs, and Cs that make up the barcode. We can then use bioinformatics tools to match these sequences with known DNA sequences.
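The matching step can be illustrated with a deliberately simple comparison. Real bioinformatics tools align sequences and search large databases; this sketch only counts matching positions between sequences that are assumed to be already aligned. The reference names and sequences are made up, and `percent_identity` and `best_match` are hypothetical helpers.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching bases over the shared length (no alignment, no gaps)."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / length


def best_match(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the (name, percent identity) of the closest reference sequence."""
    scores = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scores, key=lambda pair: pair[1])


# Made-up reference barcodes, for illustration only.
references = {
    "species_A": "ATGCATGCAT",
    "species_B": "ATGCTTGCAA",
}
name, identity = best_match("ATGCATGCAT", references)
print(name, identity)
```

A high identity to a single reference supports that identification, while similar scores against several references suggest the barcode alone cannot separate them.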
The International Barcode of Life (iBOL) is a research alliance aimed at making the study of biodiversity easier and more accessible. DNA barcode sequences determined in this symposium will be shared with the iBOL database, contributing to the ongoing mission of creating a catalog of barcodes for every living organism on Earth.