The idea behind DNA barcoding is simple: use the genome of an organism to identify what it is, even if you only have a small part of it, or if it is itself very small or otherwise hard to identify using traditional visual methods. This idea relies on variable parts of the genome that can easily be sequenced from almost any organism, though there is no single genomic region that works well for everything. In animals, the mitochondrial cytochrome oxidase I (COI) gene evolves rapidly (at least, at some sites) and, because it is mitochondrial, experiences very strong genetic drift that tends to drive the diversity in distinct populations or species to be measurably different from one another.
The early goals of our training workshops will be to familiarize students with the massive GenBank database, which stores all DNA sequence data published over the past 30+ years with information about the species, location, and other features of the organism those data come from. We will use the friendly Geneious software to aid their exploration, and work towards understanding the sequence divergence that can be observed among individuals of the same population or species relative to the divergence from sequences of distinct populations or species. The difference between these two levels of divergence is called the "barcode gap" and provides visual evidence that (usually) this approach can help us identify an unknown organism down to the species level.
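As a rough illustration of the barcode gap, here is a minimal Python sketch. The distance values are made up for illustration (they are not real mussel data), and the function name barcode_gap is our own, not part of Geneious or GenBank.

```python
def barcode_gap(within, between):
    """Smallest between-species distance minus largest within-species distance.

    A positive value means the two sets of distances do not overlap.
    """
    return min(between) - max(within)


# Hypothetical pairwise distances (proportion of sites that differ).
within_species = [0.000, 0.004, 0.009, 0.012]
between_species = [0.085, 0.110, 0.142]

gap = barcode_gap(within_species, between_species)
print(f"barcode gap: {gap:.3f}")
if gap > 0:
    print("within- and between-species distances do not overlap")
else:
    print("distances overlap; this gene may not separate these species")
```

When the gap is zero or negative, the two sets of distances overlap and a match cannot be trusted to the species level for that group.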
Below are some basic methods and learning goals that we will keep track of during the summer program.
DNA PRESERVATION. At the end of the monitoring steps, as glochidia and juveniles drop off of their fish host, each tiny mussel is placed into a tube containing 75µl of solution that is water with 10% w/v Chelex-100 and 5% proteinase K. The tubes are numbered or coded and then frozen until we are ready to process them. Each tube with a mussel juvenile is then incubated overnight at 55°C and then frozen again. This method comes from Casquet et al. 2012.
POLYMERASE CHAIN REACTION. We then use PCR to target the amplification of the mitochondrial COI gene. Each student will be given 6 unknown DNA samples, and will set up 8 PCR reactions: a "positive control" meaning it uses DNA that we know should be successful for PCR, a "negative control" that should have no DNA in it at all, and then the 6 unknowns. Because the chemicals and enzymes used in these reactions are used in very small amounts, you will first mix up the right recipe for PCR assuming a total of 10 reactions, which keeps the math straight and allows for minor pipetting errors. Before mixing up your PCR recipe, shake or vortex each thawed solution so it is well-mixed.
PCR RECIPE. We are using a Promega GoTaq "master mix" that contains the polymerase, nucleotides, and necessary buffer in a green solution that helps when loading on agarose gels. For a 20µl reaction, we add: 10µl Promega mix, 0.5µl of each primer, 1µl of BSA (bovine serum albumin, helps with PCR inhibition), and 7µl of water for a total of 19µl per reaction. Since we are making enough for 10 reactions, you will mix 100µl of mix, 5µl of each primer, 10µl of BSA, and 70µl of water (now vortex your recipe mix!) and pipet 19µl into each of your reaction tubes. Then, CLOSE the last tube - that is your negative control - and then add your positive control DNA (1µl) to the first tube, and 1µl of each of your unknowns to the remaining tubes being careful about their order - always take notes in your lab notebook.
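The scaling arithmetic above can be checked with a short Python sketch. The per-reaction volumes come from the recipe; the labels "primer 1" and "primer 2" and the function name scale_recipe are placeholders of our own.

```python
# Per-reaction volumes in µl, taken from the recipe above.
PER_REACTION_UL = {
    "GoTaq master mix": 10.0,
    "primer 1": 0.5,   # placeholder label
    "primer 2": 0.5,   # placeholder label
    "BSA": 1.0,
    "water": 7.0,
}


def scale_recipe(per_reaction, n_reactions):
    """Multiply every component volume by the number of reactions."""
    return {name: vol * n_reactions for name, vol in per_reaction.items()}


# 8 tubes are needed; mixing for 10 leaves room for pipetting error.
batch = scale_recipe(PER_REACTION_UL, 10)
for name, vol in batch.items():
    print(f"{name:>18}: {vol:6.1f} µl")
print(f"{'total':>18}: {sum(batch.values()):6.1f} µl")
print(f"{'per tube':>18}: {sum(PER_REACTION_UL.values()):6.1f} µl (+ 1 µl DNA)")
```

Changing the second argument of scale_recipe gives the volumes for a different number of reactions.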
SETTING UP PCR. Once all tubes are closed, we will spin your reactions down in their tubes and place them in the thermal cycler. Our program is called "Flat 40": 95°C for 3 minutes, then 35 cycles of (95°C 30 sec, 40°C 30 sec, 72°C 60 sec), then 72°C for 5 minutes, then a 12°C hold indefinitely. When the reactions are done they can be refrigerated until we run them on a gel.
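To get a sense of how long the program takes, here is a small Python sketch that adds up the programmed hold times. It ignores the time the cycler spends ramping between temperatures, which depends on the machine, so the real run will be somewhat longer.

```python
# Hold times in seconds, from the program described above.
INITIAL_DENATURE_S = 3 * 60
CYCLES = 35
CYCLE_STEPS_S = [30, 30, 60]   # denature, anneal, extend
FINAL_EXTENSION_S = 5 * 60


def programmed_seconds():
    """Total programmed hold time, not counting temperature ramping."""
    return INITIAL_DENATURE_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


total = programmed_seconds()
print(f"{total // 60} min {total % 60} s of programmed hold time")
```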
RUNNING ELECTROPHORESIS GELS. We will make a mixture of 100mL TBE buffer with 1g of agarose (a 1% w/v gel), or a similar ratio for different sized gels. This mixture will be heated in a microwave until the agarose dissolves and allowed to cool slightly, and then we will mix in 10µl of GelRed, a relatively safe DNA intercalating dye that allows us to visualize DNA under UV light. Gels should run at ~100V in 0.5x TBE buffer with 3µl of size standard in the first well and 5µl from each of your PCR reactions in the following wells; the gel will need to run for about 25 minutes.
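For a different sized gel, the "similar ratio" can be worked out with a short Python sketch. It assumes that both the agarose and the GelRed scale in proportion to the buffer volume; the function name gel_recipe is our own.

```python
def gel_recipe(buffer_ml, percent_wv=1.0, gelred_ul_per_100ml=10.0):
    """Return (grams of agarose, µl of GelRed) for a given buffer volume.

    Assumes both ingredients scale linearly with the buffer volume.
    """
    agarose_g = buffer_ml * percent_wv / 100.0
    gelred_ul = buffer_ml * gelred_ul_per_100ml / 100.0
    return agarose_g, gelred_ul


for volume_ml in (100, 50):
    grams, dye_ul = gel_recipe(volume_ml)
    print(f"{volume_ml} mL TBE: {grams} g agarose, {dye_ul} µl GelRed")
```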
CLEANING SUCCESSFUL REACTIONS. To digest the remaining primers in your PCR reactions that worked, we will make a mixture of water (1.8µl), Antarctic phosphatase (1.0µl), and Exonuclease I (0.2µl), for 3µl per reaction. As before, we will make a combined mixture that is sufficient for all of our reactions, vortex it, and then add 3µl of this mix into each new tube with 8µl of the successful PCR reaction. As before, we have to be very careful that we know what reaction goes into each tube - take good notes! These tubes will be placed back on the thermal cycler at 37°C for 15 minutes, 80°C for 15 minutes, then a 10°C hold.
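The combined cleanup mixture can be sized the same way as the PCR recipe. In this Python sketch the per-reaction volumes come from the text above, while the one extra reaction of overage (the extra argument) is our own assumption to allow for pipetting error.

```python
# Per-reaction volumes in µl, from the text above.
CLEANUP_PER_REACTION_UL = {
    "water": 1.8,
    "Antarctic phosphatase": 1.0,
    "Exonuclease I": 0.2,
}
PCR_PRODUCT_UL = 8.0


def cleanup_batch(n_reactions, extra=1):
    """Volumes for n_reactions plus some overage (extra is an assumption)."""
    n = n_reactions + extra
    return {name: round(vol * n, 2) for name, vol in CLEANUP_PER_REACTION_UL.items()}


batch = cleanup_batch(6)
for name, vol in batch.items():
    print(f"{name:>22}: {vol:5.1f} µl")

mix_per_tube = round(sum(CLEANUP_PER_REACTION_UL.values()), 2)
print(f"each tube: {mix_per_tube} µl mix + {PCR_PRODUCT_UL} µl PCR product")
```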
SEQUENCING DNA. Now we are just about ready! Make a dilution of the HCO primer in a new tube, adding one part primer to 2 parts water; we need a total of 5µl of diluted primer for each sequencing reaction. On the outside of each 1.5mL centrifuge tube, we wrap a Psomagen barcode label all the way around. The bottom half of the barcode label repeats the ID number and can go into your lab notes to record which mussel is being sequenced. Then add 5µl of the diluted primer mix to each tube, and 5µl of the mix that came from the "cleaning" reaction. Close the tube tightly and add it to our shipment. These will then be FedEx'd to Psomagen and we will have data in about 48 hours.
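The primer dilution can be worked out with a short Python sketch. It uses the 1:2 primer-to-water ratio and the 5µl per reaction given above; the function name primer_dilution and the example of 6 reactions are our own, and no overage is included.

```python
def primer_dilution(n_reactions, ul_per_reaction=5.0, primer_parts=1, water_parts=2):
    """Return (µl of primer, µl of water) for the diluted sequencing primer."""
    total_ul = n_reactions * ul_per_reaction
    parts = primer_parts + water_parts
    primer_ul = total_ul * primer_parts / parts
    water_ul = total_ul * water_parts / parts
    return primer_ul, water_ul


primer_ul, water_ul = primer_dilution(6)
print(f"primer: {primer_ul:.1f} µl, water: {water_ul:.1f} µl")
```

In practice you may want to make slightly more than the exact amount, as we do for the PCR recipe.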
One of our first activities will be to learn more about how DNA sequence data are archived after every published genomic study across biodiversity. We will start in week one making sure that each student and mentor has access to a copy of the software Geneious, and spend time using it to gain comfort in using many of the bioinformatic tools it provides. Dr. Wares will introduce students to the idea of DNA barcoding - that all life has distinct genomic signatures that are formed by mutation, genetic drift, and population isolation.
In week two, we will all use Geneious to find DNA sequences from a target organism that we decide on, as well as 1-2 related congeneric species, and learn about sequence alignment and calculating genomic distances.
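One simple genomic distance is the proportion of aligned sites that differ between two sequences, often called the p-distance. The Python sketch below is only an illustration: the sequences are made-up toy examples, sites with gaps or ambiguous bases are skipped, and Geneious may calculate distances differently.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned sequences for illustration only.
seq1 = "ACGTACGTAC"
seq2 = "ACGTTCGTAC"
seq3 = "AC-TACGTAN"

print(p_distance(seq1, seq2))  # one difference out of ten sites
print(p_distance(seq1, seq3))  # gap and N sites are skipped
```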
In week three, we will learn how to plot the genomic distances we observe using Google Sheets. As a group, we will decide what we consider, for the gene we are studying (mitochondrial COI) in freshwater mussels, to be a "good" identifying match to the species level, and what is only good enough to be certain to the level of genus.
In week four, our exercise will use DNA-based distances to estimate the phylogeny, or gene tree, that shows how individual sequences are related to each other -- this will help us to see that not all species (e.g. mussels in the genus Elliptio) have sufficient data to distinguish them. We will discuss what additional information would be needed.
In week five, we will review the diversity of unionid mussels overall from an evolutionary point of view. What makes them so different, so diverse?
By week seven, we expect to get our own data back, and we will learn how to edit these data and align them with available data from GenBank, using the BLAST tool and phylogeny estimation.
This should set us up for the end of the program to write up individual learning outcomes, being sure you understand the measures we use to describe genomic distances, and we will combine these results into a group outcomes poster or report. We are looking forward to learning more new information about mussels in 2022!