4 - The End is Near?

Since our last update we have been able to finish all of our lab work and have been working on the copious amounts of data analysis. We had some lab assistants for a day (although they had never done this before), finally sent our samples for sequencing, got our data back and began to analyze it, and have now been preparing to present our project at science fair. It's been a long couple of weeks since our last update, but the end of our project is near... maybe...

This Time We had Lab Assistants

On December 9, we had the opportunity to invite family and friends to join us in lab. We performed gel electrophoresis, which allowed our guests to see actual DNA. We all enjoyed being able to share what we have been working on for the past year, and sharing the experience of running lab with our guests. None of them knew anything about how this project worked, so it was fun to teach them how to do a part of our project.

First, we had to teach them the basics of PPE, or personal protective equipment. This was pretty simple and just involved gloves and goggles. Next, we taught them how to make their own gels for gel electrophoresis, which they ran later. Finally, they learned how to use a micropipette to add the DNA samples to the gels and then run them. Everyone was blown away by how cool the glowing bands of DNA looked as they moved through the gel.

The Sketchy PCR Clean Procedure

Before sending our samples off for sequencing, our team needed to wash our samples. However, this procedure required 100% ethanol, which, unbeknownst to us, was difficult to come by. So, our team proceeded and was about to use a different concentration when, lo and behold, in the back of the flammable cabinet was 100% ethanol. We scooped it up and found that we were allowed to use it. This saved our team a considerable amount of time and resources, as we didn't have to reconfigure our concentrations.

This was a very sketchy procedure for all of us, though, because it required us to use all of the samples. There would be no leftovers and no do-overs. A mistake here would mean that the sample could not be sent for sequencing.

Thankfully, the procedures went smoothly and the samples were almost ready to be sent for sequencing.

The Last Day of Lab Work

Next was the fun part: preparing the sequencing plate with the exact amount of DNA for each sample. This was measured all the way down to 0.5 µL, the smallest amount we ever used. That's equivalent to five ten-millionths of a liter! Even better, Andrew had to leave, so Camden was all on his own to load every single well of the 96-well plate.

Every sample was carefully added to the plate. After working on preparing these samples for such a long time, any sort of contamination would be horrible.

After working on this project for almost a year, we were finally ready to finish all of the lab work. Camden and Andrew visited Dr. Wilson's lab with all of our remaining samples. Many of them had failed before this point, either in the extraction step or in the step to make more copies of the gene. We carried a small Styrofoam box into the botanic gardens holding all of our hard work in tiny 1.5 mL microcentrifuge tubes.

Once we arrived, we had to measure out exactly how much DNA was in each sample using a super accurate tool called a NanoDrop. This tool allowed us to know exactly how much of our already tiny samples would actually be sent for sequencing.

Finally, after spending 5 hours at the lab, the team loaded the very last well of this experiment. The lab work was finally done and we all thought now we could have some time to rest. Little did we know this was the beginning of the hardest work of the entire project.

Now That's a Lot of Data

A week after Dr. Wilson sent our samples to get sequenced, we finally got our DNA sequences and we were super excited...

Until we realized we still didn't know how to read them.

What are all of these weird formats? AB1, FASTA, what? The computers didn't even know how to read the sequences. Fortunately, after about 5 hours of research, Camden was finally able to find a program that could understand these weird formats.

After we figured out how to read the sequences, we had to figure out how to line up the forward and reverse sequences to make a consensus sequence. Now that's a mouthful. Basically, DNA has 2 sides that line up with one another, and reading them gives the forward and reverse sequences. What we had to do was line them up to see if the sequences actually matched. Thankfully, Dr. Wilson recommended a program that could do all of this work for us, as long as we took the time to upload the sequences. Now all we had to do was line up 48 forward sequences with their reverse sequences, no biggie...
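For anyone curious what "making a consensus sequence" means in practice, here is a rough Python sketch of the idea. This is not the program Dr. Wilson recommended. The function names and the short example reads are made up for illustration, and the sketch assumes the two reads already cover exactly the same stretch of DNA with no gaps (real tools also do the lining up and use quality scores).

```python
# Hypothetical illustration only: the names and sequences are invented.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Flip a read from the opposite strand so it reads in the forward direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def consensus(forward, reverse):
    """Combine a forward read with a reverse read of the same region.

    Assumes both reads are the same length and already lined up.
    'N' means the sequencer could not call that base. Where one read
    has 'N', the other read's base is used; where the two reads
    disagree outright, the position is marked 'N'.
    """
    flipped = reverse_complement(reverse)
    if len(flipped) != len(forward):
        raise ValueError("reads must be the same length in this simple sketch")
    result = []
    for f, r in zip(forward.upper(), flipped):
        if f == r:
            result.append(f)
        elif f == "N":
            result.append(r)
        elif r == "N":
            result.append(f)
        else:
            result.append("N")
    return "".join(result)


forward_read = "ATGCNTAC"  # made-up forward read with one unreadable base
reverse_read = "GTAAGCAT"  # made-up reverse read of the same region
print(consensus(forward_read, reverse_read))  # ATGCTTAC
```

In this toy example the reverse read fills in the one base the forward read could not call, which is the whole point of sequencing both directions.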

Sometimes, Data Analysis Kind of Sucks

Data analysis is probably the most important part of the project, and it's also the most time consuming. In the next 2 weeks of data analysis done by yours truly, Camden, you will see just how much of a struggle it can be to get the answers you need. Even though we were trying to answer questions about the different species of Russula that had been sequenced, Camden began to try and answer a new, completely unrelated question just for himself...

How much caffeine can a person drink before it's lethal?

The First Problem

After making the first couple of sequences line up, Camden found that many of the sequences were very low quality and wouldn't line up, and we had no idea why. Shortly after we found this problem, Dr. Wilson emailed us with a possible explanation: about half of the forward sequences didn't work, probably because of an error in sequencing. That's always a fun time, but at least we knew what was wrong. We continued to combine the sequences where we could and used only the reverse sequence where the forward one was too low quality. At least we would have some data to analyze.

Uploading the Data Takes FOREVER

Once all of the sequences were ready, it was time to upload them all to BOLD Systems in order to understand what species they actually are. BOLD is a great resource, but it takes so long to use. Every single sample had to have 2 files created for it, so in total 96 files needed to be created. One required all of the information available on the species, which would take about 5-7 minutes to create. Next, a file with the sequence had to be created, which took 3-5 minutes to make. Sifting through all of this data took forever. Well, not really. This part took only about 3-4 hours, which we thought would be the worst of it. Boy, were we wrong.

Creating the Phylogenetic Tree

Wait, what's a phylogenetic tree? This is a really amazing tool that allows people to compare how similar samples are and where species are predicted to have separated into different species. As the name implies, it looks kind of like a tree with all of the different species at the ends of the "branches". To the right is an image of the phylogenetic tree that we have produced using all of our sequences.

Thankfully for us, BOLD has a tool to produce a tree based on how similar the DNA of the samples is. Unfortunately, this just led to more questions about why everything was placed the way it was. We even had 2 samples of the same species separated by a pretty great distance. What? Some parts of it just made no sense at all.

On top of that, there were 2 sequences that had nothing similar to them in the entire database. Could they be new species? We had no idea.

Now that we had new questions, we needed to find answers. By this point, Camden had already spent nearly 20 hours staring at all of this data, and little did we know he was just starting. In order to answer these questions he would be spending a lot more time staring at the computer.

The Next Step

In order to figure out what those 2 unknown samples are, we first decided to BLAST the samples. This basically means that the computer compares them to every other DNA barcode to see which ones they match most closely. Unfortunately, this didn't turn up any good results. Both of them were only about 90% similar to the next closest sample, and that is a pretty large distance when talking about samples within the same genus. 10% different might not seem like a lot, but many of the other samples that we compared had a difference of less than 5%.
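To give a feel for what "90% similar" means, here is a small Python sketch that counts matching positions between two sequences. This is only an illustration, not how BLAST actually works (BLAST also handles the lining up, gaps, and scoring). The function name and the two short sequences are made up, and the sketch assumes the sequences are already lined up and the same length.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions where two already-aligned sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


# Made-up 10-base sequences that differ at exactly one position.
unknown_sample = "ATGCATGCAT"
closest_match = "ATGCATGCAA"
print(percent_identity(unknown_sample, closest_match))  # 90.0
```

One mismatch in 10 bases gives 90%. Spread over a barcode hundreds of bases long, 90% means dozens of differences.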

Now that we knew that the samples didn't really turn up any good results, we decided to see if they could possibly be a new species.

Now it was time for some stats.

A Threshold for Speciation

In order to figure out if these weird samples are most likely a new species, we would need to determine a threshold for speciation. This sounds pretty complicated, and it is, but in basic terms it means that if 2 samples are a certain percent different from each other, then they are likely to be different species. In order to find this number, Camden needed to compare how similar a large number of samples were to each other and then find the average similarity between all of the samples. This would mean staring at a computer for a very long time.
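As a rough picture of the idea, here is a minimal Python sketch. It assumes the threshold is simply the average percent difference between pairs of known species, which is a simplification made for this example. The function names and all of the numbers are made up and are not our real data.

```python
from statistics import mean


def speciation_threshold(percent_differences):
    """Average percent difference across pairs of known, distinct species."""
    return mean(percent_differences)


def may_be_distinct_species(difference_to_closest_match, threshold):
    """True if a sample differs from its closest match by at least the threshold."""
    return difference_to_closest_match >= threshold


# Made-up percent differences between pairs of known species.
known_pair_differences = [4.0, 6.0, 5.0, 3.0, 7.0]
threshold = speciation_threshold(known_pair_differences)
print(threshold)  # 5.0

# A sample that is 10% different from its closest match clears this threshold.
print(may_be_distinct_species(10.0, threshold))  # True
```

Clearing the threshold only suggests a sample might be a different species. It does not prove it.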

In order to get these numbers, 49 different species were compared, making around 2,500 comparisons. THAT'S A LOT OF NUMBERS!
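A quick bit of arithmetic shows how fast the numbers pile up. The sketch below assumes the comparisons are laid out as a full grid with every sample along both the rows and the columns. That layout is an assumption made for this example, and the function names are made up.

```python
from itertools import combinations


def grid_cells(n):
    """Cells in a full n-by-n comparison grid (each pair appears twice, plus the diagonal)."""
    return n * n


def unique_pairs(n):
    """Number of distinct pairs when each pair is counted only once."""
    return len(list(combinations(range(n), 2)))


n_samples = 49
print(grid_cells(n_samples))    # 2401
print(unique_pairs(n_samples))  # 1176
```

A full 49 by 49 grid has 2,401 cells, which is in the neighborhood of 2,500. Counting each pair only once would still leave 1,176 comparisons.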

Entering all of these values and making the comparisons between them was beyond exhausting.

This is what the data looks like to find a threshold for speciation. This is only a fraction of the data.

Here is a visual representation of some of the sequences being compared. This only shows the first 17, but the list is actually 49 samples long!

Camden's support: coffee and a Monster energy drink while poring over the data from our lab notebook and on BOLD Systems.

A Future Research Question?

It's a bit beyond the scope of our research, but how much caffeine can a person consume before it's lethal?

Red Bulls, Rockstars, Monsters, Coffee, and Now Glasses

Based on how long it was taking Camden to enter the data, he realized he might need some support from some energy drinks, and eventually glasses. Pounding on the computer entering every single one of the 2,500 different values was exhausting, so Camden turned to some energy drinks to keep going. It's pretty hard to fall asleep staring at a screen for hours on end when you down a couple of energy drinks. With this many energy drinks, sleep has just become an abstract concept.

Staring at the computer screen for such long periods of time also strains your eyes, so Camden also bought his first pair of glasses so he could keep working efficiently.

Oh, the joys of 50 or so hours of data analysis in a week and a half.

Fortunately, after spending so much time analyzing the data, the end of the analysis is almost in sight. Hopefully in the next week or so we will be able to determine if there is a good chance that the 2 unknown samples are a new species. I guess we'll find out soon enough.

Preparing For Science Fair

In order to prepare for science fair, Jason and Andrew have been working diligently on our poster, perfecting the way we present all of our research. Alongside Camden, Jason and Andrew have also spent dozens of hours working with a particularly unfriendly word editing program. Additionally, the team has been hard at work preparing all of our forms. In order to participate in science fair, there were several forms that we had to fill out, and then we prepped our binder. Thanks to all this work, our team is almost ready for science fair!