This is where I post data, analytical resources, and useful links for the class.
Your very-much-still-being-developed textbook and a recent (Nov 16) PDF of that draft. I am hoping we can all move to using the Shiny/web document quickly. As the semester rolls out I will be updating the file chapter by chapter and will provide links to those sections.
Link to downloading and using Geneious at UGA
Note that if you are working off-campus (or on a laptop anywhere, probably), you will also need the UGA-approved VPN application, which gives you secure authentication with UGA computers using your MyID.
This may also require that you install the Duo app for 2-factor authentication. Sorry, it's the reality for doing science at UGA!
Installing R on your computer will depend on your operating system, but start here.
After installing the latest version of R, install the latest version of RStudio.
At this point, go to the GitHub link I posted above and download the textbook by clicking the 'Code' button and choosing 'Download ZIP'. Unzip that directory on your computer, and you can then open the .Rmd file in RStudio.
Links to required reading will be put on the Calendar.
Zoom For Molecular Ecology
Any day that the Calendar calls for it, or on days you are unable for health or other reasons to make it to class, you can attend by Zoom using this link:
https://zoom.us/j/96275017251?pwd=b1Q4MVQvRkJZU0NIWnJqWmExaUJqZz09
Meeting ID: 962 7501 7251
Passcode: 854260
I expect you to follow these UGA guidelines for what happens if you test positive for COVID, or if you have a recognized exposure.
Barcode Gap Exercise
In Geneious, find your target group: at least two related species (preferably in the same genus) with sequence data that can be aligned. For animals, cytochrome oxidase I or cytochrome b are good bets for finding lots of data; for fungi, internal transcribed spacer (ITS); for plants, ITS or rbcL; for microbes, 16S ribosomal sequence. There are others, but these are good starting points.
Align all the data, looking for the characteristics of believable homology: more sites shared than not, few gaps (insertions/deletions), etc.
From the alignment, you will have % similarity scores in Geneious. Convert each one by taking the % similarity S and calculating a distance, d = (100 - S)/100. This way a % similarity of 99.7 turns into a distance of 0.003, or 0.3% divergent.
Having those distance values, you can then type them into the 'within' and 'between' arrays in the R code (in RStudio) and run each line of code to get a histogram. DO NOT WORRY if this seems overwhelming - you don't have to use R, but don't be afraid of trying and asking for help. I will accept this assignment however you perform it.
After generating the 'barcode gap' plot for your data, write a short paragraph or two about the challenges you encountered, and how well these data could be used to uniquely identify an organism to the species level, as in the pollinator metabarcoding paper we read for Thursday.
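If you have more than a handful of similarity scores, the similarity-to-distance conversion described above can be applied to all of them at once in R. This is only a minimal sketch: the vector name `similarity` and the example values in it are made up for illustration, so replace them with the numbers Geneious gives you.

```r
# Hypothetical % similarity values copied from Geneious (replace with your own)
similarity <- c(99.7, 99.2, 98.5, 100)

# Convert every value at once: d = (100 - S)/100
distance <- (100 - similarity) / 100

# Show the distances, rounded to 4 decimal places for readability
print(round(distance, 4))
```

The resulting values can then be typed (or pasted) into the 'within' or 'between' arrays, depending on which comparison they came from.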
Here is the .R file for the histogram (for the "barcode gap" exercise), but I'm not sure sharing it this way will be the easiest...
You can also just paste in this code:
# barcode gap
# 1. have distances for sequence data WITHIN a species (multiple individuals)
# Dr. Wares will add a conversion factor from Geneious to the numbers you want!
# OK so for the numbers Geneious gives you for SIMILARITY, we want DISSIMILARITY
# so put that number (for example, 99.70) into the equation below
thevalueIwant<-(100-99.70)/100
# (100 - x)/100 is the conversion factor, in other words, from SIMILARITY to DISTANCE
# and you get an answer that is basically 0.003, or 0.3% divergent. You can do
# this conversion in Excel or in R or however easiest for you.
# 2. as well as distances BETWEEN species (multiple individuals preferable)
# put all of the distances among individuals WITHIN species into this array
# type them between the parentheses with commas between eg 0.002, 0.003, 0.1...
within<-c(0.01,0.015,0.01,0.005,0.007,0.008,0.012, 0.02,0.018,0.0,0.001,0.01,0.015,0.01,0.005,0.007,0.008,0.012, 0.02,0.023,0.0,0.001)
# now do the same for distances between individuals from DIFFERENT species
between<-c(0.04,0.05,0.051,0.04,0.07,0.038,0.045,0.05,0.051,0.04,0.07, 0.05,0.051,0.04,0.07)
p1 <- hist(within)
p2 <- hist(between)
# for more on hist function: https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/hist
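# overlay both histograms on the same axes; the semi-transparent colors show any overlap
plot(p1, col = rgb(0, 0, 1, 1/4), xlim = c(0, 0.15)) # first histogram: within (blue)
plot(p2, col = rgb(1, 0, 0, 1/4), xlim = c(0, 0.15), add = TRUE) # second: between (red)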
# does this gene region, in your group of organisms, ENSURE that your DNA sequence can identify to species level?
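One way to put a number on that last question is to compare the largest within-species distance to the smallest between-species distance. The sketch below reuses the example values from the code above; the names `max_within`, `min_between`, and `gap` are only illustrative and are not part of the original .R file.

```r
# Example distances from the exercise code (replace with your own data)
within <- c(0.01, 0.015, 0.01, 0.005, 0.007, 0.008, 0.012, 0.02, 0.018, 0.0, 0.001,
            0.01, 0.015, 0.01, 0.005, 0.007, 0.008, 0.012, 0.02, 0.023, 0.0, 0.001)
between <- c(0.04, 0.05, 0.051, 0.04, 0.07, 0.038, 0.045, 0.05, 0.051, 0.04, 0.07,
             0.05, 0.051, 0.04, 0.07)

# Largest distance seen among individuals of the SAME species
max_within <- max(within)
# Smallest distance seen between individuals of DIFFERENT species
min_between <- min(between)

# A positive value means the two distributions do not overlap in this sample
gap <- min_between - max_within

cat("max within: ", max_within, "\n")
cat("min between:", min_between, "\n")
cat("gap:        ", gap, "\n")

if (gap > 0) {
  cat("No overlap in this sample: there is a barcode gap.\n")
} else {
  cat("The distributions overlap: no clear barcode gap.\n")
}
```

A positive gap only tells you that these particular sequences separate cleanly. Sampling more individuals or more closely related species could still close it, which is worth thinking about in your write-up.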