Post date: Oct 30, 2016 7:2:25 PM
I want to create (a) a phylogeny from all of the SNP data, and (b) a phylogeny from contiguous subsets of the SNP data (SNP phylogenetics is weird, but this is what Victor did for the Science paper... and I am not really interested in having a proper phylgeny). For now, I will focus on the former. I plan to use RAxML or ExaML for this, but first some formatting is needed. Here is what I did.
1. Sorted the SNP files (the info file and the genotype file) by LG and cM position. This will be useful for other downstream analyses. Note that this involves running the same script twice and this is all within the /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/genome_variants/ directory.
perl addLgInfo.pl orderedLinkageMap.txt LycWGDat.txt
perl addLgInfo.pl orderedLinkageMap.txt LycWGSnpData.txt
This generates the sorted* files, which have 922,532 SNPs as they only include the LG SNPs.
2. Generate an alignment file for RAxML. Heterozygotes are converted into IUPAC ambiguity codes, and NA is converted to N.
perl mkAlignment.pl
This generates LycWGSnpAlignment
3. Whole genome phylogeny with RAxML (see manual here for details)
## first exclude the sex chromosome
raxmlHPC-SSE3 -E subAutos -s LycWGSnpAlignment -m GTRCAT -n LycWGSnpAlignAuto
This leaves 901,292 autosomal SNPs
Note--it would be nice to use the ascertainment bias option, but this will not work unless all sites have both homozygotes (ambiguities are modelled in such a way that this could be the same as the homozygote).
## now run the bootstrap analysis and find the ML tree, this uses the rapid BS algorithm with ML estimates of base frequencies
raxmlHPC-SSE3 -f a -x 12345 -p 12345 -# 100 -m GTRGAMMAX -s LycWGSnpAlignment.subAutos -n raxmlAuto
Model Information:
Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 2.309108
Tree-Length: 0.158221
rate A <-> C: 0.995991
rate A <-> G: 3.738252
rate A <-> T: 2.245239
rate C <-> G: 0.629069
rate C <-> T: 3.752559
rate G <-> T: 1.000000
freq pi(A): 0.241063
freq pi(C): 0.258974
freq pi(G): 0.259600
freq pi(T): 0.240362
4. Plot of tree and bootstrap support (all nodes have 100% BS support)
library(ape)
library(phangorn)
mltr<-read.tree("RAxML_bestTree.raxmlAutos")
mltr$tip.label<-c("Sierra_Nev","L_idas","L_melissa","Warner_Mt","L_anna")
plot(midpoint(mltr))
bstr$tip.label<-c("L_melissa","Warner_Mt","L_anna","Sierra_Nev","L_idas")
plot(midpoint(bstr),show.node.label=TRUE)