Post date: May 01, 2019 9:23:15 PM
Scripts and results for the whole genome phylogenetic analyses are in /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/phylo_2019/. The goals of these analyses are to generate a consensus species tree, and then to ask whether autosomes or the Z chromosome support this true more (i.e., if more chunks of the genome support the tree more). Trees were inferred with RAxML (version 8.2.9). Note that most analyses use the standard GTR model with no rate heterogeneity (because the program suggested that is what I should do given the values of alpha inferred). There is a model that corrects for ascertainment bias, but that only works if sites are not heterozygous (so can't use it).
1. Generate alignment file for just autosomes (drops the Z). This is what I want to use for the "species tree"
.
raxmlHPC-SSE3 -E subAutos -s LycWGSnpAlignment -m GTRGAMMA -n LycWGSnpAlignAuto
This leaves 2,687,989 SNPs.
Next I inferred the ML tree, and ran 100 bootstrap replicates with the rapid BS algorithm and ML estimates of base frequencies.
raxmlHPC-SSE3 -f a -x 12345 -p 12345 -# 100 -m GTRGAMMAX -s LycWGSnpAlignment.subAutos -n raxmlAuto
Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 2.076444
Tree-Length: 0.163542
rate A <-> C: 0.995901
rate A <-> G: 3.536782
rate A <-> T: 2.213155
rate C <-> G: 0.611955
rate C <-> T: 3.538003
rate G <-> T: 1.000000
freq pi(A): 0.242545
freq pi(C): 0.257027
freq pi(G): 0.257549
freq pi(T): 0.242879
Found 1 tree in File /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bestTree.raxmlAuto
Found 1 tree in File /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bestTree.raxmlAuto
Program execution info written to /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_info.raxmlAuto
All 100 bootstrapped trees written to: /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bootstrap.raxmlAuto
Best-scoring ML tree written to: /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bestTree.raxmlAuto
Best-scoring ML tree with support values written to: /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bipartitions.raxmlAuto
Best-scoring ML tree with support values as branch labels written to: /uufs/chpc.utah.edu/common/home/gompert-group1/projects/lyc_hybanc/phylo_2019/RAxML_bipartitionsBranchLabels.raxmlAuto
2. Then, I split the alignment by LG (including Z = 23).
perl mkLgSubsets.pl
perl moveFiles.pl LycWGSnpAlignment.subLg*
3. I then fit ML tress for non-overlapping windows of 1000 SNPs for each LG.
sbatch SubWrapWindows.sh
#!/bin/sh
##SBATCH --time=72:00:00
##SBATCH --nodes=1
##SBATCH --ntasks=20
##SBATCH --account=gompert-kp
##SBATCH --partition=gompert-kp
##SBATCH --job-name=raxml
##SBATCH --mail-type=FAIL
##SBATCH --mail-user=zach.gompert@usu.edu
cd /uufs/chpc.utah.edu/common/home/u6000989/projects/lyc_hybanc/phylo_2019
for i in {1..23}
do
echo "working on LG $i\n"
cd LG$i
perl ../mkRunWindows.pl LycWGSnpAlignment.subLg$i > log
cd ..
done
4. Note, my initial run described above had an issue with not subsetting the Z properly. During the course of fixing this, I returned to the filtering step and dropped out SNPs that were near each other. My final results using this data set are described in the next post.