Post date: Nov 24, 2017 10:47:23 PM
Melanic bugs were initially excluded from the Hwy. 154 sample. I am now adding them and re-doing variant calling, etc.
First, I copied the new data from diogenes, and put it in a run2 sub-directory in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/ (the original data were moved to a run1 sub-directory):
scp -r -P 321 -i ~/.ssh/id_rsa999 zgompert@diogenes.shef.ac.uk:/stash/timema/Clarissa_meth_Freddie_clines_Marisol_phylo_Patrik_timeseries/03_split/* ./
This includes more than just Freddie's data, but I am pretty sure all of the cline samples are the ones that begin 16_p*, though even this appears to include a few re-runs. Thus, I am focusing on these for the actual analyses. Here is what I did.
1. Alignment with bwa (0.7.10-r789) using aln and sampe. Sets of 10 jobs are put on each node.
perl ../scripts/wrap_qsub_slurm_bwa.pl 16_p0*
Here is one sub-set of one full job (10 jobs):
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/run2/
bzip2 -dk reads1/16_p0315_2.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads1/aln_1_16_p0315_2.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/16_p0315_2.fq
bzip2 -dk reads2/16_p0315_2.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads2/aln_2_16_p0315_2.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads2/16_p0315_2.fq
bwa sampe -n 1 -N 1 -r '@RG\tID:tcr-16_p0315_2\tPL:ILLUMINA\tLB:tcr-16_p0315_2\tSM:tcr-16_p0315_2' -a 500 -f align/aln16_p0315_2.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/aln_1_16_p0315_2.sai reads2/aln_2_16_p0315_2.sai reads1/16_p0315_2.fq reads2/16_p0315_2.fq
bzip2 -dk reads1/16_p0327_1.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads1/aln_1_16_p0327_1.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/16_p0327_1.fq
bzip2 -dk reads2/16_p0327_1.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads2/aln_2_16_p0327_1.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads2/16_p0327_1.fq
bwa sampe -n 1 -N 1 -r '@RG\tID:tcr-16_p0327_1\tPL:ILLUMINA\tLB:tcr-16_p0327_1\tSM:tcr-16_p0327_1' -a 500 -f align/aln16_p0327_1.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/aln_1_16_p0327_1.sai reads2/aln_2_16_p0327_1.sai reads1/16_p0327_1.fq reads2/16_p0327_1.fq
bzip2 -dk reads1/16_p0327_2.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads1/aln_1_16_p0327_2.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/16_p0327_2.fq
bzip2 -dk reads2/16_p0327_2.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads2/aln_2_16_p0327_2.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads2/16_p0327_2.fq
bwa sampe -n 1 -N 1 -r '@RG\tID:tcr-16_p0327_2\tPL:ILLUMINA\tLB:tcr-16_p0327_2\tSM:tcr-16_p0327_2' -a 500 -f align/aln16_p0327_2.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/aln_1_16_p0327_2.sai reads2/aln_2_16_p0327_2.sai reads1/16_p0327_2.fq reads2/16_p0327_2.fq
bzip2 -dk reads1/16_p0327_3.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads1/aln_1_16_p0327_3.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/16_p0327_3.fq
bzip2 -dk reads2/16_p0327_3.fq.bz2
bwa aln -n 5 -l 20 -k 2 -t 12 -q 10 -f reads2/aln_2_16_p0327_3.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads2/16_p0327_3.fq
bwa sampe -n 1 -N 1 -r '@RG\tID:tcr-16_p0327_3\tPL:ILLUMINA\tLB:tcr-16_p0327_3\tSM:tcr-16_p0327_3' -a 500 -f align/aln16_p0327_3.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta reads1/aln_1_16_p0327_3.sai reads2/aln_2_16_p0327_3.sai reads1/16_p0327_3.fq reads2/16_p0327_3.fq
2. The alignments were compressed, sorted and indexed. I then linked the alignment files from run1 and run2 to /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/align_comb/. Note that the following individuals from run2 didn't link, as they were already part of run1:
ln: failed to create symbolic link ‘./aln16_p0079_1.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0079_1.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0095_1.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0095_1.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0104_6.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0104_6.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0132_13.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0132_13.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0164_3.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0164_3.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0287_4.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0287_4.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0294_7.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0294_7.sorted.bam.bai’: File exists
ln: failed to create symbolic link ‘./aln16_p0336_10.sorted.bam’: File exists
ln: failed to create symbolic link ‘./aln16_p0336_10.sorted.bam.bai’: File exists
At some point, we should/could use these to look for repeatability.
3. Variant calling with samtools 1.5 and bcftools 1.3 (not necessarily the same as before). This was done in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/align_comb/ with the callvar.sh scripts, which is pasted below:
#!/bin/sh
#SBATCH --time=240:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=samtools
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load samtools
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/align_comb
samtools mpileup -b bams -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/map_timema_06Jun2016_RvNkF702.fasta -q 20 -Q 30 -I -u -g -t DP,AD,ADF,ADR -o tcrHwy154Vars.bcf
bcftools call -v -m -p 0.01 -P 0.001 -O v -o tcrHwy154Vars.vcf tcrHwy154Vars.bcf
4. Variant filtering and conversion to gl. Done in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/clines/hyw154/variants_comb/.
## filtering of variants, with numbers reatined
perl ../scripts/vcfFilter.pl tcrHwy154Vars.vcf
Finished filtering tcrHwy154Vars.vcf
Retained 106246 variable loci
grep ^Sc filtered2x_tcrHwy154Vars.vcf | cut -f 8 | cut -f 1 -d ";" | perl -p -i -e 's/DP=//' > depth.txt
a<-scan("depth.txt")
#Read 106246 items
mean(a)
# 2199.679
sd(a)
# 1681.081
mean(a)+3 * sd(a)
# 7242.92
grep ^Sc filtered2x_tcrHwy154Vars.vcf | cut -f 1,2 | perl -p -i -e 's/:\S+//' | perl -p -i -e 's/^S[A-Za-z_]+//' > snpids1.txt
perl ../scripts/filterSomeMore.pl filtered2x_tcrHwy154Vars.vcf
checking neighbors for 106246 SNPs
Finished filtering filtered2x_tcrHwy154Vars.vcf
Retained 82993 variable loci
perl vcf2gl.pl 0.05 morefilter_filtered2x_tcrHwy154Vars.vcf
Number of loci: 32651; number of individuals 596
5. Inference of allele frequencies and generating the allele frequency files for Freddie (in freq_comb):
perl splitPops.pl ../variants_comb/*gl
No populations for these (not in the ID file from Freddie, I e-mailed him about this):
tcr-15_0187 tcr-15_0192 tcr-15_0193 tcr-15_0194 tcr-15_0195 tcr-15_0196 tcr-15_0199 tcr-15_0206 tcr-15_0215 tcr-15_0220 tcr-15_0221 tcr-15_0227 tcr-15_0230 tcr-15_0242 tcr-15_0271 tcr-15_0277 tcr-15_0281 tcr-15_0284 tcr-15_0295 tcr-15_0780 tcr-15_0782 tcr-15_0783 tcr-15_0785 tcr-15_0786 tcr-15_0788 tcr-15_0789 tcr-15_0790 tcr-15_0792 tcr-15_0795 tcr-15_0799 tcr-16_0359 tcr-16_p0101_1 tcr-16_p0102 tcr-16_p0108_1 tcr-16_p0109 tcr-16_p0114_1 tcr-16_p0114_2 tcr-16_p0115 tcr-16_p0187_1 tcr-16_p0187_2 tcr-16_p0187_3 tcr-16_p0187_4 tcr-16_p0187_5 tcr-16_p0187_6 tcr-16_p0187_7 tcr-16_p0188_1 tcr-16_p0189_1 tcr-16_p0305_1 tcr-16_p0306_1 tcr-16_p0306_2 tcr-16_p0307_1 tcr-16_p0308_1 tcr-16_p0308_2 tcr-16_p0309_1 tcr-16_p0309_2 tcr-16_p0309_3 tcr-16_p0309_4
perl runEstp.pl [A-Z].gl
estpEM -i O.gl -o p_O.txt -e 0.001 -m 20 -h 2
Reading data from O.gl
Number of loci: 32651
Number of individuals: 100
Using EM algorithm to estimate allele frequencies
Writing results to p_O.txt
Subset allele frequency data, see formatFile.R (below):
## genetic data
L<-32651
p<-matrix(NA,nrow=L,ncol=16)
files<-c("p_A.txt","p_B.txt","p_C.txt","p_D.txt","p_E.txt","p_F.txt","p_G.txt","p_H.txt","p_I.txt","p_J.txt","p_K.txt","p_L.txt","p_M.txt","p_N.txt","p_O.txt","p_P.txt")
for(i in 1:16){
x<-read.table(files[i],header=FALSE)
p[,i]<-as.numeric(x[,3])
}
snps<-read.table("snpids.txt",header=F)
## outfiles
## all snps
pops<-LETTERS[1:16]
N<-scan("sampleSizes.txt")
oN<-cbind(pops,N)
write.table(oN,"SampleSize.csv",sep=",",row.names=F,col.names=T,quote=F)
out1<-cbind(snps,p)
colnames(out1)<-c("lg","scaf","pos",pops)
write.table(out1,"tcrClineHwy154Gbs.csv",sep=",",row.names=F,col.names=T)
## subset, lg8
x<-which(out1$lg==8)
write.table(out1[x,],"tcrClineHwy154GbsLg8.csv",sep=",",row.names=F,col.names=T)
## scaffold 128
x<-which(out1$scaf==128)
write.table(out1[x,],"tcrClineHwy154GbsSc128.csv",sep=",",row.names=F,col.names=T)
## 100 snps from 128
xx<-sample(x,100,replace=FALSE)
write.table(out1[xx,],"tcrClineHwy154GbsSc128sub.csv",sep=",",row.names=F,col.names=T)
## 100 random snps (not on lg8)
x<-which(out1[,1]!=8)
xx<-sample(x,100,replace=FALSE)
write.table(out1[xx,],"tcrClineHwy154GbsRan100.csv",sep=",",row.names=F,col.names=T)
save(list=ls(),file="tcrClineHwy154.rdat")