python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc3.txt comb3_clines_popanc_annot.txt
python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc1.txt comb1_clines_popanc_annot.txt
### aims 2 ####
python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc2.txt comb2_clines_popanc_annot.txt
### aims 3 ####
### fix the script before doing this because there are 3 extra columns
Step 1: Use the create_snp_annotations.py script to make the snp_annotation file
Directory: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/Annotation/maker/melissa_round2.maker.output/snp_annotation
I did SNP annotation on my computer using my python script (create_snp_annotations.py). This script created the master SNP annotation file which contains functional and structural annotation information. I will use this script to do the downstream statistical analyses (mainly randomizations).
Given a snp list file with scaffold and positions, and a genome annotation file (from MAKER), this script creates a SNP annotation file. Before giving the genome annotation file from maker (maker_ipr_go.txt, ), I removed the lines which had sequence data in this file and created the file genome_annotation.txt (lines=2310823). I used the mappos.txt (39193 SNPs) file as a list of SNPs with scaffold and position in the alignment from the hybrid zone data.
Usage of python script:
python create_snp_annotations.py --map mappos.txt --ann genome_annotation.txt --out snp_annotation_outtable.txt
I then moved the input, output files, script and dummy data to the folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/Annotation/maker/melissa_round2.maker.output/snp_annotation
Step 2: Combine the Clines, popanc files with cutoff AIMS snps with the SNP annotation file.
I first combined the clines and popanc files to create one file with all the hybrid zone analyses information.
################## read in popanc and take means ##############################
#### aims 1 ############
bcr1<-read.table("../../popanc/outfiles/aims/out_aims1_BCR.txt", header=T, sep=",")
bic1<-read.table("../../popanc/outfiles/aims/out_aims1_BIC.txt", header=T, sep=",")
bld1<-read.table("../../popanc/outfiles/aims/out_aims1_BLD.txt", header=T, sep=",")
bnp1<-read.table("../../popanc/outfiles/aims/out_aims1_BNP.txt", header=T, sep=",")
btb1<-read.table("../../popanc/outfiles/aims/out_aims1_BTB.txt", header=T, sep=",")
frc1<-read.table("../../popanc/outfiles/aims/out_aims1_FRC.txt", header=T, sep=",")
pin1<-read.table("../../popanc/outfiles/aims/out_aims1_PIN.txt", header=T, sep=",")
psp1<-read.table("../../popanc/outfiles/aims/out_aims1_PSP.txt", header=T, sep=",")
rdl1<-read.table("../../popanc/outfiles/aims/out_aims1_RDL.txt", header=T, sep=",")
rnv1<-read.table("../../popanc/outfiles/aims/out_aims1_RNV.txt", header=T, sep=",")
#read in mappos file ####
mappos1<-read.table("../../popanc/infiles/aims/mappos1.txt", header=F, sep=":")
## take ancestry means across populations ####
ancmean1<-(bcr1$mean + bic1$mean + bld1$mean + bnp1$mean + btb1$mean + frc1$mean + rdl1$mean + rnv1$mean + pin1$mean + psp1$mean) / 10
## folding the values ####
ancmean_fold1<-ancmean1
ancmean_fold1[ancmean_fold1 < 0.5]<-1-ancmean_fold1[ancmean_fold1 < 0.5]
## combine stuff
anc1<-cbind(mappos1, ancmean1)
colnames(anc1)<-c("scaffold","position","ancmean")
## write to file
write.table(anc1, "aims1_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)
#### aims 2 #############
bcr2<-read.table("../../popanc/outfiles/aims/out_aims2_BCR.txt", header=T, sep=",")
bic2<-read.table("../../popanc/outfiles/aims/out_aims2_BIC.txt", header=T, sep=",")
bld2<-read.table("../../popanc/outfiles/aims/out_aims2_BLD.txt", header=T, sep=",")
bnp2<-read.table("../../popanc/outfiles/aims/out_aims2_BNP.txt", header=T, sep=",")
btb2<-read.table("../../popanc/outfiles/aims/out_aims2_BTB.txt", header=T, sep=",")
frc2<-read.table("../../popanc/outfiles/aims/out_aims2_FRC.txt", header=T, sep=",")
pin2<-read.table("../../popanc/outfiles/aims/out_aims2_PIN.txt", header=T, sep=",")
psp2<-read.table("../../popanc/outfiles/aims/out_aims2_PSP.txt", header=T, sep=",")
rdl2<-read.table("../../popanc/outfiles/aims/out_aims2_RDL.txt", header=T, sep=",")
rnv2<-read.table("../../popanc/outfiles/aims/out_aims2_RNV.txt", header=T, sep=",")
#read in mappos file ####
mappos2<-read.table("../../popanc/infiles/aims/mappos2.txt", header=F, sep=":")
## take ancestry means across populations ####
ancmean2<-(bcr2$mean + bic2$mean + bld2$mean + bnp2$mean + btb2$mean + frc2$mean + rdl2$mean + rnv2$mean + pin2$mean + psp2$mean) / 20
## folding the values ####
ancmean_fold2<-ancmean2
ancmean_fold2[ancmean_fold2 < 0.5]<-2-ancmean_fold2[ancmean_fold2 < 0.5]
## combine stuff
anc2<-cbind(mappos2, ancmean2)
colnames(anc2)<-c("scaffold","position","ancmean")
## write to file
write.table(anc2, "aims2_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)
#### aims 3 ############
bcr3<-read.table("../../popanc/outfiles/aims/out_aims3_BCR.txt", header=T, sep=",")
bic3<-read.table("../../popanc/outfiles/aims/out_aims3_BIC.txt", header=T, sep=",")
bld3<-read.table("../../popanc/outfiles/aims/out_aims3_BLD.txt", header=T, sep=",")
bnp3<-read.table("../../popanc/outfiles/aims/out_aims3_BNP.txt", header=T, sep=",")
btb3<-read.table("../../popanc/outfiles/aims/out_aims3_BTB.txt", header=T, sep=",")
frc3<-read.table("../../popanc/outfiles/aims/out_aims3_FRC.txt", header=T, sep=",")
pin3<-read.table("../../popanc/outfiles/aims/out_aims3_PIN.txt", header=T, sep=",")
psp3<-read.table("../../popanc/outfiles/aims/out_aims3_PSP.txt", header=T, sep=",")
rdl3<-read.table("../../popanc/outfiles/aims/out_aims3_RDL.txt", header=T, sep=",")
rnv3<-read.table("../../popanc/outfiles/aims/out_aims3_RNV.txt", header=T, sep=",")
#read in mappos file ####
mappos3<-read.table("../../popanc/infiles/aims/mappos3.txt", header=F, sep=":")
## take ancestry means across populations ####
ancmean3<-(bcr3$mean + bic3$mean + bld3$mean + bnp3$mean + btb3$mean + frc3$mean + rdl3$mean + rnv3$mean + pin3$mean + psp3$mean) / 10
## folding the values ####
ancmean_fold3<-ancmean3
ancmean_fold3[ancmean_fold3 < 0.5]<-1-ancmean_fold3[ancmean_fold3 < 0.5]
## combine stuff
anc3<-cbind(mappos3, ancmean3, ancmean_fold3)
colnames(anc3)<-c("scaffold","position","ancmean","ancmeanfold")
## write to file
write.table(anc3, "aims3_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)
### pct 3 ###
pct3alpha<-read.table("../../clines/outfiles/bgcout_pct3_0.hdf5_alpha.txt", header = F, sep = ",")
pct3beta<-read.table("../../clines/outfiles/bgcout_pct3_0.hdf5_beta.txt", header = F, sep = ",")
cline3med<-read.table("../../clines/outfiles/pct3_alphabeta_mcmcmedian.txt", header=TRUE)
cline3<-cbind(mappos3, pct3alpha[,1], pct3beta[,1], cline3med)
colnames(cline3)<-c("scaffold","position","alpha","beta","alpha_med", "beta_med")
write.table(cline3, "../clines/clines_pct3.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)
######## combine the clines and popanc files ##############################################
## clines files #####
pct1<-read.table("clines/clines_pct1.txt", header=T)
pct2<-read.table("clines/clines_pct2.txt", header=T)
pct3<-read.table("clines/clines_pct3.txt", header=T)
### popanc files ####
aims1<-read.table("popanc/aims1_ancestry.txt", header=T)
aims2<-read.table("popanc/aims2_ancestry.txt", header=T)
aims3<-read.table("popanc/aims3_ancestry.txt", header=T)
### combine the files ###
comb1<-cbind(pct1,aims1$ancmean)
colnames(comb1)<-c("scaffold","position","alpha","beta","ancmean")
comb2<-cbind(pct2,aims2$ancmean)
colnames(comb2)<-c("scaffold","position","alpha","beta","ancmean")
comb3<-cbind(pct3$scaffold, pct3$position, pct3$alpha, pct3$beta, aims3$ancmean)
colnames(comb3)<-c("scaffold","position","alpha","beta","ancmean")
## this file is for predictability ###
comb3_fold<-cbind(pct3$scaffold, pct3$position, pct3$alpha_med, pct3$beta_med, aims3$ancmeanfold)
colnames(comb3_fold)<-c("scaffold","position","alphamed","betamed","ancfold")
#write out the files
write.table(comb1, file = "clines_popanc1.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)
write.table(comb2, file = "clines_popanc2.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)
write.table(comb3, file = "clines_popanc3.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)
## this file is for predictability ###
write.table(comb3_fold, file = "clines_popanc3_fold.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)
After creating the annotation for the fold file, I went into R and combined the ancmeans from the comb3 file to the fold file.
pct3<-read.table("comb3_clines_popanc_annot.txt", header=T, na.strings=c("","NA"), sep="\t")
pct3_fold<-read.table("comb3_clines_popanc_fold_annot.txt", header=T, na.strings=c("","NA"), sep="\t")
new<-cbind(pct3_fold, pct3[,5])
write.table(new, "comb3_clines_popanc_fold_annot.txt", col.names=T, row.names=FALSE, sep="\t", quote=F)
## then I opened the file in vim and changed the column names
### On command line ###
Use python script to analysesSnpAnnot_combine.py to combine the hybrid zone analyses file from above with annotation file.
Script usage: python analysesSnpAnnot_combine.py snp_annotation_outtable.txt analyses_file.txt outfile.txt
Commands I ran:
### aims 1 ####