structural_annotation

python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc3.txt comb3_clines_popanc_annot.txt

python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc1.txt comb1_clines_popanc_annot.txt

### aims 2 ####

python analysesSnpAnnot_combine.py snp_annotation_outtable.txt clines_popanc2.txt comb2_clines_popanc_annot.txt

### aims 3 ####

### fix the script before doing this because there are 3 extra columns

Step 1: Use the create_snp_annotations.py script to make the snp_annotation file

Directory: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/Annotation/maker/melissa_round2.maker.output/snp_annotation

I did SNP annotation on my computer using my python script (create_snp_annotations.py). This script created the master SNP annotation file which contains functional and structural annotation information. I will use this script to do the downstream statistical analyses (mainly randomizations).

Given a snp list file with scaffold and positions, and a genome annotation file (from MAKER), this script creates a SNP annotation file. Before giving the genome annotation file from maker (maker_ipr_go.txt, ), I removed the lines which had sequence data in this file and created the file genome_annotation.txt (lines=2310823). I used the mappos.txt (39193 SNPs) file as a list of SNPs with scaffold and position in the alignment from the hybrid zone data.

Usage of python script:

python create_snp_annotations.py --map mappos.txt --ann genome_annotation.txt --out snp_annotation_outtable.txt

I then moved the input, output files, script and dummy data to the folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/Annotation/maker/melissa_round2.maker.output/snp_annotation

Step 2: Combine the Clines, popanc files with cutoff AIMS snps with the SNP annotation file.

I first combined the clines and popanc files to create one file with all the hybrid zone analyses information.

################## read in popanc and take means ##############################

#### aims 1 ############

bcr1<-read.table("../../popanc/outfiles/aims/out_aims1_BCR.txt", header=T, sep=",")

bic1<-read.table("../../popanc/outfiles/aims/out_aims1_BIC.txt", header=T, sep=",")

bld1<-read.table("../../popanc/outfiles/aims/out_aims1_BLD.txt", header=T, sep=",")

bnp1<-read.table("../../popanc/outfiles/aims/out_aims1_BNP.txt", header=T, sep=",")

btb1<-read.table("../../popanc/outfiles/aims/out_aims1_BTB.txt", header=T, sep=",")

frc1<-read.table("../../popanc/outfiles/aims/out_aims1_FRC.txt", header=T, sep=",")

pin1<-read.table("../../popanc/outfiles/aims/out_aims1_PIN.txt", header=T, sep=",")

psp1<-read.table("../../popanc/outfiles/aims/out_aims1_PSP.txt", header=T, sep=",")

rdl1<-read.table("../../popanc/outfiles/aims/out_aims1_RDL.txt", header=T, sep=",")

rnv1<-read.table("../../popanc/outfiles/aims/out_aims1_RNV.txt", header=T, sep=",")

#read in mappos file ####

mappos1<-read.table("../../popanc/infiles/aims/mappos1.txt", header=F, sep=":")

## take ancestry means across populations ####

ancmean1<-(bcr1$mean + bic1$mean + bld1$mean + bnp1$mean + btb1$mean + frc1$mean + rdl1$mean + rnv1$mean + pin1$mean + psp1$mean) / 10

## folding the values ####

ancmean_fold1<-ancmean1

ancmean_fold1[ancmean_fold1 < 0.5]<-1-ancmean_fold1[ancmean_fold1 < 0.5]

## combine stuff

anc1<-cbind(mappos1, ancmean1)

colnames(anc1)<-c("scaffold","position","ancmean")

## write to file

write.table(anc1, "aims1_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)

#### aims 2 #############

bcr2<-read.table("../../popanc/outfiles/aims/out_aims2_BCR.txt", header=T, sep=",")

bic2<-read.table("../../popanc/outfiles/aims/out_aims2_BIC.txt", header=T, sep=",")

bld2<-read.table("../../popanc/outfiles/aims/out_aims2_BLD.txt", header=T, sep=",")

bnp2<-read.table("../../popanc/outfiles/aims/out_aims2_BNP.txt", header=T, sep=",")

btb2<-read.table("../../popanc/outfiles/aims/out_aims2_BTB.txt", header=T, sep=",")

frc2<-read.table("../../popanc/outfiles/aims/out_aims2_FRC.txt", header=T, sep=",")

pin2<-read.table("../../popanc/outfiles/aims/out_aims2_PIN.txt", header=T, sep=",")

psp2<-read.table("../../popanc/outfiles/aims/out_aims2_PSP.txt", header=T, sep=",")

rdl2<-read.table("../../popanc/outfiles/aims/out_aims2_RDL.txt", header=T, sep=",")

rnv2<-read.table("../../popanc/outfiles/aims/out_aims2_RNV.txt", header=T, sep=",")

#read in mappos file ####

mappos2<-read.table("../../popanc/infiles/aims/mappos2.txt", header=F, sep=":")

## take ancestry means across populations ####

ancmean2<-(bcr2$mean + bic2$mean + bld2$mean + bnp2$mean + btb2$mean + frc2$mean + rdl2$mean + rnv2$mean + pin2$mean + psp2$mean) / 20

## folding the values ####

ancmean_fold2<-ancmean2

ancmean_fold2[ancmean_fold2 < 0.5]<-2-ancmean_fold2[ancmean_fold2 < 0.5]

## combine stuff

anc2<-cbind(mappos2, ancmean2)

colnames(anc2)<-c("scaffold","position","ancmean")

## write to file

write.table(anc2, "aims2_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)

#### aims 3 ############

bcr3<-read.table("../../popanc/outfiles/aims/out_aims3_BCR.txt", header=T, sep=",")

bic3<-read.table("../../popanc/outfiles/aims/out_aims3_BIC.txt", header=T, sep=",")

bld3<-read.table("../../popanc/outfiles/aims/out_aims3_BLD.txt", header=T, sep=",")

bnp3<-read.table("../../popanc/outfiles/aims/out_aims3_BNP.txt", header=T, sep=",")

btb3<-read.table("../../popanc/outfiles/aims/out_aims3_BTB.txt", header=T, sep=",")

frc3<-read.table("../../popanc/outfiles/aims/out_aims3_FRC.txt", header=T, sep=",")

pin3<-read.table("../../popanc/outfiles/aims/out_aims3_PIN.txt", header=T, sep=",")

psp3<-read.table("../../popanc/outfiles/aims/out_aims3_PSP.txt", header=T, sep=",")

rdl3<-read.table("../../popanc/outfiles/aims/out_aims3_RDL.txt", header=T, sep=",")

rnv3<-read.table("../../popanc/outfiles/aims/out_aims3_RNV.txt", header=T, sep=",")

#read in mappos file ####

mappos3<-read.table("../../popanc/infiles/aims/mappos3.txt", header=F, sep=":")

## take ancestry means across populations ####

ancmean3<-(bcr3$mean + bic3$mean + bld3$mean + bnp3$mean + btb3$mean + frc3$mean + rdl3$mean + rnv3$mean + pin3$mean + psp3$mean) / 10

## folding the values ####

ancmean_fold3<-ancmean3

ancmean_fold3[ancmean_fold3 < 0.5]<-1-ancmean_fold3[ancmean_fold3 < 0.5]

## combine stuff

anc3<-cbind(mappos3, ancmean3, ancmean_fold3)

colnames(anc3)<-c("scaffold","position","ancmean","ancmeanfold")

## write to file

write.table(anc3, "aims3_ancestry.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)

### pct 3 ###

pct3alpha<-read.table("../../clines/outfiles/bgcout_pct3_0.hdf5_alpha.txt", header = F, sep = ",")

pct3beta<-read.table("../../clines/outfiles/bgcout_pct3_0.hdf5_beta.txt", header = F, sep = ",")

cline3med<-read.table("../../clines/outfiles/pct3_alphabeta_mcmcmedian.txt", header=TRUE)

cline3<-cbind(mappos3, pct3alpha[,1], pct3beta[,1], cline3med)

colnames(cline3)<-c("scaffold","position","alpha","beta","alpha_med", "beta_med")

write.table(cline3, "../clines/clines_pct3.txt", sep="\t", quote=FALSE, col.names=TRUE, row.names=FALSE)

######## combine the clines and popanc files ##############################################

## clines files #####

pct1<-read.table("clines/clines_pct1.txt", header=T)

pct2<-read.table("clines/clines_pct2.txt", header=T)

pct3<-read.table("clines/clines_pct3.txt", header=T)

### popanc files ####

aims1<-read.table("popanc/aims1_ancestry.txt", header=T)

aims2<-read.table("popanc/aims2_ancestry.txt", header=T)

aims3<-read.table("popanc/aims3_ancestry.txt", header=T)

### combine the files ###

comb1<-cbind(pct1,aims1$ancmean)

colnames(comb1)<-c("scaffold","position","alpha","beta","ancmean")

comb2<-cbind(pct2,aims2$ancmean)

colnames(comb2)<-c("scaffold","position","alpha","beta","ancmean")

comb3<-cbind(pct3$scaffold, pct3$position, pct3$alpha, pct3$beta, aims3$ancmean)

colnames(comb3)<-c("scaffold","position","alpha","beta","ancmean")

## this file is for predictability ###

comb3_fold<-cbind(pct3$scaffold, pct3$position, pct3$alpha_med, pct3$beta_med, aims3$ancmeanfold)

colnames(comb3_fold)<-c("scaffold","position","alphamed","betamed","ancfold")

#write out the files

write.table(comb1, file = "clines_popanc1.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)

write.table(comb2, file = "clines_popanc2.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)

write.table(comb3, file = "clines_popanc3.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)

## this file is for predictability ###

write.table(comb3_fold, file = "clines_popanc3_fold.txt", col.names=TRUE, row.names=FALSE, sep="\t", quote=FALSE)

After creating the annotation for the fold file, I went into R and combined the ancmeans from the comb3 file to the fold file.

pct3<-read.table("comb3_clines_popanc_annot.txt", header=T, na.strings=c("","NA"), sep="\t")

pct3_fold<-read.table("comb3_clines_popanc_fold_annot.txt", header=T, na.strings=c("","NA"), sep="\t")

new<-cbind(pct3_fold, pct3[,5])

write.table(new, "comb3_clines_popanc_fold_annot.txt", col.names=T, row.names=FALSE, sep="\t", quote=F)

## then I opened the file in vim and changed the column names

### On command line ###

Use python script to analysesSnpAnnot_combine.py to combine the hybrid zone analyses file from above with annotation file.

Script usage: python analysesSnpAnnot_combine.py snp_annotation_outtable.txt analyses_file.txt outfile.txt

Commands I ran:

### aims 1 ####

Page updated

Google Sites

Report abuse