STEPS TO DO ANALYSIS ON MAY 25TH 2016
Go to R and combine the bayesfactors from bayesfactor files and average allele frequencies( just use cbind)
Once the whole file is ready, run final_survdata_combine.py to combine survival data and bayesfactors and allele freqs into a master file. Repeat for weight.
Do glms between bf and af and survival effects and af to get residuals.
FILE LOG UPDATED ON MAY 2OTH 2016
Folder directory: /labs/evolution/projects/Lmelissa_host_adaptation/ExperimentalComparison
Files:
effectsSurvProbitTotal.txt
effectsWgtTotal.txt
final_data #file contains final bayesfactors and allele freq (all, west and east pop)
allpop_af #contains allele frequencies for all populations
bfmeansWithScafpositions_all, east, west - bayPass file for all, east and western populations bayesfactor means from all baypass runs with scaffold and positions
bf_survivaldata_combine.py #used to combine data with bayesnv data
bf_wghtdata_combine.py #used to combine data with bayesnv data
final_survdata_combine.py #main script for combining all the data
final_wghtdata_combine.py #main script for combining all the data
bayesfactormeans_all #bayesfactor averages for all runs
east_af-A,B,C,D allele frequencies from these runs #used fpr random forest
west_af- allele frequencies from these runs
east_bayesfactors #bayesfactor averages for all runs
west_bayesfactors #bayesfactors averages for all runs
survfile_tab.txt - Tabdelimited files for running the python script
wghtfile_tab.txt - same as above
Outcombine_survival_bfmeans.csv - output file from script with survival data and bf
Outcombine_wght_bfmeans.csv - output file from script with wght data and bf
Outcombine_surv_final.csv #FINAL FILE USED FOR ANALYSIS
Outcombine_wgt_final.csv #FINAL FILE USED FOR ANALYSIS
To replace single space with tab and replace missing with NA:
perl -pe 's/Missing/NA/g' Outcombine_wght_bfmeans.csv > Outcombine_wght_bfmeans_withNA.txt
#create a file with just snps which have bayesfactors.
s <- read.table("Outcombine_survival_bfmeans.txt", header=FALSE)
s.df <- data.frame(s)
test = s.df[!(is.na(s.df$bayesfactors)),] #excluding rows with missing bayesfactors
write.table(test, file = "survivaldatawithbfmeans.txt", quote = FALSE,row.names=FALSE )
WORK ON 10th Nov 2015
1). For effects of Survival:
Correlation test results with taking log10:
glaMS
Pearson's product-moment correlation
data: log10(x$glaMs) and log10(x$bayesfactors)
t = 10.9787, df = 35839, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.04757114 0.06820748
sample estimates:
cor
0.05789549
glaAc
Pearson's product-moment correlation
data: log10(x$glaAc) and log10(x$bayesfactors)
t = 11.2351, df = 37216, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.04800921 0.06825962
sample estimates:
cor
0.0581404
slaMs
Pearson's product-moment correlation
data: log10(x$slaMs) and log10(x$bayesfactors)
t = 8.0616, df = 36398, p-value = 8.882e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.03195837 0.05246788
sample estimates:
cor
0.04221757
slaAc
Pearson's product-moment correlation
data: log10(x$slaAc) and log10(x$bayesfactors)
t = 5.0205, df = 33845, p-value = 5.181e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.01663082 0.03792190
sample estimates:
cor
0.02727945
Correlation with absolute values of glaMs and bayesfactors:
Pearson's product-moment correlation
data: abs(x$glaMs) and log10(x$bayesfactors)
t = 13.0506, df = 70925, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:ms
0.04160073 0.05628431
sample estimates:
cor
0.04894516
Cool result!!
Checked the association of survival effects with host-plant association. I used the following code and got the mean result as 0 which means the signal is strong and is highly correlated.
>x <- read.table("Outcombine_survival_bfmeans_withNA.txt", header=T)
> ms<-abs(x$glaMs)
> mstop<-which(ms > quantile(ms,probs=0.99,na.rm=T))
> mstop
> bftop<-which(bf > quantile(bf,probs=0.99,na.rm=T))
> sum(mstop %in% bftop)
[1] 146
> 0.01^2
[1] 1e-04
> 0.01^2 * dim(x)[1]
[1] 9.5216
> null<-rep(NA,10000)
> for(i in 1:10000){
+ n<-round(0.01*dim(x)[1],0)
+ msnull<-sample(1:dim(x)[1],n,replace=F)
+ bfnull<-sample(1:dim(x)[1],n,replace=F)
+ null[i]<-sum(msnull %in% bfnull)
+ }
> hist(null)
> abline(v=sum(mstop %in% bftop))
> mean(null >= 23)
[1] 0
Same mean for 1:1000
1). For effects of weight:
> cor.test(abs(x$glaMs), log10(x$bayesfactors))
Pearson's product-moment correlation
data: abs(x$glaMs) and log10(x$bayesfactors)
t = 2.7424, df = 68008, p-value = 0.006101
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.00299997 0.01802949
sample estimates:
cor
0.01051532
> cor.test(abs(x$glaAc), log10(x$bayesfactors))
Pearson's product-moment correlation
data: abs(x$glaAc) and log10(x$bayesfactors)
t = 8.0245, df = 68775, p-value = 8.882e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.02311597 0.03804912
sample estimates:
cor
0.03058425
> cor.test(abs(x$slaAc), log10(x$bayesfactors))
Pearson's product-moment correlation
data: abs(x$slaAc) and log10(x$bayesfactors)
t = 3.3839, df = 67625, p-value = 0.000715
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.005475207 0.020546340
sample estimates:
cor
0.01301151
> cor.test(abs(x$slaMs), log10(x$bayesfactors))
Pearson's product-moment correlation
data: abs(x$slaMs) and log10(x$bayesfactors)
t = 17.2457, df = 66747, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.05904751 0.07415270
sample estimates:
cor
0.06660392
Checked the association of weight effects with host-plant association. I used the following code and got the mean result as 0 which means the signal is strong and is highly correlated.
> ms <- abs(x$glaMs)
> bf <- log10(x$bayesfactors)
> mstop<-which(ms > quantile(ms,probs=0.99,na.rm=T))
> bftop<-which(bf > quantile(bf,probs=0.99,na.rm=T))
> sum(mstop %in% bftop)
[1] 23
> 0.01^2
[1] 1e-04
> 0.01^2 * dim(x)[1]
[1] 9.7081
> null<-rep(NA, 10000){
Error: unexpected '{' in "null<-rep(NA, 10000){"
> null<-rep(NA, 10000)
> for(i in 1:10000){
+ n<-round(0.01*dim(x)[1],0)
+ msnull<-sample(1:dim(x)[1],n,replace=F)
+ bfnull<-sample(1:dim(x)[1],n,replace=F)
+ null[i]<-sum(msnull %in% bfnull)
+ }
> mean(null >= 23)
[1] 4e-04