experimental comparison

STEPS TO DO ANALYSIS ON MAY 25TH 2016

Go to R and combine the bayesfactors from bayesfactor files and average allele frequencies( just use cbind)
Once the whole file is ready, run final_survdata_combine.py to combine survival data and bayesfactors and allele freqs into a master file. Repeat for weight.
Do glms between bf and af and survival effects and af to get residuals.

FILE LOG UPDATED ON MAY 2OTH 2016

Folder directory: /labs/evolution/projects/Lmelissa_host_adaptation/ExperimentalComparison

Files:

effectsSurvProbitTotal.txt
effectsWgtTotal.txt
final_data #file contains final bayesfactors and allele freq (all, west and east pop)
allpop_af #contains allele frequencies for all populations
bfmeansWithScafpositions_all, east, west - bayPass file for all, east and western populations bayesfactor means from all baypass runs with scaffold and positions
bf_survivaldata_combine.py #used to combine data with bayesnv data
bf_wghtdata_combine.py #used to combine data with bayesnv data
final_survdata_combine.py #main script for combining all the data
final_wghtdata_combine.py #main script for combining all the data
bayesfactormeans_all #bayesfactor averages for all runs
east_af-A,B,C,D allele frequencies from these runs #used fpr random forest
west_af- allele frequencies from these runs
east_bayesfactors #bayesfactor averages for all runs
west_bayesfactors #bayesfactors averages for all runs
survfile_tab.txt - Tabdelimited files for running the python script
wghtfile_tab.txt - same as above
Outcombine_survival_bfmeans.csv - output file from script with survival data and bf
Outcombine_wght_bfmeans.csv - output file from script with wght data and bf
Outcombine_surv_final.csv #FINAL FILE USED FOR ANALYSIS
Outcombine_wgt_final.csv #FINAL FILE USED FOR ANALYSIS

To replace single space with tab and replace missing with NA:

perl -pe 's/Missing/NA/g' Outcombine_wght_bfmeans.csv > Outcombine_wght_bfmeans_withNA.txt

#create a file with just snps which have bayesfactors.

s <- read.table("Outcombine_survival_bfmeans.txt", header=FALSE)

s.df <- data.frame(s)

test = s.df[!(is.na(s.df$bayesfactors)),] #excluding rows with missing bayesfactors

write.table(test, file = "survivaldatawithbfmeans.txt", quote = FALSE,row.names=FALSE )

WORK ON 10th Nov 2015

1). For effects of Survival:

Correlation test results with taking log10:

glaMS

Pearson's product-moment correlation

data: log10(x$glaMs) and log10(x$bayesfactors)

t = 10.9787, df = 35839, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.04757114 0.06820748

sample estimates:

cor

0.05789549

glaAc

Pearson's product-moment correlation

data: log10(x$glaAc) and log10(x$bayesfactors)

t = 11.2351, df = 37216, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.04800921 0.06825962

sample estimates:

cor

0.0581404

slaMs

Pearson's product-moment correlation

data: log10(x$slaMs) and log10(x$bayesfactors)

t = 8.0616, df = 36398, p-value = 8.882e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.03195837 0.05246788

sample estimates:

cor

0.04221757

slaAc

Pearson's product-moment correlation

data: log10(x$slaAc) and log10(x$bayesfactors)

t = 5.0205, df = 33845, p-value = 5.181e-07

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.01663082 0.03792190

sample estimates:

cor

0.02727945

Correlation with absolute values of glaMs and bayesfactors:

Pearson's product-moment correlation

data: abs(x$glaMs) and log10(x$bayesfactors)

t = 13.0506, df = 70925, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:ms

0.04160073 0.05628431

sample estimates:

cor

0.04894516

Cool result!!

Checked the association of survival effects with host-plant association. I used the following code and got the mean result as 0 which means the signal is strong and is highly correlated.

>x <- read.table("Outcombine_survival_bfmeans_withNA.txt", header=T)

> ms<-abs(x$glaMs)

> mstop<-which(ms > quantile(ms,probs=0.99,na.rm=T))

> mstop

> bftop<-which(bf > quantile(bf,probs=0.99,na.rm=T))

> sum(mstop %in% bftop)

[1] 146

> 0.01^2

[1] 1e-04

> 0.01^2 * dim(x)[1]

[1] 9.5216

> null<-rep(NA,10000)

> for(i in 1:10000){

+ n<-round(0.01*dim(x)[1],0)

+ msnull<-sample(1:dim(x)[1],n,replace=F)

+ bfnull<-sample(1:dim(x)[1],n,replace=F)

+ null[i]<-sum(msnull %in% bfnull)

+ }

> hist(null)

> abline(v=sum(mstop %in% bftop))

> mean(null >= 23)

[1] 0

Same mean for 1:1000

1). For effects of weight:

> cor.test(abs(x$glaMs), log10(x$bayesfactors))

Pearson's product-moment correlation

data: abs(x$glaMs) and log10(x$bayesfactors)

t = 2.7424, df = 68008, p-value = 0.006101

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.00299997 0.01802949

sample estimates:

cor

0.01051532

> cor.test(abs(x$glaAc), log10(x$bayesfactors))

Pearson's product-moment correlation

data: abs(x$glaAc) and log10(x$bayesfactors)

t = 8.0245, df = 68775, p-value = 8.882e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.02311597 0.03804912

sample estimates:

cor

0.03058425

> cor.test(abs(x$slaAc), log10(x$bayesfactors))

Pearson's product-moment correlation

data: abs(x$slaAc) and log10(x$bayesfactors)

t = 3.3839, df = 67625, p-value = 0.000715

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.005475207 0.020546340

sample estimates:

cor

0.01301151

> cor.test(abs(x$slaMs), log10(x$bayesfactors))

Pearson's product-moment correlation

data: abs(x$slaMs) and log10(x$bayesfactors)

t = 17.2457, df = 66747, p-value < 2.2e-16

alternative hypothesis: true correlation is not equal to 0

95 percent confidence interval:

0.05904751 0.07415270

sample estimates:

cor

0.06660392

Checked the association of weight effects with host-plant association. I used the following code and got the mean result as 0 which means the signal is strong and is highly correlated.

> ms <- abs(x$glaMs)

> bf <- log10(x$bayesfactors)

> mstop<-which(ms > quantile(ms,probs=0.99,na.rm=T))

> bftop<-which(bf > quantile(bf,probs=0.99,na.rm=T))

> sum(mstop %in% bftop)

[1] 23

> 0.01^2

[1] 1e-04

> 0.01^2 * dim(x)[1]

[1] 9.7081

> null<-rep(NA, 10000){

Error: unexpected '{' in "null<-rep(NA, 10000){"

> null<-rep(NA, 10000)

> for(i in 1:10000){

+ n<-round(0.01*dim(x)[1],0)

+ msnull<-sample(1:dim(x)[1],n,replace=F)

+ bfnull<-sample(1:dim(x)[1],n,replace=F)

+ null[i]<-sum(msnull %in% bfnull)

+ }

> mean(null >= 23)

[1] 4e-04