Post date: Jul 31, 2020 6:59:10 PM
Step 1 - download the bam files for Pando only
scp -r u6028866@kingspeak.chpc.utah.edu:/uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/gbs_pando_plus/Alignments_mem_mod/aln_mem_mod_[017-S,025-B,025-S,027-B,037-B-rep,037-B-Rp,037-B,039-S,047-B,049-S,051-B,051-S,053-B,053-S,055-S-Rp,055-S,063-S,065-B,065-S,069-B,069-S,071-B,073-S,075-B,075-S,083-S,085-B,085-S,086-S,087-B,087-S,091-B,091-S,093-B,093-S,095-B,095-S,103-B,103-S,105-B,107-B,107-S,111-B,113-B,113-S,121-B,121-S,123-B,123-S,125-B,129-B,129-S-rep,129-S-Rp,129-S,131-S,137-B,139-B,141-B,143-S,145-S,147-S,155-B,155-S,157-B,157-S,159-B,159-S,163-B,163-S,165-B,165-S,171-B,171-S,173-B,173-S,175-B,175-S,177-S,179-B,179-S,181-S,191-S,195-B,195-S,203-S,205-S,207-S,209-B,211-B,211-S,219-S,221-S,223-B,225-S,235-B,235-S,237-B,237-S,239-B,239-S,247-S,251-B,251-S,257-S,259-B,267-B,269-S,277-B,277-S]*.sorted.bam /Users/rpineau/Documents/Year_2/Summer2020/Research/data/Pando/bam/aln_mem_mod_[017-S,025-B,025-S,027-B,037-B-rep,037-B-Rp,037-B,039-S,047-B,049-S,051-B,051-S,053-B,053-S,055-S-Rp,055-S,063-S,065-B,065-S,069-B,069-S,071-B,073-S,075-B,075-S,083-S,085-B,085-S,086-S,087-B,087-S,091-B,091-S,093-B,093-S,095-B,095-S,103-B,103-S,105-B,107-B,107-S,111-B,113-B,113-S,121-B,121-S,123-B,123-S,125-B,129-B,129-S-rep,129-S-Rp,129-S,131-S,137-B,139-B,141-B,143-S,145-S,147-S,155-B,155-S,157-B,157-S,159-B,159-S,163-B,163-S,165-B,165-S,171-B,171-S,173-B,173-S,175-B,175-S,177-S,179-B,179-S,181-S,191-S,195-B,195-S,203-S,205-S,207-S,209-B,211-B,211-S,219-S,221-S,223-B,225-S,235-B,235-S,237-B,237-S,239-B,239-S,247-S,251-B,251-S,257-S,259-B,267-B,269-S,277-B,277-S]*.sorted.bam
017-S,025-B,025-S,027-B,037-B-rep,037-B-Rp,037-B,039-S,047-B,049-S,051-B,051-S,053-B,053-S,055-S-Rp,055-S,063-S,065-B,065-S,069-B,069-S,071-B,073-S,075-B,075-S,083-S,085-B,085-S,086-S,087-B,087-S,091-B,091-S,093-B,093-S,095-B,095-S,103-B,103-S,105-B,107-B,107-S,111-B,113-B,113-S,121-B,121-S,123-B,123-S,125-B,129-B,129-S-rep,129-S-Rp,129-S,131-S,137-B,139-B,141-B,143-S,145-S,147-S,155-B,155-S,157-B,157-S,159-B,159-S,163-B,163-S,165-B,165-S,171-B,171-S,173-B,173-S,175-B,175-S,177-S,179-B,179-S,181-S,191-S,195-B,195-S,203-S,205-S,207-S,209-B,211-B,211-S,219-S,221-S,223-B,225-S,235-B,235-S,237-B,237-S,239-B,239-S,247-S,251-B,251-S,257-S,259-B,267-B,269-S,277-B,277-S
did not work - I chose to download the whole folder and filter from there.
scp -r u6028866@kingspeak.chpc.utah.edu:/uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/gbs_pando_plus/Alignments_mem_mod/aln_mem_mod_103-S.sorted.bam /Users/rpineau/Documents/Year_2/Summer2020/Research/data/Pando/bam/pando_only
Individual per individual!
scp -r u6028866@kingspeak.chpc.utah.edu:/uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/gbs_pando_plus/Alignments_mem_mod/aln_mem_mod_129-S-Rp.sorted.bam /Users/rpineau/Documents/Year_2/Summer2020/Research/data/Pando/bam/pando_only
Wouw I am finally done... This was really silly and uninteresting.
I have 109 bam files.
Attention! Here, I consider the ancestor being the reference genome...
First step is to make a list of them:
for i in pando_only/*.bam;do samtools index $i;done
Make a file containing the list of the bam files
ls pando_only/*.bam > bam.filelist
Download the genome:
scp u6028866@kingspeak.chpc.utah.edu:/uufs/chpc.utah.edu/common/home/gompert-group1/data/aspen/genome/Potrs01-genome.fa
/Users/rpineau/Documents/Year_2/Summer2020/Research/data/Pando/bam/
Index the genome:
samtools faidx Potrs01-genome.fa
Calculate a first estimate for the Site Frequency Spectrum
/Applications/angsd0.930/angsd/angsd -bam bam.filelist -doSaf 1 -anc Potrs01-genome.fa -GL 1 -P 24 -minMapQ 1 -minQ 20 -out sfs_est
>Potrs164785
(takes a long time)(Maybe I would need to filter the genome to only tae into account the sites that were kept for GBS?)
164 785 pieces in the genome! It will soon be one day of running!
Do the same steps with only 4 individuals!
Calculate a first estimate for the Site Frequency Spectrum
/Applications/angsd0.930/angsd/angsd -bam test.filelist -doSaf 1 -anc Potrs01-genome.fa -GL 1 -P 24 -minMapQ 1 -minQ 20 -out test
-> Done reading data waiting for calculations to finish
-> Done waiting for threads
-> Output filenames:
->"test.arg"
->"test.saf.gz"
->"test.saf.pos.gz"
->"test.saf.idx"
-> Mon Aug 3 03:55:35 2020
-> Arguments and parameters for all analysis are located in .arg file
-> Total number of sites analyzed: 12167439
-> Number of sites retained after filtering: 11982146
[ALL done] cpu-time used = 151.87 sec
[ALL done] walltime used = 65203.00 sec
List of files: test.arg test.list test.saf.gz test.saf.idx test.saf.pos.gz
/Applications/angsd0.930/angsd/misc/realSFS test.saf.idx -P 24 > test.sfs
message:
-> hello Im the master merge part of realSFS. and I'll now do a tripple bypass to find intersect
-> 1) Will set iter according to chooseChr and start and stop, and possibly using -sites
-> Only read nSites: 8896738 will therefore prepare next chromosome (or exit)
------------
startlik=-17102982.340757
lik[2]=-1730851.120161 diff=1.537213e+07 alpha:1.000000 sr2:3.745485e-01
lik[5]=-1207972.040705 diff=5.228791e+05 alpha:1.410231 sr2:3.085613e-03
lik[8]=-1204375.590507 diff=3.596450e+03 alpha:1.959322 sr2:3.412403e-07
lik[11]=-1204342.901405 diff=3.268910e+01 alpha:2.746131 sr2:4.435865e-10
lik[14]=-1204342.643548 diff=2.578564e-01 alpha:1.865573 sr2:4.649494e-12
-> Breaking EM(sr2) at iter:15, sqrt(sr2):4.432026e-07
likelihood: -1204342.643548
------------
-> Only read nSites: 0 will therefore prepare next chromosome (or exit)
-> NB NB output is no longer log probs of the frequency spectrum!
-> Output is now simply the expected values!
-> You can convert to the old format simply with log(norm(x))
Plot the SFS in R:
s<-scan('out.sfs') s<-s[-c(1,length(s))] s<-s/sum(s) barplot(s,names=1:length(s),main='SFS')
Calculate the Thetas for each site:
/Applications/angsd0.930/angsd/misc/realSFS saf2theta test.saf.idx -sfs test.sfs -outname out
-> Output filenames:
->"out.thetas.gz"
->"out.thetas.idx"
#calculate Tajimas D /Applications/angsd0.930/angsd/misc/thetaStat do_stat out.thetas.idx
Dumping file: "out.thetas.idx.pestPG"
This file contains the info about the number of sites!