Post date: Jul 28, 2020 9:44:44 PM
http://www.popgen.dk/angsd/index.php/ANGSD
Input data:
BCF/VCF file as input is now included but with some limitations. Only chr,pos and PL tags are being used, and we discard indels.
command:
./angsd -vcf-gl [filename.vcf]
angsd -vcf-gl ../smallBam/small2.bcf -domajorminor 1 -domaf 1
(brew install samtools)
We will also index the BAM files in case we need to do random access, for this we will use SAMtools.
for i in bams/*.bam;do samtools index $i;done
Make a list of the locations of the bam files:
ls bams/*.bam > bam.filelist
To get the ancestral states of hg19
download http://popgen.dk/software/download/angsd/hg19ancNoChr.fa.gz mv hg19ancNoChr.fa.gz chimpHg19.fa.gz samtools faidx chimpHg19.fa.gz
Calculate allele frequencies (most basic command):
../angsd/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5
Lets remove those reads that has a mapping quality below 30, and only use the bases with a score above 19. And to simply output we only print those sites with an allele frequency above 0.05.
../angsd/angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05
../angsd/angsd -bam bam.filelist -doSaf 1 -anc chimpHg19.fa -GL 1 -P 24 -out out
Obtain the maximum likelihood estimate of the SFS using the realSFS program found in the misc subfolder.
../angsd/misc/realSFS out.saf.idx -P 24 > out.sfs
Plot the SFS in R:
s<-scan('out.sfs') s<-s[-c(1,length(s))] s<-s/sum(s) barplot(s,names=1:length(s),main='SFS')
../angsd/misc/realSFS saf2theta out.saf.idx -sfs out.sfs -outname out
#calculate Tajimas D
../angsd/misc/thetaStat do_stat out.thetas.idx
cat out.thetas.idx.pestPG
The info we are interested in (number of effective sites) is the 14th column of the pest file.