Post date: Oct 19, 2014 9:57:42 PM
I used the perl script vcfFilter.pl to quality filter the variants identified by GATK. The main filters and parameters are below (from the script). The resulting filtered variants are in filtered_variantsLG*vcf (one file per LG) in /labs/evolution/projects/timema_wgexperiment/variants. We retained 8,170,146 variants.
#### stringency variables, edits as desired
my $minCoverage = 500; # minimum number of sequences; DP
my $minAltRds = 10; # minimum number of sequences with the alternative allele; AC
my $notFixed = 1.0; # removes loci fixed for alt; AF
my $dels = 0.01; # maximum prop. of reads spanning an indel; DEL
my $mq0 = 5; # maximum number of mapping quality 0 reads; MQ0
my $bqrs = 3; # maximum absolute value of the base quality rank sum test; BaseQRankSum
my $mqrs = 2; # maximum absolute value of the mapping quality rank sum test; MQRankSum
my $rprs = 2; # maximum absolute value of the read position rank sum test; ReadPosRankSum
my $qd = 2; # minimum ratio of variant confidenct to non reference read depth; QD
my $mq = 50; # minimum mapping quality; MQ
##### this set is for whole genome shotgun data
Now I am using subvcf2gl.pl to combine these variants in a single and write the results in gl format but only for those variants with MAF of at least 1%. The results are being written to timema500g.gl with the allele frequencies in af_timema500g.txt.