Post date: Oct 11, 2016 3:8:53 AM
I repeated alignment and variant calling to quantify genomic change on LG8 using version 3 of the Dovetail genome (with scaffold 702 split). This was done with the same parameters as described here. The resulting vcf files are in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/variants_*/.
Next, I created a new set of sub-directories for population genomic analyses related to the dark morph/fluctuating selection project. The vcf files can be founded within this at, /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_fluct/genomic_change_dark_morph/variants (via a symbolic link).
Two rounds of filtering were used. First, I ran the vcfFilter.pl script in the scripts subdirectory one level up from the variants files. The key adjustable options were:
# FHA, N = 1102; OGA, N = 451
my $N = 1102;
#my $N = 451;
#### stringency variables, edits as desired
my $minCoverage = $N * 2; # minimum number of sequences; DP = 2X
my $minAltRds = 10; # minimum number of sequences with the alternative allele; AC
my $notFixed = 1.0; # removes loci fixed for alt; AF
my $mq = 30; # minimum mapping quality; MQ
my $miss = 0.2 * $N; # maximum number of individuals with no data
my $minMaf = $N * 2 * 0.005; # minimum MAF, as allele count ~ 0.005
my $ratDifDir = 0.01; ## maximum proportion of reads in reverse (relative to avg.) orientation
Variants were also dropped for having multiple alleles. This left 328,334 and 555,524 variants for FHA and OGA, respectively. Next, I ran an additional filter (filterSomeMore.pl) to get rid of very high coverage or neighbouring SNPs. Depth data were first extracted from the vcf files with these one-liners:
grep ^Sc filtered2x_tcrFhaVars.vcf | cut -f 8 | perl -p -i -e 's/^DP=(\d+)\S+/\1/' > filt2xfha_depth.txt
grep ^Sc filtered2x_tcrOgaVars.vcf | cut -f 8 | perl -p -i -e 's/^DP=(\d+)\S+/\1/' > filt2xoga_depth.txt
I then ran filterSomeMore.pl to retain SNPs with less than mean + 3 s.d. coverage and those at least 5 bp apart.
perl ../scripts/filterSomeMore.pl filtered2x_tcrFhaVars.vcf
perl ../scripts/filterSomeMore.pl filtered2x_tcrOgaVars.vcf
From this, I retained 215,823 (FHA) and 323,306 (OGA) SNPs for downstream analyses. The relevant vcf files begin with morefilter*.