Post date: Jun 07, 2018 4:41:31 PM
I used my own perl scripts to filter the variant set from GATK. This was all done in the Variants_2018 sub-directory.
1. First, I ran my standard vcfFilter.pl filter script:
perl ../Scripts/vcfFilter.pl lycTimeSeriesVariants.vcf
This made filtered2x_lycTimeSeriesVariants.vcf, which contains 36,225 SNPs. Here is the head of the script with relevant parameters:
#### stringency variables, edit as desired
my $minCoverage = 3072; # minimum number of sequences; DP
my $minAltRds = 20; # minimum number of sequences with the alternative allele; AC
my $notFixed = 1.0; # removes loci fixed for alt; AF
my $bqrs = 3; # maximum absolute value of the base quality rank sum test; BaseQRankSum
my $mqrs = 2.5; # maximum absolute value of the mapping quality rank sum test; MQRankSum
my $rprs = 2; # maximum absolute value of the read position rank sum test; ReadPosRankSum
my $qd = 2; # minimum ratio of variant confidence to non-reference read depth; QD
my $mq = 30; # minimum mapping quality; MQ
my $miss = 307; # maximum number of individuals with no data
##### this set is for whole genome shotgun data
The file filter_log has information on what was filtered and why (missing data is a big one, perhaps because two runs were combined).
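The filtering logic behind these thresholds can be sketched as follows. This is a Python illustration, not the actual vcfFilter.pl script; the INFO field names are the standard GATK annotations listed in the comments above, and the `parse_info`/`keep_site` helpers are hypothetical.

```python
# Sketch of the site-level filtering described above. Thresholds come from
# the vcfFilter.pl header; the GATK INFO field names are standard
# annotations, assumed rather than confirmed against the script itself.

def parse_info(info_str):
    """Parse a VCF INFO column like 'DP=3500;QD=12.3' into a dict of floats."""
    out = {}
    for item in info_str.split(";"):
        if "=" in item:
            key, val = item.split("=", 1)
            try:
                out[key] = float(val.split(",")[0])
            except ValueError:
                pass  # skip non-numeric values
    return out

def keep_site(info, n_missing=0):
    """Apply the thresholds from the vcfFilter.pl header to one site."""
    checks = [
        info.get("DP", 0) >= 3072,                  # minimum total depth
        info.get("AC", 0) >= 20,                    # minimum alt-allele count
        info.get("AF", 0.0) < 1.0,                  # not fixed for the alt allele
        abs(info.get("BaseQRankSum", 0.0)) <= 3,    # base quality rank sum
        abs(info.get("MQRankSum", 0.0)) <= 2.5,     # mapping quality rank sum
        abs(info.get("ReadPosRankSum", 0.0)) <= 2,  # read position rank sum
        info.get("QD", 0.0) >= 2,                   # variant confidence / depth
        info.get("MQ", 0.0) >= 30,                  # minimum mapping quality
        n_missing <= 307,                           # max individuals with no data
    ]
    return all(checks)

info = parse_info("DP=3500;AC=45;AF=0.12;QD=14.2;MQ=42.0;MQRankSum=0.3")
print(keep_site(info))  # a site passing every threshold
```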
2. I then got depth information for these SNPs and ran filterSomeMore.pl to remove SNPs within 3 bp of one another and SNPs with coverage greater than the mean + 2 SD.
cat filtered2x_lycTimeSeriesVariants.vcf | perl -p -i -e 's/^.*PL\s+//' | perl -p -i -e 's/.\/.:\d+,\d+:(\d+)\S+/\1/g' | perl -p -i -e 's/\S+:(\d+\s+)/$1/g' > depth.txt
grep -v ^# depth.txt > temp
mv temp depth.txt
## mean cov. is ~11x
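The ~11x figure can be checked with a quick awk pass over depth.txt; the layout assumed here (one row per SNP, one whitespace-separated depth per individual) follows from the extraction pipeline above. The toy matrix and `/tmp` path are for illustration only.

```shell
# Minimal sketch: build a toy depth matrix, then compute the overall mean
# depth across all SNPs and individuals (run on the real depth.txt instead).
printf '10 12 11\n9 13 11\n' > /tmp/depth_example.txt
awk '{ for (i = 1; i <= NF; i++) { sum += $i; n++ } } END { print sum / n }' /tmp/depth_example.txt
```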
grep ^Sc filtered2x_lycTimeSeriesVariants.vcf | cut -f 1,2 | perl -p -i -e 's/Scaffold_(\d+);HRSCAF_\d+\s+/\1 /' > snps.txt
perl ../Scripts/filterSomeMore.pl filtered2x_lycTimeSeriesVariants.vcf > filter2_log
This generated the final variant set, morefilter_filtered2x_lycTimeSeriesVariants.vcf, which contains 30,513 SNPs.
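The two filters applied in this step can be sketched as below. The real filterSomeMore.pl is Perl and not shown; this Python version mirrors the prose only, and the function name and input format (parallel lists of sorted (scaffold, position) tuples and per-SNP mean coverages) are assumptions.

```python
import statistics

def filter_some_more(snps, depths, min_gap=3, sd_mult=2):
    """Sketch of the filterSomeMore.pl logic described above: drop SNPs
    closer than min_gap bp to a neighbor on the same scaffold, and SNPs
    whose coverage exceeds the overall mean + sd_mult standard deviations.

    snps:   list of (scaffold, position) tuples, sorted within scaffold
    depths: list of per-SNP mean coverages, parallel to snps
    """
    max_cov = statistics.mean(depths) + sd_mult * statistics.stdev(depths)
    keep = []
    for i, ((scaf, pos), cov) in enumerate(zip(snps, depths)):
        # too close to the previous or next SNP on the same scaffold?
        too_close = (
            (i > 0 and snps[i - 1][0] == scaf and pos - snps[i - 1][1] < min_gap)
            or (i + 1 < len(snps) and snps[i + 1][0] == scaf
                and snps[i + 1][1] - pos < min_gap)
        )
        if not too_close and cov <= max_cov:
            keep.append((scaf, pos))
    return keep

# toy example: two SNPs 1 bp apart and one high-coverage SNP get dropped
snps = [("Scaffold_1", p) for p in (100, 101, 200, 210, 220, 230, 240, 250, 260, 300)]
depths = [10] * 9 + [100]
print(len(filter_some_more(snps, depths)))  # prints 7
```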