Post date: Mar 25, 2016 2:54:48 AM
Variant calling with GATK's HaplotypeCaller using The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/alfalfa/gbs/fallon/Alignments/
java -Xmx920g -jar ~/bin/GenomeAnalysisTK.jar -T HaplotypeCaller -R ../../../genome/Msativa/DATA/RUN/ASSEMBLIES/assem1feb15/final.assembly.fasta -I fallonbams.list -o fallon.vcf -gt_mode DISCOVERY -hets 0.001 -mbq 20 -out_mode EMIT_VARIANTS_ONLY -ploidy 4 -stand_call_conf 50 -pcrModel AGGRESSIVE
Variant files were linked to a project directory, here: /uufs/chpc.utah.edu/common/home/u6000989/projects/alfalfa_fallon/Variants
I tried an initial round of variant filtering with vcfutils.pl varFilter, but I want to run my own filter too:
~/bin/vcfutils.pl varFilter -d 192 -a 12 -w 5 -1 0 -2 0 -3 0 -4 0 -e 1e-4 fallon.vcf > vfilt_fallon.vcf
This reduced the number of variants from 479,486 to 223,887.
I then ran my own variant filter, which removes mult-allelic and INDEL loci, and has the following cutoffs: perl ../Scripts/vcfFilter.pl vfilt_fallon.vcf
#### stringency variables, edits as desired
my $minCoverage = 192; # minimum number of sequences; DP
my $minAltRds = 12; # minimum number of sequences with the alternative allele; AC
my $notFixed = 1.0; # removes loci fixed for alt; AF
my $bqrs = 3; # maximum absolute value of the base quality rank sum test; BaseQRankSum
my $mqrs = 2; # maximum absolute value of the mapping quality rank sum test; MQRankSum
my $rprs = 2; # maximum absolute value of the read position rank sum test; ReadPosRankSum
my $qd = 2; # minimum ratio of variant confidenct to non reference read depth; QD
my $mq = 30; # minimum mapping quality; MQ
my $miss = 8; # maximum number of individuals with no data
##### this set is for whole genome shotgun data
44,022 SNPs were retained, and are in the file filtered4x_vfilt_fallon.vcf.
I converted these to genotype likelihood format (no MAF filter) and removed a few screwy genotypes with:
perl ../Scripts/vcf2gl.pl 0.0 filtered4x_vfilt_fallon.vcf
perl checkVariants.pl filtered4x_alfalfavariants.gl
The final file mod_filtered4x_alfalfavariants.gl has 43,920 SNPs.