Post date: Nov 12, 2013 5:34:41 PM
Variant calling using all individuals or only wild-caught individuals identified similar numbers of variants.
all inds. = 323,161 SNVs
wild only = 318,256 SNVs
Thus, I can happily proceed with the variable sites identified from all individuals.
I will filter the SNVs from varMelissaAll.vcf such that I retain only variants that meet the following criteria:
a minimum of 20 sequences with the non-reference allele (this means that one of the last two values given by DP4 must be at least 20)
no more than 1 reversed-orientation read of the reference or non-reference allele (this means that DP4 pos1 or pos2, and pos3 or pos4, must be less than or equal to 1)
a minimum total sequence coverage (DP) of 1500 (approximately 1x)
no more than one non-reference allele
I wrote a Perl script called vcfFilter.pl to accomplish this task. I retained 206,047 SNVs based on these criteria. These are in the file filtered_varMelissaAll.vcf. I moved this file and the other vcf and bcf files to a variant directory in the lycaeides_hostplant projects directory.
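
For reference, the four criteria above can be sketched in a few lines of Python. This is not the vcfFilter.pl script itself, only a minimal illustration of the same logic under some assumptions: the function names (parse_info, keep_variant, filter_vcf) and the parameter names are hypothetical, DP4 is assumed to follow the samtools order (reference forward, reference reverse, non-reference forward, non-reference reverse), and the "reversed orientation" rule is read as requiring the less common strand of each allele to have at most 1 read.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True
    return fields


def keep_variant(line, min_alt=20, max_minor_strand=1, min_dp=1500):
    """Return True if one VCF data line passes all four criteria.

    Thresholds default to the values listed in the post; the parameter
    names themselves are illustrative assumptions.
    """
    cols = line.rstrip("\n").split("\t")
    alt, info = cols[4], parse_info(cols[7])

    # Criterion 4: no more than one non-reference allele.
    if "," in alt:
        return False

    # Sites lacking DP or DP4 cannot be evaluated, so drop them.
    if "DP" not in info or "DP4" not in info:
        return False

    # Criterion 3: minimum total coverage.
    if int(info["DP"]) < min_dp:
        return False

    # Assumed DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse.
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))

    # Criterion 1: one of the last two DP4 values must be at least min_alt.
    if max(alt_fwd, alt_rev) < min_alt:
        return False

    # Criterion 2: for each allele, the less common orientation has <= 1 read.
    if min(ref_fwd, ref_rev) > max_minor_strand:
        return False
    if min(alt_fwd, alt_rev) > max_minor_strand:
        return False

    return True


def filter_vcf(lines):
    """Yield header lines unchanged and only those data lines that pass."""
    for line in lines:
        if line.startswith("#") or keep_variant(line):
            yield line
```

To apply it, one could open varMelissaAll.vcf, pass the file handle to filter_vcf, and write each yielded line to the output file. Keeping the thresholds as keyword arguments makes it easy to see how the number of retained SNVs changes if, say, the coverage cutoff is relaxed.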