Post date: May 10, 2017 1:45:16 PM
9v17. Notes from ZG:
I created a projects directory for the wing mapping project. It is here:
/uufs/chpc.utah.edu/common/home/u6000989/projects/lycaeides_wings
You should have write permissions. Right now it has two sub-directories:
Scripts and Variants. The variant (SNP) data is in Variants and some
scripts are in Scripts. I ran two scripts that I commonly use for
filtering.
The first one was:
perl ../Scripts/vcfFilter.pl varsLycGwa.vcf
This applies the following filters (a SNP must meet all of these
conditions to be kept): a mean coverage of at least 2X per individual,
at least 10 reads supporting the non-reference allele, not fixed for the
alternative allele, a minimum mapping quality of 30, data for at least
80% of individuals, a minor allele frequency of at least 0.005, no more
than 1% of reads in the reverse orientation, and bi-allelic.
This generated the file filtered2x_varsLycGwa.vcf with 81,211 SNPs. A
second version, with the data cut-off relaxed from 80% to 70% of
individuals, has 99,874 SNPs and is called filtered2x-70_varsLycGwa.vcf.
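
The filter conditions listed above can be sketched as a single per-SNP test. This is only an illustration and not the logic of vcfFilter.pl, which is not shown here: SnpStats and passes_filters are hypothetical names, the per-SNP summaries are assumed to have been parsed from the VCF already, and "mean coverage of 2X per individual" is read as total depth of at least 2 times the number of individuals.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summaries, assumed to be parsed from one VCF record (hypothetical layout)."""
    total_depth: int          # reads summed over all individuals
    alt_reads: int            # reads supporting the non-reference allele
    alt_freq: float           # estimated alternative allele frequency
    mapping_quality: float    # mapping quality reported for the SNP
    n_with_data: int          # number of individuals with data
    reverse_read_frac: float  # fraction of reads in the reverse orientation
    n_alleles: int            # 2 for a bi-allelic SNP


def passes_filters(s, n_ind, min_mean_cov=2.0, min_alt_reads=10, min_mq=30.0,
                   min_frac_ind=0.8, min_maf=0.005, max_reverse_frac=0.01):
    """Return True only if the SNP meets every condition listed in the notes."""
    maf = min(s.alt_freq, 1.0 - s.alt_freq)  # minor allele frequency
    return (
        s.total_depth >= min_mean_cov * n_ind        # mean coverage per individual
        and s.alt_reads >= min_alt_reads             # support for non-reference allele
        and s.alt_freq < 1.0                         # not fixed for the alternative allele
        and s.mapping_quality >= min_mq              # minimum mapping quality
        and s.n_with_data >= min_frac_ind * n_ind    # data for enough individuals
        and maf >= min_maf                           # minimum minor allele frequency
        and s.reverse_read_frac <= max_reverse_frac  # reverse-orientation reads
        and s.n_alleles == 2                         # bi-allelic only
    )
```

Under these assumptions, the 70% version corresponds to calling passes_filters with min_frac_ind=0.7 and leaving the other thresholds unchanged.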
I then applied a second filtering script that removes loci with
excessively high coverage (more than 3 s.d. above the mean), which are
potential repeat regions, and drops SNPs that are too close together
(within 3 bp of each other), as a high density of SNPs could indicate a
bad alignment.
perl ../Scripts/filterSomeMore.pl filtered2x_varsLycGwa.vcf
This produced the final filtered data set,
morefilter_filtered2x_varsLycGwa.vcf, with 63,951 SNPs.
Repeating this with the 70% data cut-off generated 78,567 SNPs in
morefilter_filtered2x-70_varsLycGwa.vcf.
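
The second filtering step described above (coverage and spacing) can be sketched roughly as follows. This is a minimal illustration, not the contents of filterSomeMore.pl: filter_some_more is a hypothetical function, the input is assumed to be a list of (scaffold, position, depth) tuples sorted by scaffold and position, "within 3 bp" is taken to mean a distance of 3 bp or less, and both members of a too-close pair are dropped. The real script may differ on any of these points.

```python
from statistics import mean, stdev


def filter_some_more(snps, n_sd=3.0, min_gap=3):
    """Drop high-coverage SNPs and SNPs within min_gap bp of a neighbour.

    snps: list of (scaffold, position, depth) tuples, sorted by scaffold
    and then position. At least two SNPs are needed to compute the s.d.
    """
    depths = [depth for _, _, depth in snps]
    max_depth = mean(depths) + n_sd * stdev(depths)  # coverage ceiling

    kept = []
    for i, (scaffold, pos, depth) in enumerate(snps):
        if depth > max_depth:
            continue  # potential repeat region
        crowded = False
        # Because the input is sorted, only adjacent entries can be closest.
        for j in (i - 1, i + 1):
            if 0 <= j < len(snps):
                other_scaffold, other_pos, _ = snps[j]
                if other_scaffold == scaffold and abs(other_pos - pos) <= min_gap:
                    crowded = True
        if not crowded:
            kept.append((scaffold, pos, depth))
    return kept
```

Note that in this sketch spacing is judged against all input SNPs, including ones that are themselves removed for high coverage; applying the two criteria in sequence instead could keep a few more SNPs.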