Post date: Sep 03, 2015 3:54:46 PM
All directory references are relative to /labs/evolution/data/aspen/gbs/Assemblies.
SNPs were extracted from filtered2x_aspenvariants.vcf, the variant file we used for the NSF grant; the extracted SNPs are in snps.txt. We then assigned each SNP to a haplotype locus based on its position, using the script combineHapLocusSnps.pl (in Scripts/). This produced snpsPerHapLocus.txt, which lists the scaffold and start position of each haplotype locus, followed by the position and allele frequency of each SNP in that locus.
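The position-based assignment can be sketched in Python. This is a minimal illustration under stated assumptions, not the logic of combineHapLocusSnps.pl: it assumes each haplotype locus spans a fixed, hypothetical length (LOCUS_LENGTH, set to 100 bp here) from its start position, that loci on the same scaffold do not overlap, and that SNPs arrive as (scaffold, position, allele frequency) tuples rather than in the actual layout of snps.txt. The scaffold names and values in the usage example are made up.

```python
from bisect import bisect_right
from collections import defaultdict

# HYPOTHETICAL: the real locus extent used by combineHapLocusSnps.pl is not stated.
LOCUS_LENGTH = 100


def assign_snps_to_loci(loci, snps, locus_length=LOCUS_LENGTH):
    """Group SNPs by the haplotype locus whose window contains them.

    loci: iterable of (scaffold, start) tuples (assumed non-overlapping per scaffold)
    snps: iterable of (scaffold, position, allele_freq) tuples
    Returns {(scaffold, start): [(position, allele_freq), ...]} sorted by position.
    """
    # Sorted locus start positions for each scaffold, for binary search.
    starts_by_scaffold = defaultdict(list)
    for scaffold, start in loci:
        starts_by_scaffold[scaffold].append(start)
    for starts in starts_by_scaffold.values():
        starts.sort()

    grouped = defaultdict(list)
    for scaffold, pos, freq in snps:
        starts = starts_by_scaffold.get(scaffold)
        if not starts:
            continue  # no haplotype loci on this scaffold
        i = bisect_right(starts, pos) - 1  # last locus starting at or before pos
        if i >= 0 and pos < starts[i] + locus_length:
            grouped[(scaffold, starts[i])].append((pos, freq))
        # SNPs falling outside every locus window are dropped.

    for snp_list in grouped.values():
        snp_list.sort()
    return dict(grouped)


# Usage with made-up data: the SNP at 3000 lies outside every locus window.
loci = [("scaffold_1", 1000), ("scaffold_1", 5000), ("scaffold_2", 200)]
snps = [
    ("scaffold_1", 1050, 0.40),
    ("scaffold_1", 1010, 0.25),
    ("scaffold_1", 3000, 0.10),
    ("scaffold_2", 250, 0.05),
]
result = assign_snps_to_loci(loci, snps)
print(result)
```

Binary search over sorted start positions keeps the assignment fast for many thousands of loci per scaffold; a locus with no SNPs in its window simply does not appear in the output.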
We then selected the subset of haplotype loci with 2-4 SNPs using the script calcSnpsPerLocus.pl (in Scripts/), which also prints the number of SNPs per locus. These 8700 haplotype loci are in sub_hapLocusWithSnps.txt; this is the set of SNPs and haplotype loci for which we will pull individual genetic data. We have not yet removed SNPs or loci with low minor allele frequencies; instead, we will later drop haplotype loci with fewer than 3 haplotype alleles.
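The SNP-count filter and the planned haplotype-allele cutoff can be sketched in Python. This is a hedged illustration, not calcSnpsPerLocus.pl itself: it assumes the loci are held in a dictionary mapping (scaffold, start) to a list of (position, allele frequency) pairs, and the function names, the representation of haplotype alleles as strings such as "AC", and the example data are all hypothetical.

```python
# Bounds from the text: keep loci with 2-4 SNPs; later require >= 3 haplotype alleles.
MIN_SNPS, MAX_SNPS = 2, 4
MIN_HAPLOTYPE_ALLELES = 3


def snps_per_locus(locus_snps):
    """Number of SNPs in each haplotype locus."""
    return {locus: len(snps) for locus, snps in locus_snps.items()}


def subset_by_snp_count(locus_snps, min_snps=MIN_SNPS, max_snps=MAX_SNPS):
    """Keep only loci whose SNP count lies in [min_snps, max_snps]."""
    return {
        locus: snps
        for locus, snps in locus_snps.items()
        if min_snps <= len(snps) <= max_snps
    }


def count_haplotype_alleles(observed_haplotypes):
    """HYPOTHETICAL later step: distinct haplotype alleles seen at one locus."""
    return len(set(observed_haplotypes))


# Usage with made-up data.
locus_snps = {
    ("scaffold_1", 1000): [(1010, 0.25), (1050, 0.40)],
    ("scaffold_1", 5000): [(5003, 0.12)],
    ("scaffold_2", 200): [(205, 0.3), (230, 0.1), (260, 0.2), (270, 0.05), (290, 0.5)],
}
counts = snps_per_locus(locus_snps)
subset = subset_by_snp_count(locus_snps)

# Haplotypes observed across individuals at one locus (hypothetical strings).
observed = ["AC", "AC", "GT", "AT"]
keep_locus = count_haplotype_alleles(observed) >= MIN_HAPLOTYPE_ALLELES
print(counts, list(subset), keep_locus)
```

Deferring the minor-allele-frequency decision to the haplotype level, as the text describes, means a locus is judged by how many distinct multi-SNP alleles it carries rather than by any single SNP.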