Defining haplotype loci

Post date: Aug 27, 2015 3:26:48 PM

We defined haplotype loci as follows:

1. Start from the left-most read start point on a scaffold; this is the start point of the first locus.
2. Proceed one (sorted, unique) read at a time. If the read's start is within $nbp = 100 bp of the start of the current locus, the read belongs to that locus; if not, the read's start defines the start of the next locus.
3. Continue until the end of the scaffold, then repeat for the next scaffold.
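The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in grabStarts.pl. The function name `define_loci` is hypothetical. Scaffolds are processed in sorted name order, whereas the original order depends on the input. "Within $nbp" is taken to mean a difference of at most nbp bases.

```python
from typing import Dict, Iterable, List, Set, Tuple

NBP = 100  # window size in bp, matching $nbp = 100 above


def define_loci(reads: Iterable[Tuple[str, int]], nbp: int = NBP) -> List[Tuple[str, int]]:
    """Return (scaffold, locus_start) pairs, one per haplotype locus (hypothetical helper).

    reads: (scaffold, read_start) pairs. Duplicate starts are removed and
    starts are sorted within each scaffold before grouping.
    """
    # Collect unique read starts per scaffold
    by_scaffold: Dict[str, Set[int]] = {}
    for scaffold, start in reads:
        by_scaffold.setdefault(scaffold, set()).add(start)

    loci: List[Tuple[str, int]] = []
    # Assumption: scaffolds are visited in sorted name order
    for scaffold in sorted(by_scaffold):
        locus_start = None
        for start in sorted(by_scaffold[scaffold]):
            # Assumption: "within nbp" means start - locus_start <= nbp.
            # Compare against the locus start, not the previous read.
            if locus_start is None or start - locus_start > nbp:
                locus_start = start  # this read opens a new locus
                loci.append((scaffold, locus_start))
    return loci
```

Each read is compared with the start of the current locus, not with the previous read, so closely spaced reads cannot chain a single locus beyond nbp bases. For example, under the at-most-nbp assumption, read starts 10, 50, 110, 111 and 300 on one scaffold give loci starting at 10, 111 and 300.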

The script for this is grabStarts.pl (/labs/evolution/data/aspen/gbs/Assemblies/Scripts), and it writes the output file hapLocusStarts.txt. This file has two columns, the scaffold and the locus start, with one haplotype locus per row. Here is the command we used to run it on the aspen data:
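The command passes the alignment (SAM) files to the script. Below is a hedged Python sketch of the input and output sides of such a script, not a description of what grabStarts.pl actually does. It assumes the following. A read's start is the SAM POS field (column 4, the 1-based leftmost mapping position) on the scaffold named in RNAME (column 3). Unmapped reads (FLAG bit 0x4) are skipped. The two output columns are tab-separated. The function names are hypothetical.

```python
import glob
from typing import Iterable, Iterator, Tuple


def read_starts_from_sam(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (scaffold, start) for each mapped read in SAM-formatted lines (hypothetical helper)."""
    for line in lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete alignment line
        flag, scaffold, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4 or scaffold == "*" or pos == 0:
            continue  # unmapped read
        yield scaffold, pos


def format_locus_starts(loci: Iterable[Tuple[str, int]]) -> str:
    """Render one 'scaffold<TAB>start' row per locus (assumed tab-separated layout)."""
    return "".join(f"{scaffold}\t{start}\n" for scaffold, start in loci)


if __name__ == "__main__":
    # Hypothetical driver: gather sorted, unique read starts from aln*sam
    starts = set()
    for path in sorted(glob.glob("aln*sam")):
        with open(path) as fh:
            starts.update(read_starts_from_sam(fh))
    print(len(starts), "unique read starts")
```

The unique (scaffold, start) pairs collected here are the input to the grouping steps in the numbered list, and format_locus_starts renders the resulting loci in a two-column layout like hapLocusStarts.txt.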

perl Scripts/grabStarts.pl aln*sam

This generated 303,667 potential haplotype loci.
