plans for estimating haplotypes

Post date: Aug 24, 2015 4:30:42 PM

We first need to define loci for calling haplotypes. It looks like this can mostly be done by treating a locus as all reads that start within 100 bp of each other.
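A rough sketch of that grouping rule, not a settled implementation: it assumes read starts are plain integer coordinates on a single contig, and that "within 100 bp of each other" means the whole group spans at most 100 bp (rather than each start being within 100 bp of its neighbor). The function name `define_loci` and the `max_span` parameter are made up for illustration.

```python
def define_loci(starts, max_span=100):
    """Group unique start positions into loci spanning at most max_span bp.

    Returns a list of (first_start, last_start) tuples, one per locus.
    Assumption: all positions are on the same contig.
    """
    loci = []
    group_start = None  # first start position in the current locus
    prev = None         # most recent start position seen
    for pos in sorted(set(starts)):
        # Open a new locus when this start is too far from the locus's first start.
        if group_start is None or pos - group_start > max_span:
            if group_start is not None:
                loci.append((group_start, prev))
            group_start = pos
        prev = pos
    # Close the final locus, if any positions were seen.
    if group_start is not None:
        loci.append((group_start, prev))
    return loci


if __name__ == "__main__":
    example_starts = [10, 15, 90, 105, 400, 450, 620]
    print(define_loci(example_starts))
```

If the neighbor-to-neighbor reading turns out to fit the data better, the comparison would be `pos - prev > max_span` instead of `pos - group_start > max_span`.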

So, here is the initial plan:

1. grab all of the unique start points (which will be a mix of starts and stops)
2. find the outer bounds of each set in which all unique starts are within 100 bp of each other
3. the outer bounds will delineate the loci for haplotype calling
4. for each read, we will extract the variable sites (all SNPs within the haplotype) along with their quality scores, filling in Ns wherever a read has no data at a site (give these the worst possible quality score, so that all bases are treated as equally likely)
5. this will be the input for our model
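Step 4 could look something like the sketch below. It is only an illustration under several assumptions: reads align without indels, each read is given as a start coordinate plus a base string and a list of Phred scores, and the variable sites for the locus are already known. The names `read_to_haplotype`, `MISSING_BASE`, and `MISSING_QUAL` are hypothetical, and using Phred 0 for missing data is one possible encoding of "no information".

```python
MISSING_BASE = "N"
MISSING_QUAL = 0  # assumption: Phred 0 stands in for "no information at this site"


def read_to_haplotype(read_start, seq, quals, variable_sites):
    """Pull the bases and quality scores a read carries at each variable site.

    Sites the read does not cover are filled with N and the missing-data quality.
    Assumption: the read aligns to the reference without insertions or deletions,
    so reference position = read_start + offset into the read.
    """
    bases, scores = [], []
    for site in variable_sites:
        offset = site - read_start
        if 0 <= offset < len(seq):
            bases.append(seq[offset])
            scores.append(quals[offset])
        else:
            bases.append(MISSING_BASE)
            scores.append(MISSING_QUAL)
    return "".join(bases), scores


if __name__ == "__main__":
    # A 6 bp read starting at position 100, scored against four variable sites.
    hap, q = read_to_haplotype(100, "ACGTAC", [30, 30, 20, 35, 40, 10],
                               [98, 102, 105, 110])
    print(hap, q)
```

Running this over every read in a locus would give one haplotype string and one quality vector per read, which is the shape of input described in step 5.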
