Post date: Jun 25, 2018 10:22:33 PM
The SV data from Kay are in /uufs/chpc.utah.edu/common/home/u6000989/projects/timema_SV/SVs/. He has SV calls (BCF format) from two programs, DELLY and LUMPY. These need to be cleaned up somewhat.
Here are Kay's notes:
- First I played around with removing overlapping reads using PEAR; this works with DELLY but not with LUMPY. I tried Pindel and SVtools, but neither of them worked. I looked around a bit but could not find any amazing new software that had come out.
So what Zach got were the Lumpy/Delly calls made from the unfiltered reads (i.e., including overlapping reads). These should not affect the calls, however, except for small indels, which need to be filtered out. At the time I used intansv from Bioconductor https://bioconductor.org/packages/release/bioc/html/intansv.html to merge the calls; the problem is that all the software packages have since been updated and I could not get it to work. I suppose Zach has a better way anyway?
Note that I ran Delly/Lumpy using a GQ threshold of >28, which should yield conservative estimates.
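Since intansv no longer works, here is a minimal sketch of one possible replacement for the merge and small-indel filtering, not the intansv method itself. It assumes the BCF files have first been converted to text VCF (e.g. `bcftools view delly.bcf > delly.vcf`) and that each record carries `SVTYPE` and `END` in INFO. The 50 bp minimum length, the 50% reciprocal-overlap rule, and the choice to keep only DELLY calls that LUMPY also supports are all assumptions made for illustration; every function name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumptions for illustration only (not from Kay's notes):
MIN_SV_LEN = 50              # events shorter than this are treated as small indels and dropped
MIN_RECIP_OVERLAP = 0.5      # two calls "agree" if each covers >= 50% of the other
SKIP_TYPES = {"BND", "TRA"}  # breakend/translocation records have no simple interval


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int   # VCF POS (1-based)
    end: int     # INFO/END (inclusive)
    svtype: str  # INFO/SVTYPE, e.g. DEL, DUP, INV

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_vcf_line(line: str) -> SV:
    """Turn one text-VCF data line into an SV interval."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    # INFO flags without '=' (e.g. IMPRECISE) are ignored.
    tags = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    return SV(chrom, pos, int(tags.get("END", pos)), tags.get("SVTYPE", "NA"))


def load_calls(lines: Iterable[str]) -> List[SV]:
    """Parse VCF lines, skipping headers, breakends and small indels."""
    calls = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        sv = parse_vcf_line(line)
        if sv.svtype not in SKIP_TYPES and sv.length >= MIN_SV_LEN:
            calls.append(sv)
    return calls


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Overlap divided by the longer call's length (0 if no overlap).

    overlap / max(len_a, len_b) >= f  <=>  each call covers >= f of the other.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(overlap, 0) / max(a.length, b.length)


def merge_calls(delly: List[SV], lumpy: List[SV]) -> List[SV]:
    """Keep DELLY calls matched by a LUMPY call of the same SV type."""
    return [
        d for d in delly
        if any(
            d.svtype == lc.svtype and reciprocal_overlap(d, lc) >= MIN_RECIP_OVERLAP
            for lc in lumpy
        )
    ]
```

Usage would look like `merge_calls(load_calls(open("delly.vcf")), load_calls(open("lumpy.vcf")))`, with the file names hypothetical. Taking the intersection of the two callers is the conservative choice; a union (keeping calls from either program) would be a one-line change in `merge_calls` if sensitivity matters more than specificity.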
The goal of the project is to ask whether SVs have higher Fst than other loci. To this end, we will also include the mel-stripe inversion. The population genomic data will come from the eight populations we considered in the parallel speciation Science paper (whole-genome data).
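The note does not say which Fst estimator or test will be used, so the following is only a sketch of one option. It uses Hudson's Fst in the ratio-of-averages form described by Bhatia et al. (2013) for a single pair of populations, then compares the SV set against many random, equally sized sets of non-SV loci to get a one-sided empirical p-value. It assumes each SV (or the mel-stripe inversion) can be summarized as a biallelic locus with an allele frequency and a count of sampled chromosomes per population; the names `Locus`, `hudson_fst` and `sv_fst_test` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# One biallelic locus: (p1, n1, p2, n2) = allele frequency and number of sampled
# chromosomes (haplotypes) in population 1 and population 2.
Locus = Tuple[float, int, float, int]


def hudson_terms(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's Fst for one locus."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def hudson_fst(loci: Sequence[Locus]) -> float:
    """Ratio-of-averages Fst over a set of loci.

    Assumes at least one locus is polymorphic; otherwise the denominator is 0.
    """
    nums, dens = zip(*(hudson_terms(*locus) for locus in loci))
    return sum(nums) / sum(dens)


def sv_fst_test(sv_loci: Sequence[Locus], background: List[Locus],
                n_draws: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Compare SV Fst with Fst of random, equally sized sets of background loci.

    Returns (observed SV Fst, one-sided empirical p-value for "SVs have higher Fst").
    """
    rng = random.Random(seed)
    observed = hudson_fst(sv_loci)
    k = len(sv_loci)
    hits = sum(
        hudson_fst(rng.sample(background, k)) >= observed for _ in range(n_draws)
    )
    # The +1 terms count the observed set itself, so p is never exactly 0.
    return observed, (hits + 1) / (n_draws + 1)
```

With eight populations this would be repeated for each population pair. Because SVs and SNPs may have different allele-frequency spectra, drawing background loci matched on frequency (rather than uniformly, as here) would probably give a fairer comparison.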