1). Alignment
I did alignments of the whole genomes to the new gap filled reference genome for L. melissa.
Folder:
2). Variant calling using Delly and Lumpy
3) Merge vcfs using SURVIVOR
anna = 6495
idas = 13664
melissa = 12802
sierra = 10975
warner = 11779
uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lyc_sv/Variantcalling/survivor/SURVIVOR/Debug/SURVIVOR merge merged_vcf_list 1000 2 1 1 0 30 all_merged_filtered.vcf
#After merging all I end up with 12265 variants
DEL = 11594
DUP = 224
INV = 13
Plotting the comparison of multiple input VCF files after merging.
After the merge, you can use the merged vcf file to obtain a graphical representation of the overlaps.
If you have many VCF files that you want to compare: The first step is to obtain a pairwise comparison matrix like this:
Extract the overlap information like this:
perl -ne 'print "$1\n" if /SUPP_VEC=([^,;]+)/' all_merged_filtered.vcf | sed -e 's/\(.\)/\1 /g' > all_merged_overlapp.txt
This will extract the string from the support vector representing 0 or 1 depending on if a sample/ input VCF file supports an SV or not. The sed command will add a space between every character, which is needed for R.
Next, I am using the R package called VennDiagram. You need to install/download it for your R instance.
t=read.table("Documents/lycaeides/all_merged_overlapp.txt",header=F)
dst <- data.matrix(t(t))
image(1:nrow(dst), 1:ncol(dst),log(dst), col=rev(heat.colors(10)),axes=F, xlab="", ylab="")
grid(col='black',nx=nrow(dst),ny=ncol(dst),lty=1)
library(VennDiagram)
venn.diagram(list(anna=which(t[,1]==1), idas=which(t[,2]==1), melissa=which(t[,3]==1), sierra=which(t[,4]==1), warner=which(t[,5]==1)) , fill = c("gray", "orange" ,"blue", "green", "red") , alpha = c(0.5, 0.5, 0.5, 0.5, 0.5), cex = 2, lty =2, filename = "my_sample_overlapp.tiff")