Post date: Jun 23, 2014 8:5:46 PM
I used samtools to remove PCR duplicates before variant calling. This is probably a good idea for paired-end reads from whole genome sequence data, but it does not work for GBS data: rmdup identifies duplicates as read pairs with identical start and end positions, and GBS reads from the same restriction site share those positions whether or not they are PCR duplicates. Here is an example of the command:
cd /home/A01963476/data/timema/timema_wgrs/assembliesExperiment/
samtools rmdup timemaTC_5C_25088.sorted.bam timemaTC_5C_25088_unique.bam
samtools index timemaTC_5C_25088_unique.bam
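To repeat the two commands above for every alignment, a loop along these lines might work. This is only a sketch: it assumes every input file in the directory follows the `<sample>.sorted.bam` naming used in the example, and the function names `unique_name` and `dedup_all` are made up for illustration.

```shell
# Sketch: run rmdup and index on every sorted BAM in the current directory.
# Assumption: inputs are named <sample>.sorted.bam, as in the example above.

# Derive the output name: foo.sorted.bam -> foo_unique.bam
unique_name() {
    printf '%s_unique.bam\n' "${1%.sorted.bam}"
}

dedup_all() {
    for bam in *.sorted.bam; do
        [ -e "$bam" ] || continue   # no matching files: skip the literal pattern
        out=$(unique_name "$bam")
        # Index only if duplicate removal succeeded
        samtools rmdup "$bam" "$out" && samtools index "$out"
    done
}
```

Calling `dedup_all` from inside the assembly directory would then process each sorted bam file in turn.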
The proportion of duplicates removed from each bam alignment is in duplicateProps.txt (mean = 4.3%, range = 1.8 - 29.2%).
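One way such per-file percentages could be computed is by counting alignment records before and after rmdup with `samtools view -c`. This is a hypothetical sketch, not necessarily how duplicateProps.txt was produced; the function names `dup_percent` and `dup_props` are illustrative, and the counts include every record in each file (unmapped reads too, if present).

```shell
# Sketch: percentage of records removed by rmdup for each sample.
# Assumption: <sample>.sorted.bam and <sample>_unique.bam sit side by side.

# Percentage removed, given record counts before and after rmdup.
dup_percent() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        if (b > 0) printf "%.1f\n", (b - a) / b * 100
        else print "NA"             # empty input file: avoid dividing by zero
    }'
}

# Print "<sample> <percent removed>" for each sorted/unique BAM pair.
dup_props() {
    for bam in *.sorted.bam; do
        [ -e "$bam" ] || continue
        uniq_bam="${bam%.sorted.bam}_unique.bam"
        [ -e "$uniq_bam" ] || continue   # rmdup not run yet for this sample
        before=$(samtools view -c "$bam")
        after=$(samtools view -c "$uniq_bam")
        printf '%s %s\n' "${bam%.sorted.bam}" "$(dup_percent "$before" "$after")"
    done
}
```

Running `dup_props` prints one line per sample to standard output, which can be redirected to a file of your choosing.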