These instructions cover reference-based alignment and variant calling for GBS data. They assume that you have generated a reference sequence or sequence set ('refseq.fasta' below) and have one fastq file per individual that includes that individual's id in its name. The needed commands are included individually below with some comments, but you will probably want to include this in a script for batch submission to the DoRC cluster. You will need to load managed packages for bwa and samtools. This means that your submission script should include reuse BWA or reuse SAMtools.
1. Use bwa to index the reference sequence set with the 'is' algorithm. An example bash submission script is /labs/evolution/projects/example_scripts/jobsubVariantCall.sh.
bwa index -a is /PATH/refseq.fasta
2. Use bwa 'aln' and 'samse' algorithms to each individual's sequences to the indexed reference sequence set (this example is for IND1; IND1.sai and IND1.sam are created by the program). An example perl submission script is /labs/evolution/projects/example_scripts/wrap_qsub_rc_bwa.pl.
bwa aln -n 4 -l 20 -k 2 -t 8 -q 10 -f IND1.sai /PATH/refseq.fasta /PATH/IND1.fastq
bwa samse -n 1 -r '@RG ID:IND1' -f IND1.sam /PATH/refseq.fasta IND1.sai /PATH/IND1.fastq
3. Compress (convert .sam to .bam), sort, and index the alignments using samtools. An example perl submission script is /labs/evolution/projects/example_scripts/wrap_qsub_rc_sam2bam.pl.
samtools view -b -S -o IND1.bam IND1.sam
samtools sort IND1.bam IND1.sorted
samtools index IND1.sorted.bam
4. Variant calling with samtools and bcftools. Use the options below as a starting point, but adjust them as makes sense for your data.
samtools mpileup -C 50 -d 50 -f /PATH/refseq.fasta -q 10 -Q 15 -D -g -I *bam > variants.bcf
bcftools view -c -d 0.8 -g -p 0.01 -P full -t 0.001 -v variants.bcf > variants.vcf
/labs/evolution/projects/example_scripts/vcfFilter.pl variants.vcf