Post date: Sep 18, 2019 6:30:42 PM
Because the deletion is a deletion in the stripe genome, I am repeating the alignment and variant calling with the melanic genome (/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/mod_map_timema_06Jun2016_RvNkF702.fasta).
1. Back in the Parsed subdirectory (in /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_chumash_gbs/Parsed), I ran the alignment as,
perl wrap_qsub_slurm_bwa.pl 19_0*
Here is an example command.
cd /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_chumash_gbs/Parsed
bwa mem -t 8 -k 15 -r 1.3 -T 30 -r '@RG\tID:Timema-19_0884\tPL:ILLUMINA\tLB:Timema-19_0884\tSM:Timema-19_0884' /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/mod_map_timema_06Jun2016_RvNkF702.fasta 19_0884.fastq > aln19_0884.sam
p.s. in the end this was in the queue too long so I ran it on kp293 with fork instead, but it is otherwise as described above.
This uses bwa mem (version 0.7.17-r1188).
2. Compress, sort and index the alignments with samtools (version 1.5). This is in /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_chumash_gbs/Alignments_mel.
sbatch SubSam2Bam.sh
which runs Sam2BamFork.pl
#!/usr/bin/perl
#
# conver sam to bam, then sort and index
#
use Parallel::ForkManager;
my $max = 36;
my $pm = Parallel::ForkManager->new($max);
FILES:
foreach $sam (@ARGV){
$pm->start and next FILES; ## fork
$sam =~ m/^([A-Za-z0-9_\-]+\.)sam/ or die "failed to match $fq\n";
$base = $1;
system "samtools view -b -O BAM -o $base"."bam $sam\n";
system "samtools sort -O BAM -o $base"."sorted.bam $base"."bam\n";
system "samtools index -b $base"."sorted.bam\n";
$pm->finish;
}
$pm->wait_all_children;
3. Variant calling with samtools 1.5 and bcftools 1.6. This is in the Alignments_mel subdirectory. Run SamVarCallTchum.sh. This calls variants for samples in bamfiles. Note I dropped one individual where the same ID had been assigned to two barcodes.
#!/bin/sh
#SBATCH --time=240:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=sam_varcall
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
#SAMTOOLs mpileup version 1.5 options used:
#C = adjust mapping quality; recommended:50, disable:0 [0]
#d = max per-file depth; avoids excessive memory usage [250]
#f = faidx indexed reference sequence file
#q = skip alignments with mapQ smaller than INT [0]
#Q = skip bases with baseQ/BAQ smaller than INT [13]
#g = generate genotype likelihoods in BCF format
#t = --output-tags LIST optional tags to output:DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR []
#BCFTOOLs call version 1.6 options used
#v = output variant sites only
#c/m = he original calling method (conflicts with -m) or alternative model for multiallelic and rare-variant calling (conflicts with -c)
#p = variant if P(ref|D)<FLOAT with -c [0.5]
#P = --prior <float-o mutation rate (use bigger for greater sensitivity), use with -m [1.1e-3]
#O = output type: 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' (here it is 'v')
#o = write output to a file [standard output]
module load samtools
module load bcftools
cd /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_chumash_gbs/Alignments_mel
samtools mpileup -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version3/mod_map_timema_06Jun2016_RvNkF702.fasta -q 20 -Q 20 -g -I -t DP,AD -u -b bamfiles -o tchum_mel_gbs_2019.bcf
bcftools call -v -c -p 0.01 -P 0.001 -O v -o tchum_mel_gbs_2019.vcf tchum_mel_gbs_2019.bcf
The resulting variant files are in the Variants_mel subdirectory.