Post date: Sep 14, 2016 2:51:29 AM
I am using two sampling times for FHA and OGA to calculate allele frequency change based on the GBS data for LG8 relative to the rest of the genome. I have made symbolic links to the relevant files, and am now working here (with sub-directories for OGA and FHA):
/uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments
1. I am using bwa (0.7.10-r789) aln and samse to align the data to the new Dovetail genome (version 2).
Example command for OGA:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/fq_oga/
bwa aln -n 5 -l 20 -k 2 -t 8 -q 10 -f alnND1527_MR4.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta ND1527_MR4_A_gbs.fq.gz
bwa samse -n 1 -r '@RG\tID:tcr-ND1527_MR4\tPL:ILLUMINA\tLB:tcr-ND1527_MR4\tSM:tcr-ND1527_MR4' -f alnND1527_MR4.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta alnND1527_MR4.sai ND1527_MR4_A_gbs.fq.gz
Example command for FHA:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/fq_fha/
bwa aln -n 5 -l 20 -k 2 -t 8 -q 10 -f alnCA25093-5C.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta CA25093-5C.fastq
bwa samse -n 1 -r '@RG\tID:tcr-CA25093-5C\tPL:ILLUMINA\tLB:tcr-CA25093-5C\tSM:tcr-CA25093-5C' -f alnCA25093-5C.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta alnCA25093-5C.sai CA25093-5C.fastq
2. Compress, sort and index alignments, which are now in the alignment_*, sub-directories with samtools 1.2. Here are example commands for each set of files (population):
OGA:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/alignments_oga/
samtools view -b -S -o alnND1527_MR4.bam alnND1527_MR4.sam
samtools sort alnND1527_MR4.bam alnND1527_MR4.sorted
samtools index alnND1527_MR4.sorted.bam
FHA:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/alignments_fha/
samtools view -b -S -o alnCA25093-5C.bam alnCA25093-5C.sam
samtools sort alnCA25093-5C.bam alnCA25093-5C.sorted
samtools index alnCA25093-5C.sorted.bam
3. Variant calling samtools 1.2 (mpileup) and bcftools (call). Note, I decided to use these for the GBS data even though I am using the HaplotypeCaller for the genome data. I simply don't think that HC will improve results for GBS data, though I should test this sometime. Also, I am using the new multiallelic/rare variant caller. Results will be in variants_* subdirectories. Here are the commands:
OGA:
#!/bin/sh
#SBATCH --time=120:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=samtools
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load samtools
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/alignments_oga/
samtools mpileup -b bams -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta -q 20 -Q 30 -I -u -g -t DP,DPR -o tcrOgaVars.bcf
bcftools call -v -m -p 0.01 -P 0.001 -O v -o tcrOgaVars.vcf tcrOgaVars.bcf
FHA:
#!/bin/sh
#SBATCH --time=240:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=samtools
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
echo ------------------------------------------------------
echo SLURM: job identifier is $SLURM_JOBID
echo SLURM: job name is $SLURM_JOB_NAME
echo ------------------------------------------------------
module load samtools
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/darkmorph_gbs_alignments/alignments_fha/
samtools mpileup -b bams -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta -q 20 -Q 30 -I -u -g -t DP,DPR -o tcrFhaVars.bcf
bcftools call -v -m -p 0.01 -P 0.001 -O v -o tcrFhaVars.vcf tcrFhaVars.bcf