Post date: Sep 24, 2019 2:42:34 AM
As part of our 1st objective on the Dimensions grant to explain patterns of occupancy by L. melissa in nature, I am going to estimate effective migration surfaces. This should allow us to identify parts of the physical landscape that receive less gene flow. The method is described here: https://www.nature.com/articles/ng.3464.
I have the program installed locally, in ~/source/eems/runeems_snps/. It is compiled and appears to be working.
I want to re-align the Lycaeides melissa data to the current Hi-C L. melissa genome. I have linked the relevant fastq files to /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesDimensions/. This includes all 26 populations from Sam's 2018 Mol. Ecol. paper, along with the BST samples (part of the Dubois paper). We have 27 populations and 541 butterflies total. Here are the sample sizes per population:
19 ABC
46 ABM
20 BHP
20 BSD
24 BST
23 CDY
10 CKV
20 DBQ
20 DCR
20 GLA
18 GVL
24 LAN
20 LCA
20 MON
19 MTU
19 OCY
20 REW
16 SCC
18 SLA
20 SUV
20 SV
13 TPT
20 UAL
20 VCP
20 VIC
12 WAL
20 YWP
1. Alignment with bwa mem, version 0.7.17-r1188. I ran,
sbatch SubBwa.sh
#!/bin/sh
#SBATCH --time=140:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=30
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=bwa
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
module load bwa
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesDimensions
perl BwaFork.pl *fastq
which runs
#!/usr/bin/perl
#
# bwa mem alignment
#
use Parallel::ForkManager;
my $max = 40;
my $pm = Parallel::ForkManager->new($max);
my $genome = "/uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/download/HiC_HiCRise_GLtS4/melissa_blue_21Nov2017_GLtS4/mod_melissa_blue_21Nov2017_GLtS4.fasta";
FILES:
foreach $file (@ARGV){
$pm->start and next FILES; ## fork
if ($file =~ m/^([A-Z0-9\-_]+)/){
$ind = $1;
}
else {
die "Failed to match $file\n";
}
system "bwa mem -t 1 -k 15 -r 1.3 -T 30 -r \'\@RG\\tID:Lmel-"."$ind\\tPL:ILLUMINA\\tLB:Lmel-"."$ind\\tSM:Lmel-"."$ind"."\' $genome $file > aln"."$ind".".sam\n";
$pm->finish;
}
$pm->wait_all_children;
2. Compressed, sorted and indexed the alignment files with samtools 1.5.
sbatch SubSam2Bam.sh
which runs,
#!/usr/bin/perl
#
# conver sam to bam, then sort and index
#
use Parallel::ForkManager;
my $max = 36;
my $pm = Parallel::ForkManager->new($max);
FILES:
foreach $sam (@ARGV){
$pm->start and next FILES; ## fork
$sam =~ m/^([A-Za-z0-9_\-]+\.)sam/ or die "failed to match $fq\n";
$base = $1;
system "samtools view -b -O BAM -o $base"."bam $sam\n";
system "samtools sort -O BAM -o $base"."sorted.bam $base"."bam\n";
system "samtools index -b $base"."sorted.bam\n";
$pm->finish;
}
$pm->wait_all_children;
3. Variant calling. Uses the original variant caller from bcftools, running samtools version 1.5 and bcftools version 1.6.
sbatch SamVarCallLyc.sh
which runs
#!/bin/sh
#SBATCH --time=240:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=24
#SBATCH --account=gompert-kp
#SBATCH --partition=gompert-kp
#SBATCH --job-name=sam_varcall
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=zach.gompert@usu.edu
#SAMTOOLs mpileup version 1.5 options used:
#C = adjust mapping quality; recommended:50, disable:0 [0]
#d = max per-file depth; avoids excessive memory usage [250]
#f = faidx indexed reference sequence file
#q = skip alignments with mapQ smaller than INT [0]
#Q = skip bases with baseQ/BAQ smaller than INT [13]
#g = generate genotype likelihoods in BCF format
#t = --output-tags LIST optional tags to output:DP,AD,ADF,ADR,SP,INFO/AD,INFO/ADF,INFO/ADR []
#BCFTOOLs call version 1.6 options used
#v = output variant sites only
#c/m = he original calling method (conflicts with -m) or alternative model for multiallelic and rare-variant calling (conflicts with -c)
#p = variant if P(ref|D)<FLOAT with -c [0.5]
#P = --prior <float-o mutation rate (use bigger for greater sensitivity), use with -m [1.1e-3]
#O = output type: 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' (here it is 'v')
#o = write output to a file [standard output]
module load samtools
module load bcftools
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_gbs/AssembliesDimensions
samtools mpileup -C 50 -d 250 -f /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/dovetail_melissa_genome/download/HiC_HiCRise_GLtS4/melissa_blue_21Nov2017_GLtS4/mod_melissa_blue_21Nov2017_GLtS4.fasta -q 20 -Q 30 -g -I -t DP,AD -u -b bamfiles -o lmel_dim_gbs_2019.bcf
bcftools call -v -c -p 0.01 -P 0.001 -O v -o lmel_dim_gbs_2019.vcf lmel_dim_gbs_2019.bcf