Post date: Oct 23, 2019 3:44:30 PM
We have 40 T. curi samples with 160 sets of paired reads. The data are in /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/sequences/. Information on host and actual IDs can be found in Tcuir_ids.txt one directory up. It looks like this:
WTCHG_132994_301 curi CR C 1 moe04A1
WTCHG_135250_301 curi CR C 1 moe04A1
WTCHG_134841_301 curi CR C 1 moe04A1
WTCHG_135251_301 curi CR C 1 moe04A1
WTCHG_135248_280 curi CR C 10 moe04B2
WTCHG_135249_280 curi CR C 10 moe04B2
WTCHG_131174_280 curi CR C 10 moe04B2
WTCHG_134840_280 curi CR C 10 moe04B2
WTCHG_135248_281 curi CR C 11 moe04C2
WTCHG_134840_281 curi CR C 11 moe04C2
1. I am using bwa mem (version 0.7.17-r1188) to a
From /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/scripts_2019_curi, I ran
perl wrap_qsub_slurm_bwa.pl
Here are a few example jobs:
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/sequences/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:curi_CY_moe07H9-132995\tPL:ILLUMINA\tLB:curi_CY_moe07H9\tSM:curi_CY_moe07H9' /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/refs/tcuri_refgenome.fasta WTCHG_132995_249_1.fastq.gz WTCHG_132995_249_2.fastq.gz > /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/aln_132995_curi_CY_moe07H9.sam 2> /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/error_132995_curi_CY_moe07H9.log
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/sequences/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:curi_CY_moe07H9-135252\tPL:ILLUMINA\tLB:curi_CY_moe07H9\tSM:curi_CY_moe07H9' /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/refs/tcuri_refgenome.fasta WTCHG_135252_249_1.fastq.gz WTCHG_135252_249_2.fastq.gz > /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/aln_135252_curi_CY_moe07H9.sam 2> /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/error_135252_curi_CY_moe07H9.log
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/sequences/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:curi_CY_moe07H9-135253\tPL:ILLUMINA\tLB:curi_CY_moe07H9\tSM:curi_CY_moe07H9' /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/refs/tcuri_refgenome.fasta WTCHG_135253_249_1.fastq.gz WTCHG_135253_249_2.fastq.gz > /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/aln_135253_curi_CY_moe07H9.sam 2> /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/error_135253_curi_CY_moe07H9.log
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/timema/moe_radiation_wgrs/sequences/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:curi_CY_moe07H9-134842\tPL:ILLUMINA\tLB:curi_CY_moe07H9\tSM:curi_CY_moe07H9' /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/refs/tcuri_refgenome.fasta WTCHG_134842_249_1.fastq.gz WTCHG_134842_249_2.fastq.gz > /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/aln_134842_curi_CY_moe07H9.sam 2> /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/error_134842_curi_CY_moe07H9.log
Note that the output will be in /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/.
2. Sort, compress and index the alignments with samtools 1.5.
From within /uufs/chpc.utah.edu/common/home/gompert-group2/data/timema_sp_wgs/alignments_curi/, I ran,
sbatch SubSam2Bam.sh
which runs Sam2BamFork.pl on the 160 alignments, and yields *sorted.bam files:
#!/usr/bin/perl
#
# conver sam to bam, then sort and index
#
use Parallel::ForkManager;
my $max = 24;
my $pm = Parallel::ForkManager->new($max);
FILES:
foreach $sam (@ARGV){
$pm->start and next FILES; ## fork
$sam =~ m/^([A-Za-z0-9_\-]+\.)sam/ or die "failed to match $sam\n";
$base = $1;
system "samtools view -b -O BAM -o $base"."bam $sam\n";
system "samtools sort -O BAM -o $base"."sorted.bam $base"."bam\n";
system "samtools index -b $base"."sorted.bam\n";
$pm->finish;
}
$pm->wait_all_children;
3. Again using samtools, remove PCR duplicates,
sbatch SubRmdup.sh
which runs RmdupFork.pl on the 160 bam files,
#!/usr/bin/perl
#
# removes pcr duplicates
#
use Parallel::ForkManager;
my $max = 24;
my $pm = Parallel::ForkManager->new($max);
FILES:
foreach $bam (@ARGV){
$pm->start and next FILES; ## fork
$ubam = $bam;
$ubam =~ s/sorted/uniqe/ or die "failed sub $ubam\n";
system "samtools rmdup -S $bam $ubam\n";
system "samtools index -b $ubam\n";
$pm->finish;
}
$pm->wait_all_children;
4. Merge alignment files.
sbatch SubMerge.sh
which runs MergeFork.pl to generate 40 merged bams,
#!/usr/bin/perl
#
# merge bam files for each individual
#
use Parallel::ForkManager;
my $max = 30;
my $pm = Parallel::ForkManager->new($max);
open(IN, "ids.txt") or die "failed to read\n";
while(<IN>){
$pm->start and next; ## fork
chomp;
$id = $_;
$out = "merge_$id".".bam";
system "samtools merge -nr $out *$id*bam\n";
system "samtools sort -O BAM -o sorted_$out $out\n";
system "samtools index sorted_$out\n";
$pm->finish;
}
$pm->wait_all_children;