Post date: Oct 20, 2017 8:47:25 PM
The de novo scaffolds/contigs are here:
/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/denovo/final.scaffolds.fa.gz
As a first pass, I am not using these. Instead, I am trying to align the raw reads from the green morph to the brown genome scaffolds 128 and 702.1. I am working in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/.
I am starting with bwa aln because it lets me set the maximum number of mismatches explicitly (the -n option, strictly a maximum edit distance when given an integer), which I am varying from 2 to 8.
I ran this: perl wrap_qsub_slurm_bwa.pl
The main part of the script is:
## build array of jobs to be run individually (serially) by slurm
my $dir = '/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/';
my @jobarray;
my $aln1;
my $aln2;
my $sampe;
my $ind;
my $genome = "/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta";
my $cd = "cd $dir\n";
my $fq1;
my $fq2;
foreach my $m (2..8){ ## $m is the value passed to bwa aln -n
    ## CP read files: align R1 and R2 separately, then pair them with sampe
    $fq1 = "CP-2842_S14_L007_R1_001.fastq.gz";
    $aln1 = "bwa aln -n $m -l 20 -k 3 -t 12 -q 10 -f CP_R1_$m.sai $genome $fq1\n";
    $fq2 = "CP-2842_S14_L007_R2_001.fastq.gz";
    $aln2 = "bwa aln -n $m -l 20 -k 3 -t 12 -q 10 -f CP_R2_$m.sai $genome $fq2\n";
    $sampe ="bwa sampe -n 1 -N 1 -r \'\@RG\\tID:tcrGreen\\tPL:ILLUMINA\\tLB:tcr-green\\tSM:tcr-green\' -a 500 -f alnCP_$m.sam $genome CP_R1_$m.sai CP_R2_$m.sai $fq1 $fq2\n";
    push (@jobarray, "$cd"."$aln1"."$aln2"."$sampe");
    ## HI read files: same steps, queued as a separate job
    $fq1 = "HI.3849.008.Index_4.CP-2842_R1.fastq.gz";
    $aln1 = "bwa aln -n $m -l 20 -k 3 -t 12 -q 10 -f HI_R1_$m.sai $genome $fq1\n";
    $fq2 = "HI.3849.008.Index_4.CP-2842_R2.fastq.gz";
    $aln2 = "bwa aln -n $m -l 20 -k 3 -t 12 -q 10 -f HI_R2_$m.sai $genome $fq2\n";
    $sampe ="bwa sampe -n 1 -N 1 -r \'\@RG\\tID:tcrGreen\\tPL:ILLUMINA\\tLB:tcr-green\\tSM:tcr-green\' -a 500 -f alnHI_$m.sam $genome HI_R1_$m.sai HI_R2_$m.sai $fq1 $fq2\n";
    push (@jobarray, "$cd"."$aln1"."$aln2"."$sampe");
}
This generates, e.g., for -n 8:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/
bwa aln -n 8 -l 20 -k 3 -t 12 -q 10 -f CP_R1_8.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta CP-2842_S14_L007_R1_001.fastq.gz
bwa aln -n 8 -l 20 -k 3 -t 12 -q 10 -f CP_R2_8.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta CP-2842_S14_L007_R2_001.fastq.gz
bwa sampe -n 1 -N 1 -r '@RG\tID:tcrGreen\tPL:ILLUMINA\tLB:tcr-green\tSM:tcr-green' -a 500 -f alnCP_8.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta CP_R1_8.sai CP_R2_8.sai CP-2842_S14_L007_R1_001.fastq.gz CP-2842_S14_L007_R2_001.fastq.gz
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/
bwa aln -n 8 -l 20 -k 3 -t 12 -q 10 -f HI_R1_8.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta HI.3849.008.Index_4.CP-2842_R1.fastq.gz
bwa aln -n 8 -l 20 -k 3 -t 12 -q 10 -f HI_R2_8.sai /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta HI.3849.008.Index_4.CP-2842_R2.fastq.gz
bwa sampe -n 1 -N 1 -r '@RG\tID:tcrGreen\tPL:ILLUMINA\tLB:tcr-green\tSM:tcr-green' -a 500 -f alnHI_8.sam /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/timema_green_denovo/brown_filtering/tcr_brown_128and702.fasta HI_R1_8.sai HI_R2_8.sai HI.3849.008.Index_4.CP-2842_R1.fastq.gz HI.3849.008.Index_4.CP-2842_R2.fastq.gz
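Across the whole loop, the Perl script queues 14 jobs: seven values of -n (2 through 8) times two sets of read files. As a quick sanity check, the expected SAM outputs can be listed with a small shell sketch. The helper name list_jobs is hypothetical and not part of wrap_qsub_slurm_bwa.pl; only the file-name pattern and the 2..8 range are taken from the script.

```shell
#!/bin/sh
# Dry run: print the SAM file each job is expected to write,
# in the same order the Perl loop pushes jobs (CP first, then HI, for each -n).
# list_jobs is a hypothetical helper, not part of the original wrapper.
list_jobs() {
    for m in 2 3 4 5 6 7 8; do
        for prefix in CP HI; do
            printf 'aln%s_%s.sam\n' "$prefix" "$m"
        done
    done
}

list_jobs
```

Comparing this list against the files actually present in the working directory shows which jobs have not finished or have failed.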
Note: I don't know what CP vs. HI refers to, so I am just processing both.
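One possible way to compare the -n settings once the jobs finish is to count the mapped records in each SAM file. The following is only a sketch under assumptions: count_mapped is a hypothetical helper, it uses plain awk rather than any particular toolkit, and it treats a record as mapped when the 0x4 (unmapped) bit of the FLAG column is not set.

```shell
#!/bin/sh
# count_mapped FILE: print the number of SAM records whose FLAG (column 2)
# does not have the 0x4 "segment unmapped" bit set. Header lines (@...) are skipped.
# Hypothetical helper for illustration only.
count_mapped() {
    awk -F '\t' '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }' "$1"
}

# Report one line per SAM file found in the current directory.
# If a glob matches nothing it stays literal, so the -f test skips it.
for f in alnCP_*.sam alnHI_*.sam; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_mapped "$f")"
    fi
done
```

Because bwa sampe writes one record per read, this counts mapped reads rather than mapped pairs.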