Post date: Aug 31, 2016 10:1:29 PM
Whole genome sequence data from the sperm donor and additional non-T. cristinae Timema are in the three P150774 sub-directories within /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus. These samples were sequenced three times (some runs could be lower quality, so reads and bases will need to be filtered during alignment). The fileNames.txt file links each file name to a sample id (along with read counts). The ids.txt file links those sample ids to the actual sample ids (population, etc.).
I am aligning all of these data to the new Dovetail T. cristinae genome. From there the project will be split between the sperm data (the old data need to be aligned to the T. cristinae Dovetail genome too) and the non-cristinae genome data for thinking about diversity.
Alignments use the bwa mem algorithm in bwa 0.7.10-r789. I am using /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Scripts/wrap_qsub_slurm_bwa.pl to run the alignments. It is being executed once from each tcrP150774* sub-directory, like this:
perl ../Scripts/wrap_qsub_slurm_bwa.pl WTCHG_251181_*1.fastq.gz
Each alignment it launches has this form:
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/tcrP150774/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:podu_BS_C_458-yNumSZqE\tPL:ILLUMINA\tLB:podu_BS_C_458\tSM:podu_BS_C_458' /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta WTCHG_251181_303_1.fastq.gz WTCHG_251181_303_2.fastq.gz > /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/aln_1_podu_BS_C_458.sam 2> /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/error_1_podu_BS_C_458.log
The above example is from the first run. Run 1 over-writes the output for the sperm samples (a mistake) but works fine for the rest. I fixed this by adding a random string to the output file name, and did so before starting runs 2 and 3. I am still letting run 1 go, as it is fine for almost all samples, but I will need to re-run the sperm samples. Here are examples for runs 2 and 3.
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/tcrP150774-run2/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:sperm-nfyTarzW\tPL:ILLUMINA\tLB:sperm\tSM:sperm' /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta WTCHG_258652_296_1.fastq.gz WTCHG_258652_296_2.fastq.gz > /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/aln_2_sperm-nfyTarzW.sam 2> /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/error_2_sperm.log
cd /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/tcrP150774-run3/
bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '@RG\tID:sperm-sWEYBIXZ\tPL:ILLUMINA\tLB:sperm\tSM:sperm' /uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta WTCHG_256078_292_1.fastq.gz WTCHG_256078_292_2.fastq.gz > /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/aln_3_sperm-sWEYBIXZ.sam 2> /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/error_3_sperm.log
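The run 2 and run 3 commands follow one template, which can be sketched in shell. This is only an illustration and not the contents of wrap_qsub_slurm_bwa.pl: the function names (rand_suffix, mate_of, bwa_cmd) and their argument order are hypothetical, and the sketch assumes the random string is eight letters and that the second-read file differs from the first only in the _1/_2 part of the name, as in the examples above. It prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: prints (does not run) a bwa mem command in the form shown above.
# Function names and argument order are hypothetical.

REF=/uufs/chpc.utah.edu/common/home/u6000989/data/timema/tcrDovetail/version2/timema_06Jun2016_RvNkF.fasta
OUT=/uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments

# Eight random letters, used to keep read group ids and output names unique.
rand_suffix() {
    head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z' | cut -c 1-8
}

# Name of the second-read file for a first-read file ending in _1.fastq.gz.
mate_of() {
    printf '%s\n' "${1%_1.fastq.gz}_2.fastq.gz"
}

# Usage: bwa_cmd RUN SAMPLE_ID READ1_FILE SUFFIX
bwa_cmd() {
    run=$1
    id=$2
    r1=$3
    tag="$2-$4"
    r2=$(mate_of "$r1")
    # Single quotes keep \t as a literal backslash-t, as bwa expects for -R.
    rg='@RG\tID:'"$tag"'\tPL:ILLUMINA\tLB:'"$id"'\tSM:'"$id"
    # The .sam name carries the random string; the log name follows the examples.
    printf "bwa mem -t 8 -k 20 -w 100 -r 1.3 -T 30 -R '%s' %s %s %s > %s 2> %s\n" \
        "$rg" "$REF" "$r1" "$r2" \
        "$OUT/aln_${run}_${tag}.sam" "$OUT/error_${run}_${id}.log"
}
```

For example, bwa_cmd 2 sperm WTCHG_258652_296_1.fastq.gz "$(rand_suffix)" prints a command like the run 2 example, with a fresh random string in the read group id and the .sam file name.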
All of the *sam files will be in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/
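Since every run writes into this one directory, output names have to be unique across samples and runs. A minimal sketch of a check that could be run on a planned list of output names before starting the jobs; the function name dup_names is hypothetical, and the file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# Sketch only: read file names on standard input, one per line, and
# print each name that occurs more than once.
dup_names() {
    sort | uniq -d
}
```

For example, printf '%s\n' aln_1_a.sam aln_1_b.sam aln_1_a.sam | dup_names prints aln_1_a.sam, and a list with no repeated names prints nothing.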
The original sperm sequence data from /uufs/chpc.utah.edu/common/home/u6000989/data/timema/sperm/15*XX/* were also aligned to the new Dovetail reference and are now in /uufs/chpc.utah.edu/common/home/u6000989/data/timema/SpermDonorPlus/Alignments/ as well. The sperm alignment files all have *tcrSpr* as part of their ids, followed by the ID numbers. It is important to remember that not all of these are individual sperm.