Alignments for the RNA-Seq reads are done with TopHat. I have just submitted a test run to check that the alignment works. To run TopHat, we need Bowtie 2 indexes of the reference genome. The indexes are created with the bowtie2-build command, run from the folder that contains the genome FASTA (.fasta/.fa) file.
To do this I went to the directory /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/lycaeides_dubois/Alignments/fastqfiles/, where the genome file is located. There I created a temporary folder called rna_ref in which to build the Bowtie 2 indexes, and copied melissa_blue_21Nov2017_GLtS4.fasta into it as melissa_ref.fa. Then I went into the rna_ref folder and ran the following commands:
ml bowtie2
bowtie2-build melissa_ref.fa melissa
*Note: bowtie2-build needs an output base name (here melissa) as its second argument; it is used as the prefix for the index files it creates.
I then copied the index files created here to /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/matt_transcriptome/reference/
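As a quick sanity check before copying, a sketch like the one below can confirm that the index is complete. The function name check_bt2_index is my own (it is not part of Bowtie 2), and it assumes a small index, for which bowtie2-build writes six .bt2 files; very large genomes get .bt2l files instead.

```shell
# Hypothetical helper (not part of bowtie2): check that the six files
# bowtie2-build writes for a small index all exist for a given prefix.
check_bt2_index() {
    local prefix="$1"
    local missing=0
    local suffix
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -f "${prefix}.${suffix}" ]; then
            echo "missing: ${prefix}.${suffix}" >&2
            missing=1
        fi
    done
    # Returns 0 only when every index file was found.
    return "$missing"
}
```

For example, running check_bt2_index melissa inside rna_ref should succeed before the melissa*.bt2 files are copied to the reference folder.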
Then I submitted a test run for TopHat. The working directory for TopHat is: /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/matt_transcriptome/tophat/
In this directory:
tophat-2.1.1.Linux_x86_64: Folder containing the tophat program executable
outdir: Folder for output of tophat runs
runtophat.sh: Bash script for trial run.
Here are the contents of the script:
#!/bin/bash
#SBATCH --job-name=fastx
#SBATCH --time=96:00:00 #walltime
#SBATCH --nodes=1 #number of cluster nodes
#SBATCH --account=usubio-kp #PI account
#SBATCH --partition=usubio-kp #specify computer cluster, other option is kingspeak
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/matt_transcriptome/tophat/
./tophat-2.1.1.Linux_x86_64/tophat -o outdir/ ../reference/melissa ../fastX/KS001_S71_L008_R1_001.fastq.final.fa ../fastX/KS001_S71_L008_R2_001.fastq.final.fa
**This is a trial run and is currently running on the cluster. Output files will be written to the outdir directory.
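Once the job finishes, a check along these lines can confirm that the run produced its main outputs. The function name check_tophat_outdir is hypothetical; the file names are standard TopHat 2 outputs, but the list is deliberately not exhaustive.

```shell
# Hypothetical helper: verify that a TopHat output folder contains
# non-empty copies of a few of the standard TopHat 2 output files.
check_tophat_outdir() {
    local outdir="$1"
    local status=0
    local f
    for f in accepted_hits.bam junctions.bed align_summary.txt; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "${outdir}/${f}" ]; then
            echo "missing or empty: ${outdir}/${f}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_tophat_outdir outdir from the tophat working directory would then report any of these files that are absent.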
Submitted the final run with all samples.
The script runtophat_mod.sh runs all of the paired-end samples. One sample (PMKS005_S113_L007_R1_001.fastq.final.fa) was giving me an error, so I dropped it from the analysis, and the job is running now. Here are the contents of the bash script:
#!/bin/bash
#SBATCH --job-name=tophat
#SBATCH --time=120:00:00 #walltime
#SBATCH --nodes=1 #number of cluster nodes
#SBATCH --account=usubio-kp #PI account
#SBATCH --partition=usubio-kp #specify computer cluster, other option is kingspeak
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/lycaeides/matt_transcriptome/tophat/
module load bowtie2
module load bowtie
./tophat-2.1.1.Linux_x86_64/tophat -o outdir/ ../reference/melissa ../fastX/KS001_S71_L008_R1_001.fastq.final.fa,../fastX/KS002_S72_L008_R1_001.fastq.final.fa,../fastX/KS003_S73_L008_R1_001.fastq.final.fa,../fastX/KS004_S74_L008_R1_001.fastq.final.fa,../fastX/PMKS001_S109_L007_R1_001.fastq.final.fa,../fastX/PMKS003_S111_L007_R1_001.fastq.final.fa,../fastX/PMKS002_S110_L007_R1_001.fastq.final.fa,../fastX/PMKS004_S112_L007_R1_001.fastq.final.fa,../fastX/PMKS006_S114_L007_R1_001.fastq.final.fa,../fastX/PMKS008_S116_L007_R1_001.fastq.final.fa,../fastX/PMKS007_S115_L007_R1_001.fastq.final.fa ../fastX/KS001_S71_L008_R2_001.fastq.final.fa,../fastX/KS002_S72_L008_R2_001.fastq.final.fa,../fastX/KS003_S73_L008_R2_001.fastq.final.fa,../fastX/KS004_S74_L008_R2_001.fastq.final.fa,../fastX/PMKS001_S109_L007_R2_001.fastq.final.fa,../fastX/PMKS003_S111_L007_R2_001.fastq.final.fa,../fastX/PMKS002_S110_L007_R2_001.fastq.final.fa,../fastX/PMKS004_S112_L007_R2_001.fastq.final.fa,../fastX/PMKS006_S114_L007_R2_001.fastq.final.fa,../fastX/PMKS008_S116_L007_R2_001.fastq.final.fa,../fastX/PMKS007_S115_L007_R2_001.fastq.final.fa
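Note that in the script each read file must be given with its path. The read 1 files are given as one comma-separated list (no spaces), followed by a space and then the read 2 files as a second comma-separated list in the same sample order.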
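Typing the two lists by hand is error-prone, so they could instead be generated from the fastX folder. The sketch below is one possible way to do that and is not what was used for the run above. The function name build_read_lists is hypothetical, and it assumes that every sample follows the *_R1_001.fastq.final.fa / *_R2_001.fastq.final.fa naming used here and that no path contains spaces.

```shell
# Hypothetical helper: print the comma-separated R1 list and the
# comma-separated R2 list, separated by a single space, for every
# *_R1_001.fastq.final.fa file in a folder.
#   $1 = folder holding the read files
#   $2 = optional sample-name prefix to leave out (e.g. PMKS005)
# R1 files that have no matching R2 file are skipped with a warning.
build_read_lists() {
    local dir="$1"
    local skip="${2:-}"
    local r1 r2
    local list1=""
    local list2=""
    for r1 in "$dir"/*_R1_001.fastq.final.fa; do
        [ -e "$r1" ] || continue    # the glob matched nothing
        if [ -n "$skip" ]; then
            case "$(basename "$r1")" in
                "$skip"*) continue ;;    # sample excluded on request
            esac
        fi
        # Derive the mate's name by swapping the R1 suffix for R2.
        r2="${r1%_R1_001.fastq.final.fa}_R2_001.fastq.final.fa"
        if [ ! -f "$r2" ]; then
            echo "no R2 mate for $r1, skipping" >&2
            continue
        fi
        # Append to both lists together so the sample order always matches.
        list1="${list1:+$list1,}$r1"
        list2="${list2:+$list2,}$r2"
    done
    echo "$list1 $list2"
}

# Possible use inside the job script (bash):
#   read -r r1_list r2_list <<< "$(build_read_lists ../fastX PMKS005)"
#   ./tophat-2.1.1.Linux_x86_64/tophat -o outdir/ ../reference/melissa "$r1_list" "$r2_list"
```

Because both lists are built in the same loop, the read 1 and read 2 files stay in the same order, which is what TopHat expects for paired-end input.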