Let's start working first. While the programs are running, you can read more here.

In this tutorial, I will analyze mouse paired-end RNA-Seq data from an Illumina GAII to calculate the expression levels of known UCSC genes and alternative splicing variants, and to identify novel genes and novel alternative splicing variants. If you need to analyze RNA-Seq data from other species, please let me know, and I will prepare the genome reference, repeat mask file and gene annotation file for that species.

Note: You can copy and paste the commands below into your Linux command line to run them. Anything after "#" is a comment and will be IGNORED by the shell.

1) ssh to an smp node

# Because some scripts need a lot of memory, and some of them can run multiple threads to speed up the analysis, I prefer to use the smp nodes.
# Please do not use the head nodes to run these analyses.
ssh oscar
ssh -Y smp007
# Note: the -Y option enables trusted X11 forwarding, so graphical output from R or Java can be displayed on your screen

# Check whether the node is busy. If it is, try another smp node. The available nodes are "smp006" to "smp007"
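One quick way to judge whether a node is busy is to compare its load average with its number of cores. The sketch below assumes a Linux node that provides `/proc/loadavg`; `node_is_busy` is a hypothetical helper, and treating "load >= cores" as busy is only a rule of thumb, not a cluster policy.

```bash
# Show the 1-, 5- and 15-minute load averages and the number of cores on this node
cat /proc/loadavg
nproc

# Hypothetical helper: succeeds (exit status 0) when LOAD >= CORES, i.e. the node
# already has roughly one runnable process per core.
# Defaults: LOAD = current 1-minute load average, CORES = output of nproc.
node_is_busy() {
  local load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
  local cores=${2:-$(nproc)}
  awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l + 0 >= c + 0) }'
}

if node_is_busy; then
  echo "$HOSTNAME looks busy; try another smp node"
else
  echo "$HOSTNAME has idle cores"
fi
```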

2) run bowtie to map paired-end reads onto the mouse genome

# Create a folder to work in. Note that files in the scratch folder older than 4 weeks are automatically deleted.
mkdir -p scratch/erange-test && cd scratch/erange-test

# Here, I use lane 1 as example
export LANE=s_1

# Copy the raw Illumina sequence files for this lane (both ends) to the current folder
cp path_to_your_read_files/${LANE}_*_sequence.txt .
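The braces in `${LANE}` matter: they tell the shell where the variable name ends. This small demonstration only echoes file names, so it is safe to run anywhere.

```bash
LANE=s_1
# With braces the variable name is LANE, so this prints s_1_1_sequence.txt
echo "${LANE}_1_sequence.txt"
# Without braces the shell reads the longest valid name, LANE_1_sequence,
# which is not set, so this prints only ".txt"
echo "$LANE_1_sequence.txt"
```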

# Here I use the GERALD output files in _sequence.txt (FASTQ) format as my read files. If you have raw qseq.txt files from Bustard, the following commands do the conversion. For lane 1:
# read_files=/xyz/xyz/Bustard07-10-2010_xyz/s_1_1_*_qseq.txt 
# cat $read_files | perl /gpfs/runtime/bioinfo/bin/ > s_1_1_sequence.txt

# read_files=/xyz/xyz/Bustard07-10-2010_xyz/s_1_2_*_qseq.txt 
# cat $read_files | perl /gpfs/runtime/bioinfo/bin/ > s_1_2_sequence.txt
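The name of the Perl conversion script is missing from the commands above. As a stand-in, here is a minimal awk sketch of a qseq-to-FASTQ conversion. It is not the script used in this tutorial. It assumes the standard 11-column, tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter flag), writes "." base calls as "N", and copies the quality string unchanged, on the assumption that your qseq and GERALD files use the same Phred+64 encoding (Illumina pipeline 1.3 and later). `qseq_to_fastq` is a hypothetical name.

```bash
# Hypothetical helper: convert Illumina qseq lines (stdin) to FASTQ (stdout).
# Header format: @machine_run:lane:tile:x:y#index/read
# All reads are kept; to keep only reads that passed filtering,
# put $11 == 1 in front of the main { ... } block.
qseq_to_fastq() {
  awk 'BEGIN { FS = "\t" }
  {
    seq = $9
    gsub(/\./, "N", seq)   # qseq uses "." for uncalled bases
    printf "@%s_%s:%s:%s:%s:%s#%s/%s\n%s\n+\n%s\n", $1, $2, $3, $4, $5, $6, $7, $8, seq, $10
  }'
}

# Usage for lane 1, forward reads (read_files as defined above):
# cat $read_files | qseq_to_fastq > s_1_1_sequence.txt
```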

# Run FastQC to check sequence quality (this step is optional; see the FastQC documentation for details on how to use it)
module load fastqc
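The FastQC command itself is not shown above. A minimal sketch, assuming the read files copied earlier and the standard `fastqc` options `-o` (output directory, which must already exist) and `-t` (number of threads); the folder name `fastqc_out` is an arbitrary choice.

```bash
LANE=${LANE:-s_1}
# FastQC does not create the output directory itself
mkdir -p fastqc_out
if command -v fastqc >/dev/null 2>&1; then
  # Writes a quality report for each input file into fastqc_out
  fastqc -t 2 -o fastqc_out "${LANE}_1_sequence.txt" "${LANE}_2_sequence.txt"
else
  echo "fastqc not found; run 'module load fastqc' first" >&2
fi
```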

# Set up the bowtie executable and index paths
# Prepend the new folders to the existing PATH instead of replacing it
export PATH=/gpfs/runtime/bioinfo/bowtie:/gpfs/runtime/bioinfo/samtools-0.1.8/misc:$PATH
export BOWTIE_INDEXES=/gpfs/runtime/bioinfo/bowtie/indexes/
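Before starting a job that runs for hours, it can help to confirm that `bowtie` is now on `PATH` and that the index used in this step exists. Bowtie 1 indexes are stored as six files ending in `.ebwt` (`NAME.1.ebwt` to `NAME.4.ebwt`, `NAME.rev.1.ebwt` and `NAME.rev.2.ebwt`); `has_bowtie_index` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if DIR contains all bowtie 1 index files for NAME
has_bowtie_index() {
  local dir=$1 name=$2 f
  for f in 1 2 3 4 rev.1 rev.2; do
    [ -f "$dir/$name.$f.ebwt" ] || return 1
  done
}

command -v bowtie || echo "bowtie is not on PATH" >&2
if has_bowtie_index "$BOWTIE_INDEXES" mm9_for_erange_bowtie60bp; then
  echo "index mm9_for_erange_bowtie60bp found"
else
  echo "index mm9_for_erange_bowtie60bp not found in $BOWTIE_INDEXES" >&2
fi
```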

# Set up an email address to get an email notice when each step is done. You need to use your own email address instead of

# Check that the email notice function works. A few seconds after the command finishes, you should get an email telling you that the command has finished.
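The definition of `notice_me` is not included in this section. A minimal sketch of what such a function could look like, assuming the node has a working `mail` (mailx) command; `EMAIL` and `MAILER` are hypothetical variable names, and `you@example.com` is a placeholder for your own address.

```bash
# Placeholder address: replace with your own
export EMAIL=${EMAIL:-you@example.com}

# Hypothetical helper: send a short "finished" email.
# MAILER defaults to mail; it can be overridden, e.g. for testing.
notice_me() {
  local body="Command finished on $HOSTNAME in $PWD at $(date)"
  echo "$body" | "${MAILER:-mail}" -s "Job finished on $HOSTNAME" "$EMAIL"
}
```

To try it, run `sleep 5 && notice_me`; an email should arrive shortly after the sleep ends. Because `notice_me` is chained with `&&`, no email is sent when the preceding command fails.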

# Run bowtie. This takes a few hours.
# Here I use 8 processors.
# mm9_for_erange_bowtie60bp is the mouse genome reference for 60 bp paired-end reads, which is kept in the bowtie index folder.
# ${LANE}_1_sequence.txt is the same as s_1_1_sequence.txt because I set LANE to "s_1". This file contains the forward reads.
# ${LANE}_2_sequence.txt is the same as s_1_2_sequence.txt because I set LANE to "s_1". This file contains the reverse reads.
/gpfs/runtime/bioinfo/bowtie/bowtie mm9_for_erange_bowtie60bp -p 8  -v 2 -k 11 -m 10 -t --strata --best --un $unm --max $max  $read_file  > $bowtie_file && notice_me
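The variables `$unm`, `$max`, `$read_file` and `$bowtie_file` in the bowtie command are not defined in this section, so they have to be set for your own data first. The sketch below aborts early if any of them is empty and spells out the alignment options with one comment each, following the bowtie 1 manual. `require_vars` and `bowtie_opts` are hypothetical names; `$read_file` is left unquoted, as in the original command, in case it holds more than one argument.

```bash
# Hypothetical helper: return non-zero (and say why) if any named variable is empty
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name}" ]; then   # ${!name}: value of the variable called $name
      echo "error: \$$name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The alignment options used above, one per line
bowtie_opts=(
  -p 8              # run 8 parallel search threads
  -v 2              # allow at most 2 mismatches, ignoring quality values
  -k 11             # report up to 11 valid alignments per read
  -m 10             # ...but suppress reads with more than 10 reportable alignments
  -t                # print the time taken by each phase
  --best --strata   # report only alignments in the best mismatch stratum
)

if require_vars read_file unm max bowtie_file; then
  /gpfs/runtime/bioinfo/bowtie/bowtie mm9_for_erange_bowtie60bp "${bowtie_opts[@]}" \
    --un "$unm" --max "$max" $read_file > "$bowtie_file" && notice_me
fi
```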

3) Make an RDS file from the bowtie alignment results

# This takes about half an hour

module load erange
python $ERANGEPATH/  label $bowtie_file $rds_file -RNA $CISTEMATIC_ROOT/M_musculus/knownGene.txt  -rawreadID -verbose && notice_me
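Step 4 reads `$rds_file`, so it may be worth confirming that step 3 actually produced a non-empty file before launching the long pipeline. `require_nonempty` is a hypothetical helper; `-s` is the standard shell test for "exists and has a size greater than zero".

```bash
# Hypothetical helper: fail with a message unless every named file exists and is non-empty
require_nonempty() {
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "error: '$f' is missing or empty" >&2
      return 1
    fi
  done
}

# Example: check the bowtie output and the RDS file before starting step 4
# require_nonempty "$bowtie_file" "$rds_file" && echo "ready for step 4"
```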

4) run the paired-end RNA pipeline to calculate gene expression and identify novel transcripts

# This takes a long time
module load erange
bash $ERANGEPATH/ mmusculus $rds_file $CISTEMATIC_ROOT/M_musculus/mm9_repeat_db  && notice_me