10/01/2019
I have started the transcriptome assembly.
Here is the folder where I am working on this assembly: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome
1. FastQC
Ran FastQC to check the quality of the sequences (folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/fastqc/). I did this for both our data and the downloaded transcriptomic data, with a separate output directory for each dataset so the files are stored separately.
Directories:
Our data: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/fastqc/
I unzipped the data first: for f in *.gz; do gunzip "$f"; done
Here is the bash script I used:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J fastqc
module load fastqc
for f in /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/RNAseq20151015/*.fastq; do fastqc -o ./ "$f"; done
FastQC outputs two files for each sample: (1) an HTML report and (2) a zip archive with all the statistics.
I used the following bash code to compile the summaries and statistics from the zip archives of all samples. The output summaries are saved in the folder fq_aggregated.
# Run this script in a directory containing the zip files from FastQC. It groups the images of each type into
# individual folders so looking across samples is quick. Output is saved in the folder: fq_aggregated
for zip in *_fastqc.zip; do
unzip -o "$zip" &>/dev/null
done
rm -rf fq_aggregated # Remove aggregate folder if present
mkdir fq_aggregated
# Rename the image files using the sample (folder) name and group them by plot type
for zip in *_fastqc.zip; do
folder=${zip%.zip}
for img in "${folder}"/Images/*.png; do
img_name=$(basename "$img" .png)
mkdir -p "fq_aggregated/${img_name}"
mv "$img" "fq_aggregated/${img_name}/${folder/_fastqc/}.png"
done
done
# Concatenate summaries
for zip in *_fastqc.zip; do
folder=${zip%.zip}
cat "${folder}/summary.txt" >> fq_aggregated/summary.txt
done
# Concatenate basic statistics (lines 4-10 of fastqc_data.txt), tagging each line with the sample name
for zip in *_fastqc.zip; do
folder=${zip%.zip}
head -n 10 "${folder}/fastqc_data.txt" | tail -n 7 | awk -v f="${folder/_fastqc/}" '{ print $0 "\t" f }' >> fq_aggregated/statistics.txt
rm -rf "${folder}"
done
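The head -n 10 | tail -n 7 step assumes the Basic Statistics measures sit on fixed lines of fastqc_data.txt. A slightly more robust sketch extracts that module by name instead. It assumes the usual FastQC layout (a ">>Basic Statistics" line, a "#Measure" header, tab-separated rows, then ">>END_MODULE"); basic_stats is a hypothetical helper name.

```shell
# Print the rows of the "Basic Statistics" module of a FastQC fastqc_data.txt,
# each followed by a tab and the sample name.
# Assumes: ">>Basic Statistics", a "#Measure" header line, measure/value rows,
# then ">>END_MODULE".
basic_stats() {
    local data_file=$1 sample=$2
    awk -v s="$sample" '
        /^>>Basic Statistics/ { inmod = 1; next }   # module starts here
        inmod && /^>>END_MODULE/ { exit }           # module ends here
        inmod && !/^#/ { print $0 "\t" s }          # skip the #Measure header
    ' "$data_file"
}
```

Inside the last loop it would be called as basic_stats "${folder}/fastqc_data.txt" "${folder/_fastqc/}" >> fq_aggregated/statistics.txt.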
The results flag some FastQC modules as FAIL. I think running rcorrector will help clean up the data, and I will run FastQC again after running rcorrector.
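To see which modules fail and in how many samples, the concatenated fq_aggregated/summary.txt can be tallied per module. This is a sketch assuming the FastQC summary.txt layout of tab-separated status, module name and file name; count_status is a hypothetical helper name.

```shell
# Count, per FastQC module, how many samples got a given status
# (FAIL, WARN or PASS). Each summary line: status<TAB>module<TAB>file.
count_status() {
    local status=$1 summary=$2
    awk -F'\t' -v st="$status" '$1 == st { n[$2]++ }
        END { for (m in n) print n[m] "\t" m }' "$summary" | sort -k1,1nr
}
```

For example, count_status FAIL fq_aggregated/summary.txt lists the failing modules, most frequent first; rerunning it after rcorrector and trimming gives a quick before/after comparison.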
2. rcorrector
Since rcorrector is not present as a module on the UofU CHPC, I installed it locally in my directory: /uufs/chpc.utah.edu/common/home/u6007910/projects/sam
Install rcorrector:
module load git
git clone https://github.com/mourisl/rcorrector.git #downloads rcorrector repository
cd rcorrector/
ls
make #compiles rcorrector; also downloads and builds jellyfish2 if it is not available in the rcorrector path
perl run_rcorrector.pl #test for installation
Running rcorrector:
mkdir rcorrector
bash script for rcorrector:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 50:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J rcorrect
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load perl
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/rcorrector/
for prefix in $(ls /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/RNAseq20151015/*.fastq | sed -r 's/_R[12]_001[.]fastq//' | uniq)
do
perl /uufs/chpc.utah.edu/common/home/u6007910/projects/sam/rcorrector/run_rcorrector.pl -t 12 -1 "${prefix}_R1_001.fastq" -2 "${prefix}_R2_001.fastq"
done
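The loop above relies on uniq collapsing each R1/R2 pair into one prefix, which works because ls lists the two mates next to each other. A quick check that every prefix actually has both mates can be run before submitting the job. This sketch assumes the *_R1_001.fastq / *_R2_001.fastq naming used above; check_pairs is a hypothetical helper name.

```shell
# List any sample prefix that lacks its R1 or R2 file; return non-zero if so.
check_pairs() {
    local dir=$1 prefix mate status=0
    for prefix in $(ls "$dir"/*_R[12]_001.fastq | sed -E 's/_R[12]_001[.]fastq$//' | sort -u); do
        for mate in R1 R2; do
            if [ ! -f "${prefix}_${mate}_001.fastq" ]; then
                echo "missing: ${prefix}_${mate}_001.fastq"
                status=1
            fi
        done
    done
    return $status
}
```

For example, check_pairs /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/RNAseq20151015 prints nothing and returns 0 when all pairs are complete.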
After rcorrector has run, we get output files with the extension ".cor.fq". Use the following steps to compute some statistics on these files: the number of reads in each file and how many reads are tagged "cor" (corrected) or "unfixable". I saved these numbers in this file: https://docs.google.com/spreadsheets/d/1vU6Fj-MqAYnps0ksBL_gx5hS5eSfsp1y8X0TnuJgpDk/edit#gid=0
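#get file names
ls -1 *.fq
#get the total number of reads in each file (4 lines per read; counting lines that start with "@" can overcount, because quality strings may also start with "@")
for i in *.fq; do echo "$i $(( $(wc -l < "$i") / 4 ))"; done
#get the number of reads with the "cor" tag
for i in *.fq; do echo "$i $(grep -c " cor" "$i")"; done
#get the number of reads with the "unfixable" tag
for i in *.fq; do echo "$i $(grep -c "unfixable" "$i")"; done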
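The same three numbers can be collected in a single pass per file, giving one table row per file that is easy to paste into the spreadsheet. This is a sketch assuming 4-line FASTQ records and rcorrector's " cor" and "unfixable_error" header tags; only the header line of each record is inspected. rcor_stats is a hypothetical helper name.

```shell
# One row per file: name, total reads, reads tagged "cor", reads tagged
# "unfixable_error". Only lines 1, 5, 9, ... (record headers) are examined.
rcor_stats() {
    local fq
    printf 'file\ttotal\tcor\tunfixable\n'
    for fq in "$@"; do
        awk -v f="$fq" 'NR % 4 == 1 {
                total++
                if ($0 ~ / cor/) cor++
                if ($0 ~ /unfixable_error/) unfix++
            }
            END { printf "%s\t%d\t%d\t%d\n", f, total, cor, unfix }' "$fq"
    done
}
```

Usage: rcor_stats *.cor.fq > rcorrector_counts.tsv (the output file name is only an example).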
After this, I ran the script FilterUncorrectabledPEfastq.py to remove read pairs in which either read is unfixable. This outputs files named unfixrm_*.cor.fq.
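The key point of this filter for paired data is that a whole pair is dropped when either mate is tagged, so R1 and R2 stay in sync. The awk sketch below illustrates that idea only; it is not a copy of FilterUncorrectabledPEfastq.py and does not reproduce any header clean-up that script may do. It assumes 4-line FASTQ records with mates in the same order in both files; filter_unfixable_pairs is a hypothetical helper name.

```shell
# Keep a read pair only if neither mate carries rcorrector's
# "unfixable_error" tag; write the kept pairs to two new files.
filter_unfixable_pairs() {
    local in1=$1 in2=$2 out1=$3 out2=$4
    : > "$out1"   # create/empty the outputs even if nothing is kept
    : > "$out2"
    awk -v f2="$in2" -v o1="$out1" -v o2="$out2" '
        {
            h1 = $0; getline s1; getline p1; getline q1   # record from R1
            getline h2 < f2; getline s2 < f2              # matching record
            getline p2 < f2; getline q2 < f2              # from R2
            if (h1 ~ /unfixable_error/ || h2 ~ /unfixable_error/) next
            print h1 "\n" s1 "\n" p1 "\n" q1 > o1
            print h2 "\n" s2 "\n" p2 "\n" q2 > o2
        }' "$in1"
}
```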
3. Trim Galore
After running rcorrector and removing unfixable reads, I am running Trim Galore to trim adapter sequences and low-quality bases from the reads. Here is the bash script I used for running Trim Galore on the cluster:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 150:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J trimgalore
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load cutadapt
module load fastqc
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trim_galore/
for prefix in $(ls /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/rcorrector/unfixrm_*_001.cor.fq | sed -r 's/_R[12]_001[.]cor.fq//' | uniq)
do
/uufs/chpc.utah.edu/sys/pkg/trim_galore/trim_galore --dont_gzip --paired --phred33 --length 36 -q 5 --stringency 1 -e 0.1 "${prefix}_R1_001.cor.fq" "${prefix}_R2_001.cor.fq"
done
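With --paired, Trim Galore removes the whole pair when either read falls below --length, so each *_val_1.fq should contain as many reads as its *_val_2.fq partner. A quick sanity check, sketched under the assumption of 4-line FASTQ records and the file naming above (fq_count and check_val_pairs are hypothetical helper names):

```shell
# Number of reads in a FASTQ file (4 lines per record).
fq_count() { awk 'END { print NR / 4 }' "$1"; }

# For each *_R1_001.cor_val_1.fq given, compare read counts with its R2 file.
check_val_pairs() {
    local r1 r2 status=0
    for r1 in "$@"; do
        r2=$(printf '%s\n' "$r1" | sed 's/_R1_001[.]cor_val_1[.]fq$/_R2_001.cor_val_2.fq/')
        if [ "$(fq_count "$r1")" != "$(fq_count "$r2")" ]; then
            echo "read count mismatch: $r1 $r2"
            status=1
        fi
    done
    return $status
}
```

Usage: check_val_pairs unfixrm_*_R1_001.cor_val_1.fq from the Trim Galore output directory.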
I then ran FastQC on the output files and rechecked the statistics and summary (see step 1 above). I will now move on to the next step to assemble the de novo transcriptome using Trinity.
4. Trinity
I am running the de novo transcriptome assembly on the trimmed sequences using Trinity. Here is the bash script to run the program:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J trinity
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load samtools
module load bowtie2
module load salmon
module load jellyfish
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/
# Build comma-separated lists of the R1 (--left) and R2 (--right) files for the 25 samples
TG=../trimgalore
SAMPLES="cmac16_TTAGGC cmac17_TGACCA cmac18_ACAGTG cmac19_GCCAAT cmac28_CAGATC cmac29_ACTTGA cmac31_GATCAG cmac40_TAGCTT cmac41_GGCTAC cmac45_CTTGTA cmac58_AGTCAA cmac59_AGTTCC cmac63_ATGTCA cmac64_CCGTCC cmac65_GTCCGC cmac66_GTGAAA cmac67_GTGGCC cmac68_GTTTCG cmac69_CGTACG cmac6_ATCACG cmac70_GAGTGG cmac71_ACTGAT cmac72_ATTCCT cmac73_CACCGG cmac7_CGATGT"
LEFT=""
RIGHT=""
for s in $SAMPLES; do
LEFT="${LEFT}${TG}/unfixrm_${s}_L00M_R1_001.cor_val_1.fq,"
RIGHT="${RIGHT}${TG}/unfixrm_${s}_L00M_R2_001.cor_val_2.fq,"
done
LEFT=${LEFT%,} # drop the trailing comma
RIGHT=${RIGHT%,}
/uufs/chpc.utah.edu/sys/installdir/trinity/2.6.6/Trinity --seqType fq --SS_lib_type RF --max_memory 30G --min_kmer_cov 1 --min_contig_length 150 --left "$LEFT" --right "$RIGHT"
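Alongside the read-alignment check below, basic contiguity numbers for trinity_out_dir/Trinity.fasta are a quick first look at the assembly. This awk sketch reports the number of contigs, total length and N50 (the length L such that contigs of length at least L hold at least half of all assembled bases). Trinity also ships util/TrinityStats.pl, which reports gene- and isoform-level versions of these numbers; contig_stats is a hypothetical helper name.

```shell
# Contig count, total bases and N50 for a (possibly multi-line) FASTA file.
contig_stats() {
    # one sequence length per contig
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    sort -nr |
    awk '{ l[NR] = $1; total += $1 }
         END {
             half = total / 2
             for (i = 1; i <= NR; i++) {        # walk from longest contig down
                 run += l[i]
                 if (run >= half) { n50 = l[i]; break }
             }
             printf "contigs\t%d\ntotal_bp\t%d\nN50\t%d\n", NR, total, n50
         }'
}
```

Usage: contig_stats trinity_out_dir/Trinity.fasta from the trinity directory.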
5a. Check assembly quality using BOWTIE
I checked the quality of the Trinity assembly by aligning the reads back to it with bowtie2. I first built a bowtie2 index of the assembly using the following bash script:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J bowtie
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load bowtie2
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/bowtie/
bowtie2-build --threads 4 ../trinity_out_dir/Trinity.fasta cmac
I then aligned the reads back to this de novo assembly to check the alignment statistics. Here is the script for the alignment:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J bowtie2-align
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load samtools
module load bowtie2
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/bowtie
# Build comma-separated lists of the R1 (-1) and R2 (-2) files for the 25 samples
TG=../../trimgalore
SAMPLES="cmac16_TTAGGC cmac17_TGACCA cmac18_ACAGTG cmac19_GCCAAT cmac28_CAGATC cmac29_ACTTGA cmac31_GATCAG cmac40_TAGCTT cmac41_GGCTAC cmac45_CTTGTA cmac58_AGTCAA cmac59_AGTTCC cmac63_ATGTCA cmac64_CCGTCC cmac65_GTCCGC cmac66_GTGAAA cmac67_GTGGCC cmac68_GTTTCG cmac69_CGTACG cmac6_ATCACG cmac70_GAGTGG cmac71_ACTGAT cmac72_ATTCCT cmac73_CACCGG cmac7_CGATGT"
R1=""
R2=""
for s in $SAMPLES; do
R1="${R1}${TG}/unfixrm_${s}_L00M_R1_001.cor_val_1.fq,"
R2="${R2}${TG}/unfixrm_${s}_L00M_R2_001.cor_val_2.fq,"
done
R1=${R1%,} # drop the trailing comma
R2=${R2%,}
# Align the read pairs to the assembly index (12 threads, matching #SBATCH -n 12) and write a name-sorted BAM
bowtie2 -p 12 --local --no-unal -x cmac -q -1 "$R1" -2 "$R2" | samtools view -b - | samtools sort -n -o bowtie2.nameSorted.bam -
I then checked the quality of the alignment using samtools flagstat:
Usage:
samtools flagstat bowtie2.nameSorted.bam
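Here are the alignment statistics. Because bowtie2 was run with --no-unal, reads that failed to align are not written to the BAM, so the "100.00% mapped" line is not the overall alignment rate (bowtie2 reports that rate on stderr). Of the reads written, 97.60% are properly paired, so I think the assembly looks great here!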
839212261 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
839212261 + 0 mapped (100.00% : N/A)
839212261 + 0 paired in sequencing
419846064 + 0 read1
419366197 + 0 read2
819082020 + 0 properly paired (97.60% : N/A)
838295612 + 0 with itself and mate mapped
916649 + 0 singletons (0.11% : N/A)
13815710 + 0 with mate mapped to a different chr
4747015 + 0 with mate mapped to a different chr (mapQ>=5)
6. Running STAR for sequence alignments
I downloaded the data from Goran's paper to get the GFF files, which are in this folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref
I used this genome assembly (FASTA file) and GFF file for the STAR alignment.
I then started a trial run for STAR.
Directory for STAR: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star
Results of first run are saved in star_results_run1
Results of the second run are saved in star_results_run2 (not very different from the first run, just the output BAM file).
Here is the bash script for generating the STAR genome index:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J star-index
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load star
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/star_results_run1/
# build the index with the GFF annotation ("Parent" links exons to their transcripts in GFF3)
STAR --runMode genomeGenerate --runThreadN 12 --limitSjdbInsertNsj 300000 --genomeDir cmacgenome_starindex --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfile /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.gff --genomeFastaFiles /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.fasta
Here is the bash script for the first run of the STAR alignment (runStarAlign1.sh):
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J star-align
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load star
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/
for prefix in $(ls ../trimgalore/unfixrm_*_001.cor_val_*.fq | sed -r 's/_R[12]_001[.]cor_val_[12].fq//' | uniq)
do
STAR --runThreadN 12 --runMode alignReads --genomeDir ./cmacgenome_starindex --sjdbGTFfile /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.gtf --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM
done
Here is the bash script for the second run of the STAR alignment, which reuses the splice junctions (SJ.out.tab) from the first run:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 300:00:00
#SBATCH -p usubio-kp
#SBATCH -A usubio-kp
#SBATCH -J star-align
#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL
#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent
module load star
cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/
for prefix in $(ls ../trimgalore/unfixrm_*_001.cor_val_*.fq | sed -r 's/_R[12]_001[.]cor_val_[12].fq//' | uniq)
do
STAR --runThreadN 12 --runMode alignReads --sjdbFileChrStartEnd ./star_results_run1/SJ.out.tab --genomeDir ./cmacgenome_starindex --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts
done
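Both STAR loops above run every sample in the same directory without --outFileNamePrefix, so each sample's Aligned.sortedByCoord.out.bam, Log.final.out, ReadsPerGene.out.tab, etc. replaces the previous sample's. A sketch of giving each sample its own prefix follows; it only prints the commands (a dry run), and its other options mirror the second run above. star_commands is a hypothetical helper name; --outFileNamePrefix is a STAR option.

```shell
# Dry run: print one STAR command per sample, each with its own output prefix.
star_commands() {
    local dir=$1 prefix sample
    for prefix in $(ls "$dir"/unfixrm_*_001.cor_val_*.fq | sed -E 's/_R[12]_001[.]cor_val_[12][.]fq$//' | sort -u); do
        sample=$(basename "$prefix")
        echo STAR --runThreadN 12 --runMode alignReads \
            --sjdbFileChrStartEnd ./star_results_run1/SJ.out.tab \
            --genomeDir ./cmacgenome_starindex \
            --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" \
            --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts \
            --outFileNamePrefix "${sample}_"
    done
}
```

star_commands ../trimgalore prints the commands for review; piping that output to bash would run them one after another.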
The output files generated from the second run are:
Aligned.sortedByCoord.out.bam
Log.final.out --> has the final run statistics
Log.out
Log.progress.out
ReadsPerGene.out.tab
SJ.out.tab
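ReadsPerGene.out.tab (written because of --quantMode GeneCounts) has four columns: gene ID, counts for an unstranded library, counts assuming read 1 is on the transcript strand, and counts assuming read 2 is; its first four rows are the N_unmapped, N_multimapping, N_noFeature and N_ambiguous summaries. Summing columns 3 and 4 over genes is a quick way to see which strand convention fits the data; if the library is RF, as assumed in the Trinity step (--SS_lib_type RF), column 4 should dominate. strand_sums is a hypothetical helper name.

```shell
# Sum the three count columns of STAR's ReadsPerGene.out.tab over all genes,
# skipping the four N_* summary rows at the top.
strand_sums() {
    awk -F'\t' 'NR > 4 { s2 += $2; s3 += $3; s4 += $4 }
        END { printf "unstranded\t%d\nread1_sense\t%d\nread2_sense\t%d\n", s2, s3, s4 }' "$1"
}
```

Usage: strand_sums ReadsPerGene.out.tab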
To check the alignment statistics I did:
module load samtools
samtools flagstat Aligned.sortedByCoord.out.bam
The results of this are:
64079449 + 0 in total (QC-passed reads + QC-failed reads)
26409432 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
64079449 + 0 mapped (100.00% : N/A)
37670017 + 0 paired in sequencing
18835451 + 0 read1
18834566 + 0 read2
37668530 + 0 properly paired (100.00% : N/A)
37668530 + 0 with itself and mate mapped
1487 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
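samtools flagstat counts alignment records, not reads, and STAR writes extra secondary records for multimapping reads. Subtracting the secondary and supplementary records from the total gives the number of primary alignments: for the output above, 64,079,449 - 26,409,432 - 0 = 37,670,017, which matches the "paired in sequencing" line. A small sketch that reads flagstat text on stdin (primary_from_flagstat is a hypothetical helper name; it uses the QC-passed column only):

```shell
# Primary alignments = total - secondary - supplementary (QC-passed counts).
primary_from_flagstat() {
    awk '/in total/        { total = $1 }
         / secondary$/     { sec = $1 }
         / supplementary$/ { sup = $1 }
         END { print total - sec - sup }'
}
```

Usage: samtools flagstat Aligned.sortedByCoord.out.bam | primary_from_flagstat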
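I noticed this is quite different from the bowtie2 alignment to the de novo assembly above: this BAM holds about 37.7 million primary reads, versus 839,212,261 aligned reads for all 25 samples in the bowtie2 BAM. Since the STAR loop does not set --outFileNamePrefix, every sample writes to the same output file names, so this BAM most likely contains only the last sample processed. These are the output files I am using for CUFFLINKS.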
Note about RSEM: I tried to run RSEM, but it gave many errors related to the GTF file. Either way, I decided to go ahead and use Cufflinks for quantification.
7. Running FEATURECOUNTS for transcript quantification