transcriptome_analysis

10/01/2019

I have started the transcriptome assembly.

Here is the folder where I am working on this assembly: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome

1. FastQC

I ran FastQC to check the quality of the sequences (folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/fastqc/). I did this for both our data and the downloaded transcriptomic data, and created a separate output directory for each dataset so that the files are stored separately.

Directories:

Our data: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/fastqc/

I unzipped the data first: for f in *.gz; do gunzip $f; done

Here is the bash script I used:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J fastqc

module load fastqc

for f in /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/RNAseq20151015/*.fastq; do fastqc -o "./" "$f"; done

FastQC outputs two files for each sample: (1) an HTML report and (2) a zip file with all the statistics.

I used the following bash code to compile the summaries and statistics from the zip folders for each sample. The output summaries are saved in the folder fq_aggregated.

# Run this script in a directory containing zip files from fastqc. It aggregates images of each type in individual folders

# so looking across data is quick. Output will be saved in the folder: fq_aggregated

zips=`ls *.zip`

for i in $zips; do

unzip -o $i &>/dev/null;

done

fastq_folders=${zips/.zip/}

rm -rf fq_aggregated # Remove aggregate folder if present

mkdir fq_aggregated

# Rename Files within each using folder name.

for folder in $fastq_folders; do

folder=${folder%.*}

img_files=`ls ${folder}/Images/*png`;

for img in $img_files; do

img_name=$(basename "$img");

img_name=${img_name%.*}

new_name=${folder};

mkdir -p fq_aggregated/${img_name};

mv $img fq_aggregated/${img_name}/${folder/_fastqc/}.png;

done;

done;

# Concatenate Summaries

for folder in $fastq_folders; do

folder=${folder%.*}

cat ${folder}/summary.txt >> fq_aggregated/summary.txt

done;

# Concatenate Statistics

for folder in $fastq_folders; do

folder=${folder%.*}

head -n 10 ${folder}/fastqc_data.txt | tail -n 7 | awk -v f=${folder/_fastqc/} '{ print $0 "\t" f }' >> fq_aggregated/statistics.txt

rm -rf ${folder}

done

The results show some summary stats as failed. I think running rcorrector will clean this data up and I will run fastqc again after running rcorrector.
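To see which modules fail and in how many samples, the aggregated summary can be tabulated. This is a minimal sketch, assuming the usual FastQC summary.txt layout of three tab-separated columns (status, module name, file name); the function name summarize_fastqc is hypothetical.

```shell
#!/bin/bash
# Sketch: count PASS/WARN/FAIL per FastQC module from an aggregated summary.txt.
# Assumes tab-separated columns: status, module name, file name.
summarize_fastqc() {
    local summary_file="$1"
    awk -F'\t' '
        { count[$2 FS $1]++; modules[$2] = 1 }
        END {
            for (m in modules)
                printf "%s\tPASS=%d\tWARN=%d\tFAIL=%d\n", m,
                       count[m FS "PASS"], count[m FS "WARN"], count[m FS "FAIL"]
        }' "$summary_file" | sort
}

# Usage: summarize_fastqc fq_aggregated/summary.txt
```

Running the same function on the summary produced after rcorrector and trimming should make it easy to compare the two rounds of FastQC.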

2. rcorrector

Since rcorrector is not present as a module on the UofU CHPC, I installed it locally in my directory: /uufs/chpc.utah.edu/common/home/u6007910/projects/sam

Install rcorrector:

module load git

git clone https://github.com/mourisl/rcorrector.git #downloads rcorrector repository

cd rcorrector/

make #installs and downloads jellyfish2 if it is not available in the rcorrector path

perl run_rcorrector.pl #test for installation

Running rcorrector:

mkdir rcorrector

bash script for rcorrector:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 50:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J rcorrect

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load perl

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/rcorrector/

for prefix in $(ls /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/RNAseq20151015/*.fastq | sed -r 's/_R[12]_001[.]fastq//' | uniq); do

perl /uufs/chpc.utah.edu/common/home/u6007910/projects/sam/rcorrector/run_rcorrector.pl -t 12 -1 "${prefix}_R1_001.fastq" -2 "${prefix}_R2_001.fastq"

done
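Before submitting a paired-end job like this, it may help to confirm that every R1 file has a matching R2 file. The sketch below is only an illustration: check_pairs is a hypothetical helper, and it assumes the _R1_001.fastq / _R2_001.fastq suffixes used in the script above.

```shell
#!/bin/bash
# Sketch: list read-pair prefixes in a directory and flag any R1 without an R2.
# Returns non-zero if at least one R2 file is missing.
check_pairs() {
    local dir="$1" r1 prefix status=0
    for r1 in "$dir"/*_R1_001.fastq; do
        [ -e "$r1" ] || continue          # the glob matched no R1 files
        prefix="${r1%_R1_001.fastq}"      # same prefix the sed command derives
        if [ -e "${prefix}_R2_001.fastq" ]; then
            echo "OK ${prefix}"
        else
            echo "MISSING_R2 ${prefix}"
            status=1
        fi
    done
    return "$status"
}

# Usage: check_pairs /path/to/fastq_dir
```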

After rcorrector has run, we get output files with the extension "fq". Use the following steps to run some statistics on these files and count how many reads are tagged with "cor" or "unfixable". I saved these numbers in this file: https://docs.google.com/spreadsheets/d/1vU6Fj-MqAYnps0ksBL_gx5hS5eSfsp1y8X0TnuJgpDk/edit#gid=0

#get file names

ls *.fq | cat

#get the total number of reads in each file:

for i in *fq; do grep ^@ $i | wc -l; done

#get reads with "cor" tag

for i in *fq; do grep "cor" $i | wc -l; done

#get reads with "unfixable" tag

for i in *fq; do grep "unfixable" $i | wc -l; done
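One caveat with grep ^@ is that a quality line can also start with "@", which would inflate the read count. A sketch of a stricter count that looks only at header lines (every fourth line) is below. It assumes unwrapped four-line FASTQ records and that rcorrector marks headers with " cor" or "unfixable"; rcorrector_stats is a hypothetical name.

```shell
#!/bin/bash
# Sketch: per-file read count and rcorrector tag counts from header lines only.
# Prints: file name, total reads, reads tagged "cor", reads tagged "unfixable".
rcorrector_stats() {
    local fq="$1"
    awk -v name="$fq" '
        NR % 4 == 1 {                      # header line of each 4-line record
            total++
            if ($0 ~ / cor/)      cor++
            if ($0 ~ /unfixable/) unfix++
        }
        END { printf "%s\t%d\t%d\t%d\n", name, total, cor, unfix }' "$fq"
}

# Usage: for i in *fq; do rcorrector_stats "$i"; done
```

This gives all three numbers in one pass per file, in a tab-separated form that can be pasted into the spreadsheet.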

After this I ran the script FilterUncorrectabledPEfastq.py to remove the unfixable reads. This outputs files with names: unfixrm*cor.fq

3. Trim galore

After running rcorrector and removing unfixable reads, I am running Trim Galore to trim adapter sequences and low-quality bases from the reads. Here is the bash script I used for running Trim Galore on the cluster:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 150:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J trimgalore

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load cutadapt

module load fastqc

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trim_galore/

for prefix in $(ls /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/rcorrector/unfixrm_*_001.cor.fq | sed -r 's/_R[12]_001[.]cor.fq//' | uniq); do

/uufs/chpc.utah.edu/sys/pkg/trim_galore/trim_galore --dont_gzip --paired --phred33 --length 36 -q 5 --stringency 1 -e 0.1 "${prefix}_R1_001.cor.fq" "${prefix}_R2_001.cor.fq"

done

I then ran fastqc on the output files and rechecked the statistics and summary (see step 1 above). I will now move on to the next step to assemble the de novo transcriptome using Trinity.

4. Trinity

I am running the de novo transcriptome assembly on the trimmed sequences using Trinity. Here is the bash script to run the program:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J trinity

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load samtools

module load bowtie2

module load salmon

module load jellyfish

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/

/uufs/chpc.utah.edu/sys/installdir/trinity/2.6.6/Trinity --seqType fq --SS_lib_type RF --max_memory 30G --min_kmer_cov 1 --min_contig_length 150 --left ../trimgalore/unfixrm_cmac16_TTAGGC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac17_TGACCA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac18_ACAGTG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac19_GCCAAT_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac28_CAGATC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac29_ACTTGA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac31_GATCAG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac40_TAGCTT_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac41_GGCTAC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac45_CTTGTA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac58_AGTCAA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac59_AGTTCC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac63_ATGTCA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac64_CCGTCC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac65_GTCCGC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac66_GTGAAA_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac67_GTGGCC_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac68_GTTTCG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac69_CGTACG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac6_ATCACG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac70_GAGTGG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac71_ACTGAT_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac72_ATTCCT_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac73_CACCGG_L00M_R1_001.cor_val_1.fq,../trimgalore/unfixrm_cmac7_CGATGT_L00M_R1_001.cor_val_1.fq --right 
../trimgalore/unfixrm_cmac16_TTAGGC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac17_TGACCA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac18_ACAGTG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac19_GCCAAT_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac28_CAGATC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac29_ACTTGA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac31_GATCAG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac40_TAGCTT_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac41_GGCTAC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac45_CTTGTA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac58_AGTCAA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac59_AGTTCC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac63_ATGTCA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac64_CCGTCC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac65_GTCCGC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac66_GTGAAA_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac67_GTGGCC_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac68_GTTTCG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac69_CGTACG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac6_ATCACG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac70_GAGTGG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac71_ACTGAT_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac72_ATTCCT_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac73_CACCGG_L00M_R2_001.cor_val_2.fq,../trimgalore/unfixrm_cmac7_CGATGT_L00M_R2_001.cor_val_2.fq
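The comma-separated --left and --right lists above were typed out by hand. As a sketch only, they could also be built from a glob, assuming the trimmed-read directory contains just the samples meant for the assembly; join_files is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: join all arguments (e.g. the files matched by a glob) with commas.
join_files() {
    local IFS=,
    echo "$*"
}

# Usage (not executed here):
#   left=$(join_files ../trimgalore/unfixrm_*_R1_001.cor_val_1.fq)
#   right=$(join_files ../trimgalore/unfixrm_*_R2_001.cor_val_2.fq)
#   Trinity --seqType fq --SS_lib_type RF --left "$left" --right "$right" ...
```

Because the R1 and R2 names differ only after the shared sample prefix, both globs expand in the same sample order, which keeps the mates matched.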

5a. Check assembly quality using BOWTIE

I checked the quality of the Trinity assembly using bowtie alignments. I first built an index for bowtie alignment using the following bash script:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J bowtie

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load bowtie2

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/bowtie/

bowtie2-build --threads 4 ../trinity_out_dir/Trinity.fasta cmac

I then aligned the sequences back to this de novo assembly to check alignment statistics. Here is the script for alignment:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J trinity

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load samtools

module load bowtie2

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/trinity/bowtie

bowtie2 -p 16 --local --no-unal -x cmac -q -1 ../../trimgalore/unfixrm_cmac16_TTAGGC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac17_TGACCA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac18_ACAGTG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac19_GCCAAT_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac28_CAGATC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac29_ACTTGA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac31_GATCAG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac40_TAGCTT_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac41_GGCTAC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac45_CTTGTA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac58_AGTCAA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac59_AGTTCC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac63_ATGTCA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac64_CCGTCC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac65_GTCCGC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac66_GTGAAA_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac67_GTGGCC_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac68_GTTTCG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac69_CGTACG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac6_ATCACG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac70_GAGTGG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac71_ACTGAT_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac72_ATTCCT_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac73_CACCGG_L00M_R1_001.cor_val_1.fq,../../trimgalore/unfixrm_cmac7_CGATGT_L00M_R1_001.cor_val_1.fq -2 
../../trimgalore/unfixrm_cmac16_TTAGGC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac17_TGACCA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac18_ACAGTG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac19_GCCAAT_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac28_CAGATC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac29_ACTTGA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac31_GATCAG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac40_TAGCTT_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac41_GGCTAC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac45_CTTGTA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac58_AGTCAA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac59_AGTTCC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac63_ATGTCA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac64_CCGTCC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac65_GTCCGC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac66_GTGAAA_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac67_GTGGCC_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac68_GTTTCG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac69_CGTACG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac6_ATCACG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac70_GAGTGG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac71_ACTGAT_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac72_ATTCCT_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac73_CACCGG_L00M_R2_001.cor_val_2.fq,../../trimgalore/unfixrm_cmac7_CGATGT_L00M_R2_001.cor_val_2.fq | samtools view -Sb - | samtools sort -no - - > bowtie2.nameSorted.bam

I then checked the quality of the alignment using samtools flagstat:

Usage:

samtools flagstat bowtie2.nameSorted.bam

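Here are the alignment statistics. I think the assembly looks great here: 97.60% of the reads are properly paired. Note that bowtie2 was run with --no-unal, so reads that failed to align are not written to the BAM file and the 100% mapped figure does not reflect the overall alignment rate.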

839212261 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary

0 + 0 supplementary

0 + 0 duplicates

839212261 + 0 mapped (100.00% : N/A)

839212261 + 0 paired in sequencing

419846064 + 0 read1

419366197 + 0 read2

819082020 + 0 properly paired (97.60% : N/A)

838295612 + 0 with itself and mate mapped

916649 + 0 singletons (0.11% : N/A)

13815710 + 0 with mate mapped to a different chr

4747015 + 0 with mate mapped to a different chr (mapQ>=5)
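The properly paired percentage can be recomputed from the flagstat text when comparing several runs. A minimal sketch, assuming the flagstat layout shown above (count in the first field of each line); flagstat_pct is a hypothetical name.

```shell
#!/bin/bash
# Sketch: read samtools flagstat text on stdin and print the properly paired
# reads as a percentage of the total reads, with two decimals.
flagstat_pct() {
    awk '
        /in total/        { total  = $1 }
        /properly paired/ { paired = $1 }
        END { if (total > 0) printf "%.2f\n", 100 * paired / total }'
}

# Usage: samtools flagstat bowtie2.nameSorted.bam | flagstat_pct
```

With the numbers above (819082020 properly paired out of 839212261 in total) this gives 97.60, matching the value flagstat reports.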

6. Running STAR for sequence alignments

I downloaded the data from Goran's paper to get the gff files which are in this folder: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref

I used this genome assembly (fasta file) and gff file for the STAR alignment.

I then started a trial run for STAR.

Directory for STAR: /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star

Results of first run are saved in star_results_run1

Results of second run are saved in star_results_run2 (not very different just output bam file)

Here is the bash script for running the STAR index:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J star-index

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load star

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/star_results_run1/

#with gff file

STAR --runMode genomeGenerate --runThreadN 12 --limitSjdbInsertNsj 300000 --genomeDir cmacgenome_starindex --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfile /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.gff --genomeFastaFiles /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.fasta

Here is the bash script to run the first run of STAR alignment (runStarAlign1.sh):

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J star-align

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load star

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/

for prefix in $(ls ../trimgalore/unfixrm_*_001.cor_val_*.fq | sed -r 's/_R[12]_001[.]cor_val_[12].fq//' | uniq); do

STAR --runThreadN 12 --runMode alignReads --genomeDir ./cmacgenome_starindex --sjdbGTFfile /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/main_gff_ref/GCA_900659725.1_ASM90065972v1_genomic.gtf --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" --outSAMtype BAM SortedByCoordinate --quantMode TranscriptomeSAM

done

Here is the bash script for the second run of STAR alignment, which uses the splice junctions (SJ.out.tab) from the first run:

#!/bin/bash

#SBATCH -n 12

#SBATCH -N 1

#SBATCH -t 300:00:00

#SBATCH -p usubio-kp

#SBATCH -A usubio-kp

#SBATCH -J star-align

#SBATCH --mail-type=ALL # Type of email notification- BEGIN,END,FAIL,ALL

#SBATCH --mail-user=<samridhi.chaturvedi@gmail.com> # Email to which notifications will be sent

module load star

cd /uufs/chpc.utah.edu/common/home/gompert-group1/data/callosobruchus/Annotation/transcriptome/star/

for prefix in $(ls ../trimgalore/unfixrm_*_001.cor_val_*.fq | sed -r 's/_R[12]_001[.]cor_val_[12].fq//' | uniq); do

STAR --runThreadN 12 --runMode alignReads --sjdbFileChrStartEnd ./star_results_run1/SJ.out.tab --genomeDir ./cmacgenome_starindex --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts

done
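Since no output prefix is set, every iteration of these loops writes to STAR's default file names in the working directory, so each sample overwrites the previous one. A sketch of one way to keep per-sample outputs, assuming that is what is wanted, is to pass --outFileNamePrefix; star_out_prefix is a hypothetical helper that derives the prefix from the loop variable.

```shell
#!/bin/bash
# Sketch: turn a read-pair prefix such as ../trimgalore/unfixrm_cmac16_TTAGGC_L00M
# into a per-sample output prefix such as "cmac16_TTAGGC_L00M.".
star_out_prefix() {
    local sample
    sample=$(basename "$1")       # drop the directory part
    echo "${sample#unfixrm_}."    # drop the unfixrm_ tag, add a separating dot
}

# Usage inside the alignment loop (not executed here):
#   STAR ... --readFilesIn "${prefix}_R1_001.cor_val_1.fq" "${prefix}_R2_001.cor_val_2.fq" \
#        --outFileNamePrefix "$(star_out_prefix "$prefix")"
```

The output for the example prefix would then be named cmac16_TTAGGC_L00M.Aligned.sortedByCoord.out.bam, cmac16_TTAGGC_L00M.SJ.out.tab, and so on.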

The output files generated from second run are:

Aligned.sortedByCoord.out.bam

Log.final.out --> has the final run statistics

Log.out

Log.progress.out

ReadsPerGene.out.tab

SJ.out.tab
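ReadsPerGene.out.tab starts with four summary rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) followed by one row per gene with three count columns: unstranded, forward-stranded and reverse-stranded. A sketch for pulling out one column is below. The default of column 4 is an assumption that the library is reverse-stranded, as the --SS_lib_type RF setting used for Trinity suggests; gene_counts is a hypothetical name.

```shell
#!/bin/bash
# Sketch: print gene ID and one count column from STAR's ReadsPerGene.out.tab,
# skipping the four N_* summary rows at the top.
# Column 2 = unstranded, 3 = forward-stranded, 4 = reverse-stranded (default).
gene_counts() {
    local tab="$1" col="${2:-4}"
    awk -F'\t' -v c="$col" 'NR > 4 { print $1 "\t" $c }' "$tab"
}

# Usage: gene_counts ReadsPerGene.out.tab > counts_reverse.txt
```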

To check the alignment statistics I did:

module load samtools

samtools flagstat Aligned.sortedByCoord.out.bam

The results of this are:

64079449 + 0 in total (QC-passed reads + QC-failed reads)

26409432 + 0 secondary

0 + 0 supplementary

0 + 0 duplicates

64079449 + 0 mapped (100.00% : N/A)

37670017 + 0 paired in sequencing

18835451 + 0 read1

18834566 + 0 read2

37668530 + 0 properly paired (100.00% : N/A)

37668530 + 0 with itself and mate mapped

1487 + 0 singletons (0.00% : N/A)

0 + 0 with mate mapped to a different chr

0 + 0 with mate mapped to a different chr (mapQ>=5)

I noticed this is different from the bowtie alignments with de novo assembly above. These are the output files I am using for CUFFLINKS.

Note about RSEM: I was trying to run RSEM but it gives me a lot of errors due to the GTF file for some reason. Either way, I decided to go ahead and use cufflinks for quantification.

7. Running FEATURECOUNTS for transcript quantification
