QRNAseq: A Tool for Functional Isoform, Driver Mutation Gene and Fusion Gene Detection

Introduction

QRNAseq is a tool for managing large amounts of RNA-seq data in an integrative way, covering RNA-seq data quality control, read alignment, gene fusion detection, gene mutation detection, isoform identification and functional analysis. The QRNAseq system comprises pipelines that load input, analyze the NGS data, export outputs into a relational database, integrate and mine the data together with other data types, and generate key gene signatures of interest.


Installation

QRNAseq is implemented in C++, and its GUI is built with Qt 4. We currently provide executable files that include the functions of each module. QRNAseq has been tested on 64-bit Ubuntu 10.10.


Manual

Obtaining genome files

QRNAseq aligns RNA-seq reads to the reference genome using Bowtie. Running Bowtie requires a Bowtie index, which can be downloaded from the Bowtie homepage (download the same genome build as your genome files, typically the UCSC build).

The second required file set is the same genome in FASTA format, with each chromosome in a separate file. These files can be obtained from the UCSC downloads page (Human > Full data set > chromFa.tar.gz).

FusionQ also needs the structure of the UCSC-annotated isoforms, which can be downloaded from UCSC.


Functions

1. Data Preprocessing
  • Remove redundant reads
    RNA-seq data may contain many duplicate reads due to PCR amplification bias. Removing duplicate reads speeds up downstream analyses such as alignment and junction detection.

  • Remove low quality reads
    Low-quality reads should be removed to avoid false positives in downstream analysis. This function takes two parameters: a quality threshold and a percentage of good nucleotides. A nucleotide in a read whose quality is above the quality threshold is regarded as a good nucleotide. The percentage of good nucleotides is defined as the number of good nucleotides divided by the read length.
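The two preprocessing steps above can be sketched in C++ as follows. This is a minimal illustration, not QRNAseq's own code: the Read struct and the functions removeRedundant, percentGood and removeLowQuality are hypothetical names, quality strings are assumed to be Phred+33 encoded, duplicates are assumed to mean reads with an identical sequence (the first occurrence is kept), and a read is assumed to be kept when its percentage of good nucleotides is at least the given percentage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical FASTQ record: identifier, bases and per-base quality string.
struct Read {
    std::string id;
    std::string seq;
    std::string qual;  // assumption: Phred+33 encoded
};

// Percentage of good nucleotides: bases whose Phred score is strictly
// above qualityThreshold, divided by the read length, times 100.
double percentGood(const Read& r, int qualityThreshold) {
    assert(r.seq.size() == r.qual.size());
    if (r.qual.empty()) return 0.0;
    std::size_t good = 0;
    for (char c : r.qual) {
        int phred = static_cast<int>(c) - 33;  // Phred+33 assumption
        if (phred > qualityThreshold) ++good;
    }
    return 100.0 * static_cast<double>(good) / static_cast<double>(r.qual.size());
}

// Remove redundant reads: keep only the first read for each distinct sequence.
std::vector<Read> removeRedundant(const std::vector<Read>& reads) {
    std::unordered_set<std::string> seen;
    std::vector<Read> out;
    for (const Read& r : reads) {
        if (seen.insert(r.seq).second) out.push_back(r);  // true on first sighting
    }
    return out;
}

// Remove low-quality reads: keep reads whose percentage of good
// nucleotides reaches minPercentGood (assumed inclusive comparison).
std::vector<Read> removeLowQuality(const std::vector<Read>& reads,
                                   int qualityThreshold,
                                   double minPercentGood) {
    std::vector<Read> out;
    for (const Read& r : reads) {
        if (percentGood(r, qualityThreshold) >= minPercentGood) out.push_back(r);
    }
    return out;
}
```

For example, with a quality threshold of 30, the quality string "IIII#" has four good bases out of five ('I' is Phred 40, '#' is Phred 2), giving 80%. A hash set makes duplicate removal a single pass over the reads, at the cost of holding every distinct sequence in memory.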

2. Gene fusion
  • FusionQ
    We developed FusionQ to detect and quantify gene fusions from RNA-seq data. The parameters for FusionQ can be found in the example *.cfg file.

3. Junction Detection

  • TopHat
    In RNA-seq data analysis, detecting splice junctions is important for the subsequent isoform inference. QRNAseq incorporates the state-of-the-art splice junction detection method TopHat, whose junction output is used further by NSMAP and Cufflinks.

4. Isoform level analysis

  • NSMAP:
    We developed NSMAP (Nonnegativity and Sparsity constrained Maximum A Posteriori) to infer isoforms from RNA-seq data. NSMAP is currently written in MATLAB; we are rewriting it in C++ so that it can be incorporated into QRNAseq.

  • Cufflinks
    We also include Cufflinks, a popular tool for isoform inference.
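To illustrate what nonnegativity and sparsity constraints do in isoform inference, here is a generic C++ sketch. It is NOT NSMAP's model: NSMAP's actual likelihood and prior are not described here. As a stand-in, it fits a nonnegative, L1-penalized least-squares model (a nonnegative lasso), minimizing 0.5*||y - A x||^2 + lambda*sum(x) subject to x >= 0 by projected gradient descent. The assumed setup is that y holds read counts per exon or junction bin, column j of A holds the expected contribution of isoform j to each bin, and x holds isoform abundances; the function name nonnegSparseFit is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows = bins, columns = isoforms

// Generic nonnegative lasso (not NSMAP): minimize
//   0.5 * ||y - A x||^2 + lambda * sum(x)   subject to x >= 0
// with projected gradient descent.
std::vector<double> nonnegSparseFit(const Matrix& A, const std::vector<double>& y,
                                    double lambda, int iterations = 500) {
    const std::size_t m = A.size();
    const std::size_t n = m ? A[0].size() : 0;
    assert(y.size() == m);
    for (const auto& row : A) assert(row.size() == n);

    // Step size 1/||A||_F^2 is safe because ||A||_F^2 bounds the
    // largest eigenvalue of A^T A (the gradient's Lipschitz constant).
    double fro2 = 0.0;
    for (const auto& row : A)
        for (double a : row) fro2 += a * a;
    std::vector<double> x(n, 0.0);
    if (fro2 == 0.0) return x;
    const double step = 1.0 / fro2;

    std::vector<double> resid(m), grad(n);
    for (int it = 0; it < iterations; ++it) {
        // resid = A x - y
        for (std::size_t i = 0; i < m; ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < n; ++j) s += A[i][j] * x[j];
            resid[i] = s - y[i];
        }
        // grad = A^T resid + lambda (on x >= 0 the L1 penalty is linear)
        for (std::size_t j = 0; j < n; ++j) {
            double g = lambda;
            for (std::size_t i = 0; i < m; ++i) g += A[i][j] * resid[i];
            grad[j] = g;
        }
        // Gradient step, then projection onto the nonnegative orthant;
        // the projection is what drives weakly supported isoforms to exactly 0.
        for (std::size_t j = 0; j < n; ++j)
            x[j] = std::max(0.0, x[j] - step * grad[j]);
    }
    return x;
}
```

With A equal to the 2x2 identity, y = {3.0, 0.2} and lambda = 0.5, the fit converges to roughly {2.5, 0.0}: the second "isoform" has too little support to survive the penalty and is set exactly to zero, which is the sparsity effect the method name refers to.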

5. Visualization

  • UCSC Browser:
    The UCSC Genome Browser is a graphical viewer optimized to support rapid visualization, examination and querying of sequencing data at many levels. QRNAseq writes its analysis results in UCSC-supported formats to leverage the visualization capabilities of the UCSC Genome Browser.
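As an example of a UCSC-supported format, the sketch below writes features as a BED6 custom track. The README does not say which formats QRNAseq emits, so BED here is an assumption, as are the hypothetical Feature struct and toBedTrack function. The main detail it shows is the coordinate conversion: BED uses 0-based, half-open intervals, so a 1-based inclusive interval [start, end] becomes [start - 1, end).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical feature record (e.g. a detected junction or isoform exon)
// with 1-based, inclusive coordinates.
struct Feature {
    std::string chrom;
    long start1;      // 1-based, inclusive
    long end1;        // 1-based, inclusive
    std::string name;
    int score;        // BED score, clamped to 0..1000
    char strand;      // '+', '-' or '.'
};

// Render features as a BED6 custom track: a track line followed by
// chrom, chromStart (0-based), chromEnd (exclusive), name, score, strand.
std::string toBedTrack(const std::vector<Feature>& features,
                       const std::string& trackName) {
    std::ostringstream out;
    out << "track name=\"" << trackName << "\" description=\""
        << trackName << "\"\n";
    for (const Feature& f : features) {
        assert(f.start1 >= 1 && f.end1 >= f.start1);
        int score = f.score < 0 ? 0 : (f.score > 1000 ? 1000 : f.score);
        out << f.chrom << '\t' << (f.start1 - 1) << '\t' << f.end1 << '\t'
            << f.name << '\t' << score << '\t' << f.strand << '\n';
    }
    return out.str();
}
```

A feature on chr1 covering bases 100 to 200 (1-based, inclusive) is written as the BED line "chr1	99	200	...", which the UCSC Genome Browser displays over the same 101 bases.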

Usage

Select functions from the list of available functions on the left. The selected functions are listed in the box on the right and are executed sequentially. Set the parameters of each function before execution. Progress information is displayed in the bottom window while the pipeline runs.



Copyright

This software is free for academic use only.


Authors:

QRNAseq is developed by Xiaobo Zhou from The Methodist Hospital Research Institute.


Contact:

If you have any questions about QRNAseq, please send an e-mail to: xzhou@tmhs.org