SmithHunter

WORK IN PROGRESS

SmithHunter is a novel unified workflow for the identification of smithRNAs and their targets.

Its functionalities and application are described in the accompanying publication:

Marturano G, Carli D, Cucini C, Carapelli A, Plazzi F, Frati F, Passamonti M, Nardi F. SmithHunter: an unified workflow for the identification of candidate smithRNAs and their targets. Submitted to BMC Bioinformatics.

The main hub for SmithHunter is its GitHub repository, where the latest distribution can be found alongside installation instructions and usage examples.

https://github.com/ESZlab/SmithHunther

Questions, comments and requests for help can be addressed to Dr. Giovanni Marturano at the University of Siena (giovanni.marturano@unisi.it) and Dr. Diego Carli at the University of Bologna (diego.carli2@unibo.it).

SmithHunter functionalities in brief.

The first module, named smithHunterA.sh, identifies and filters presumptive smithRNA sequences, defined as centroids of clusters with significant transcription levels and a narrow 5’ transcription boundary. It takes as input one or more small RNA libraries (replicates), the sequence of the mitochondrial genome and, optionally, the sequence of the nuclear genome of the species of interest. The main output is a list of presumptive smithRNA sequences, filtered according to user-defined parameters, together with graphics depicting: a) read coverage over the mitochondrial genome, global and per replicate; b) cluster position and abundance on the mitochondrial genome; and c) 5’ and 3’ end conservation.

Key output:

Sequences (centroids) of the clusters passing the filters (i.e. presumptive smithRNAs from module A). The header reports cluster name, depth, start and end positions on the genome, and strand.

Small RNA read coverage over the mitochondrial genome. If a nuclear genome is provided, the remapping of all reads is shown alongside the remapping of uniquely mitochondrial reads (i.e. reads that do not remap to the nuclear genome).

Per-replicate small RNA read coverage over the mitochondrial genome, with detail of low-coverage areas.

Cluster distribution over the mitochondrial genome. Clusters in forward and reverse orientation are shown in different colors.

Distribution of cluster 3’ and 5’ ends, as well as coverage over the mitochondrial genome, to evaluate end conservation.

Threshold and coverage information.

The user can manually edit the presumptive smithRNA list produced by module A, adding or removing presumptive smithRNAs before running module B. More specifically, we envision that the user may manually select a) smithRNAs with more conserved 3’ and 5’ ends; b) smithRNAs from specific parts of the genome; c) smithRNAs within a specific size range. An additional script, named shartp_smith.R, is distributed with SmithHunter to help perform this task. However, its scoring scheme is still experimental, and at this stage we do not advise unsupervised use of the script to filter presumptive smithRNAs. If you use this script, please provide feedback.
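As an illustration of option c) above, the following is a minimal Python sketch of a size-range filter applied to a FASTA list of presumptive smithRNAs. It is not part of SmithHunter: the function names, the toy headers and sequences, and the 18-30 nt range are all assumptions chosen for the example, and the real headers produced by module A carry more fields than shown here.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len: int, max_len: int) -> List[Tuple[str, str]]:
    """Keep records whose sequence length lies within [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Toy input: headers and sequences are invented for illustration only.
example = """>clusterA
ACGTACGTACGTACGTACGTAC
>clusterB
ACGTACGTACGT
>clusterC
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
""".splitlines()

# The 18-30 nt range is an arbitrary assumption, not a SmithHunter default.
kept = filter_by_length(read_fasta(example), min_len=18, max_len=30)
for header, seq in kept:
    print(f">{header}\n{seq}")
```

In practice the same two functions could be pointed at an open file handle instead of the toy list, with the filtered records written to a new FASTA file that is then passed to module B.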

The second module, named smithHunterB.sh, identifies possible nuclear targets and pre-miRNA-like precursor structures for presumptive smithRNAs. It takes as input the list of presumptive smithRNAs identified by the first module and the transcriptome of the species of interest (fasta format), with annotated 5’ and 3’ UTR regions (bed format). The main output is a list of nuclear transcripts putatively targeted by individual smithRNAs, the Gibbs free energy (dG) of the RNA-RNA hybrids formed by smithRNA/target pairs, as a measure of hybrid stability, and putative precursor structures.

Key output:

Sequences (centroids) of the clusters passing the filters of module A and finding a target in module B (i.e. candidate smithRNAs from module B).

List of the smithRNA/target pairs identified, as well as fasta sequences of the target gene(s) identified for each smithRNA. The table reports cluster name, target name, dG values from PITA and dG values from RNAhybrid.

Folding structures of pre-smithRNAs in .svg graphical format. Two structures are proposed, one with the smithRNA on the left side of the pre-smithRNA and one with the smithRNA on the right side.
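The smithRNA/target table listed above lends itself to simple post-processing. The sketch below, in Python, keeps only the pairs whose dG is at or below a cutoff in both PITA and RNAhybrid (more negative dG indicates a more stable hybrid). It is a hypothetical example, not SmithHunter code: the tab-separated layout, the column names (cluster, target, dG_PITA, dG_RNAhybrid), the toy values and both cutoffs are assumptions and should be adapted to the actual output file.

```python
import csv
import io

# Hypothetical table: layout, column names and values are assumptions.
table_text = (
    "cluster\ttarget\tdG_PITA\tdG_RNAhybrid\n"
    "clusterA\tgene1\t-12.5\t-27.0\n"
    "clusterA\tgene2\t-4.0\t-15.0\n"
    "clusterB\tgene3\t-10.1\t-31.2\n"
)


def strong_pairs(handle, max_pita: float, max_rnahybrid: float):
    """Return (cluster, target) pairs at or below both dG cutoffs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["cluster"], row["target"])
        for row in reader
        if float(row["dG_PITA"]) <= max_pita
        and float(row["dG_RNAhybrid"]) <= max_rnahybrid
    ]


# Both cutoffs are arbitrary illustration values, not recommended settings.
pairs = strong_pairs(io.StringIO(table_text), max_pita=-9.0, max_rnahybrid=-25.0)
print(pairs)
```

Requiring agreement between the two tools is only one possible design choice; a user may prefer to rank pairs by one dG value and use the other as a secondary check.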
