EasyMAP: A user-friendly online platform for analyzing 16S ribosomal DNA sequencing data
(impact factor: 6.490, journal ranking: 11%)
As next-generation sequencing technology has advanced, research on microbial 16S ribosomal DNA sequences has developed rapidly. Sequencing of 16S ribosomal DNA reveals the bacterial and archaeal composition of a sample, and many analytical tools for 16S ribosomal DNA sequences have been proposed; however, most lack a user-friendly graphical user interface. Here, a comprehensive and easy-to-use online platform, Easy Microbiome Analysis Platform (EasyMAP), has been developed for analysis of 16S ribosomal DNA sequencing data. EasyMAP integrates the QIIME2, LEfSe, and PICRUSt pipelines and includes temporal profiling analysis. Users can perform quality checks, taxonomic differential abundance analysis, microbial gene function prediction, and longitudinal analysis with step-by-step guidance. EasyMAP is a user-friendly tool for comprehensive analysis of 16S ribosomal DNA sequencing data. The web server and documentation are freely available at http://easymap.cgm.ntu.edu.tw/.
MiDSystem: A comprehensive online system for de novo assembly and analysis of microbial genomes
The substantial reduction in the experimental cost of next-generation sequencing makes it feasible to assemble the genome of an unknown bacterial species de novo and to acquire substantial genetic information from environmental samples. Many bioinformatics tools and algorithms have been developed for prokaryotes, but complex parameter settings and command-line interfaces create a significant barrier to entry for novices. Efficiently constructing pipelines that integrate all available genomic data is a major challenge for understanding unknown pathogens. MiDSystem is a comprehensive online system for analyzing genomic data from microbiomes. With a user-friendly interface, MiDSystem supports both de novo assembly and metagenomic analysis pipelines. It is designed to automatically analyze bacterial whole-genome shotgun sequencing data submitted by users. Multiple analytical steps can be performed directly on the system, and the results generated by the embedded tools are visualized in an online summary report to make them easier to interpret. Constructing a genome de novo has gradually become the foundation of bacterial studies. Handling both single-species and metagenomic samples, MiDSystem can greatly reduce the time and effort needed to analyze bacterial genomic data. Use of MiDSystem will allow researchers to focus more on understanding the etiology of bacterial infections and microbial ecology.
To compare the performance of prokaryotic taxonomy classifiers using curated 16S full-length rRNA sequences
(impact factor: 6.698, journal ranking: 11%)
Taxonomic assignment is a vital step in the analytic pipeline of bacterial 16S ribosomal RNA (rRNA) sequencing. Over the past decade, most studies in this field used next-generation sequencing to target the V3-V4 hypervariable regions to analyze bacterial composition. However, targeting only one or two hypervariable regions limits taxonomic resolution and makes species-level classification difficult. In recent years, third-generation sequencing has allowed researchers to readily obtain full-length prokaryotic 16S sequences, creating an opportunity to achieve greater taxonomic depth. However, the accuracy of current taxonomic classifiers on full-length 16S sequences remains unclear.
The purpose of this study is to compare the accuracy of several widely used 16S sequence classifiers and to identify the most suitable 16S training dataset for each classifier.
Both curated 16S full-length sequences and cross-validation datasets were used to validate the performance of seven classifiers: QIIME2, mothur, SINTAX, SPINGO, Ribosomal Database Project (RDP), IDTAXA, and Kraken2. Several training datasets, namely SILVA, Greengenes, and RDP, were used to train the classification models.
The accuracy of each classifier was evaluated at taxonomic levels down to the species level. According to the experimental results, SINTAX and SPINGO trained on RDP sequences provided the highest accuracy and are recommended for classifying prokaryotic full-length 16S rRNA sequences.
The performance of the classifiers was affected by the choice of training dataset. Therefore, each classifier should be paired with its most suitable 16S training data to improve accuracy and taxonomic resolution in taxonomic assignment.
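The per-level accuracy evaluation described above can be sketched in Python as follows. This is a minimal illustration, not the study's evaluation code. It assumes that each lineage is a semicolon-separated string with seven ranks (domain through species), that truth and predictions are dicts keyed by a hypothetical query ID, and that a missing prediction or an unassigned rank counts as an error.

```python
from typing import Dict, List

# Assumed rank order for every lineage string (illustrative, not a database standard).
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage: str) -> List[str]:
    """Split a semicolon-separated lineage into exactly seven per-rank names."""
    names = [part.strip() for part in lineage.split(";")]
    # Pad unassigned trailing ranks with empty strings, then trim to seven ranks.
    return (names + [""] * len(RANKS))[: len(RANKS)]


def per_rank_accuracy(truth: Dict[str, str], predicted: Dict[str, str]) -> Dict[str, float]:
    """Fraction of query sequences whose predicted name matches the truth at each rank.

    Queries with no prediction, or an empty (unassigned) name at a rank, count as wrong.
    """
    if not truth:
        raise ValueError("truth table is empty")
    correct = {rank: 0 for rank in RANKS}
    for query_id, true_lineage in truth.items():
        true_names = parse_lineage(true_lineage)
        pred_names = parse_lineage(predicted.get(query_id, ""))
        for i, rank in enumerate(RANKS):
            if pred_names[i] and pred_names[i] == true_names[i]:
                correct[rank] += 1
    return {rank: correct[rank] / len(truth) for rank in RANKS}
```

Because an unassigned rank is scored as wrong, a classifier that stops at genus is penalized at species level. This mirrors the question the study asks: whether full-length sequences let classifiers resolve taxa all the way to species.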
16S-ITGDB: An Integrated Database for Improving Species Classification of Prokaryotic 16S Ribosomal RNA Sequences
Analyzing 16S ribosomal RNA (rRNA) sequences allows researchers to elucidate the prokaryotic composition of an environment. In recent years, third-generation sequencing technology has made it possible to perform full-length sequence analysis of bacterial 16S rRNA. RDP, SILVA, and Greengenes are the most widely used 16S rRNA databases, and many 16S rRNA classifiers use them as references for taxonomic assignment. However, some prokaryotic taxa exist in only one of the three databases. Furthermore, Greengenes and SILVA include a considerable number of entries whose taxonomy is not resolved to the species level, which limits classifier performance. To improve the accuracy of species-level taxonomic assignment for full-length 16S rRNA sequences, we manually curated the three databases and removed sequences that lacked a species name. We then built a taxonomy-based integrated database that combines taxonomies and sequences from all three 16S rRNA databases and validated it using a mock community. The results showed that the taxonomy-based integrated database improved taxonomic resolution at the species level. The integrated database and related datasets are available at https://github.com/yphsieh/ItgDB.
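The curation and integration steps described above can be sketched in Python as follows. The authors curated the databases manually; this sketch only illustrates the idea and rests on stated assumptions. Lineages from RDP, SILVA, and Greengenes are assumed to be already normalized to one seven-rank, semicolon-separated format. The placeholder tokens used to detect a missing species name are hypothetical. Identical sequences, compared case-insensitively, are collapsed under the same lineage.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tokens treated as "no species name" (an assumption for illustration).
PLACEHOLDERS = ("uncultured", "unidentified", "metagenome", "unknown")

# (sequence_id, normalized seven-rank lineage, nucleotide sequence)
Record = Tuple[str, str, str]


def has_species_name(lineage: str) -> bool:
    """True if the species rank is present and is not a placeholder or a bare 'sp.' name."""
    names = [part.strip() for part in lineage.split(";")]
    if len(names) < 7 or not names[6]:
        return False
    species = names[6].lower()
    if any(token in species for token in PLACEHOLDERS):
        return False
    # Names such as "Bacillus sp." are not resolved to species level.
    return species.split()[-1] not in ("sp.", "sp")


def integrate(databases: List[List[Record]]) -> Dict[str, Set[str]]:
    """Merge curated records from several databases, grouping unique sequences by lineage."""
    merged: Dict[str, Set[str]] = {}
    for records in databases:
        for _seq_id, lineage, sequence in records:
            if not has_species_name(lineage):
                continue  # drop entries without a usable species name
            merged.setdefault(lineage, set()).add(sequence.upper())
    return merged
```

Grouping by lineage rather than by source lets a taxon present in only one database still enter the merged reference, which is the gap the integration is meant to close.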