Software

12.  A microbial causal mediation analytic tool for health disparity and applications in body mass index

Description: Emerging evidence suggests that the microbiome may mediate health disparities. However, no existing analytic framework can be applied directly to analyze the microbiome as a mediator between a health-disparity exposure and a clinical outcome. Two obstacles stand in the way: the exposure is non-manipulable, and microbiome data have a unique structure (high dimensionality, sparsity, and compositionality). Because the microbiome is both modifiable and quantitative, we propose a microbial causal mediation model framework, SparseMCMM_HD, to uncover the mediating role of the microbiome in health disparities. It depicts a plausible path from a non-manipulable exposure (e.g., ethnicity or region) through the microbiome to the outcome. SparseMCMM_HD rigorously defines and quantifies the manipulable disparity measure, i.e., the part of the disparity that would be eliminated by equalizing microbiome profiles between the comparison and reference groups. In doing so, it extends existing microbial mediation methods, which were originally proposed under the potential outcome (counterfactual) framework, to address health disparities.


The interactive web app 

Reference:

Wang C, Segal L, Hu J, Zhou B, Hayes R, Ahn J, and Li H. (2022) Microbial Risk Score for Capturing Microbial Characteristics, Integrating Multi-omics Data, and Predicting Disease Risk. Microbiome. 


11.  STEMSIM: a simulator of within-strain short-term evolutionary mutations for longitudinal metagenomic data 

Motivation: As the resolution of metagenomic analysis increases, the evolution of microbial genomes in longitudinal metagenomic data has become a research focus. Several tools have been developed to simulate complex microbial communities at the strain level, but a tool for simulating within-strain evolutionary signals in longitudinal samples is still lacking.

Results: In this study, we introduce STEMSIM, a user-friendly command-line simulator of short-term evolutionary mutations for longitudinal metagenomic data. The input is simulated longitudinal raw sequencing reads of microbial communities or single species. The output is the modified reads carrying within-strain evolutionary mutations, together with information about these mutations. STEMSIM will be useful for evaluating analytic tools that detect short-term evolutionary mutations in metagenomic data.
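The core operation of such a simulator, planting a mutation into existing reads and recording what was changed, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not STEMSIM's implementation or command-line interface: `Read` and `spike_mutation` are hypothetical names, reads are assumed to be already placed on reference coordinates, and quality strings, indels, and strand are ignored. Calling it once per time point with a rising `freq` would mimic a mutation spreading within a strain.

```python
import random
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    start: int  # 0-based reference coordinate of the read's first base (assumed known)
    seq: str


def spike_mutation(reads, pos, alt, freq, seed=0):
    """Hypothetical helper: put base `alt` at reference position `pos` in about
    a fraction `freq` of the reads that cover that position.

    Returns the (possibly modified) reads plus a record of the mutation.
    """
    rng = random.Random(seed)
    out, mutated = [], []
    for r in reads:
        offset = pos - r.start
        covers = 0 <= offset < len(r.seq)
        # Only covering reads that do not already carry `alt` are candidates.
        if covers and r.seq[offset] != alt and rng.random() < freq:
            new_seq = r.seq[:offset] + alt + r.seq[offset + 1:]
            out.append(Read(r.name, r.start, new_seq))
            mutated.append(r.name)
        else:
            out.append(r)
    info = {"pos": pos, "alt": alt, "target_freq": freq, "mutated_reads": mutated}
    return out, info


reads = [Read("r1", 0, "ACGT"), Read("r2", 2, "GTAA"), Read("r3", 10, "CCCC")]
new_reads, info = spike_mutation(reads, pos=3, alt="A", freq=1.0)
print([r.seq for r in new_reads], info["mutated_reads"])
# ['ACGA', 'GAAA', 'CCCC'] ['r1', 'r2']
```

The original reads are left untouched, so the same input can be reused to generate several time points.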


Availability: STEMSIM and its tutorial are freely available online at https://github.com/BoyanZhou/STEMSim. 

Reference: Zhou B, Li H. (2023) STEMSIM: a simulator of within-strain short-term evolutionary mutations for longitudinal metagenomic data. Bioinformatics.


10.  An integrated strain-level analytic pipeline utilizing longitudinal metagenomic data (LongStrain)

Description: Advances in sequencing technology and analytic tools have deepened our insight into the complexity of the microbiome. Because different strains within a species may display great phenotypic variability, studying within-species variation improves our understanding of microbial biological processes. However, most existing methods for strain-level analysis do not allow simultaneous interrogation of strain proportions and genome-wide variants in longitudinal metagenomic samples. In this study, we introduce LongStrain, an integrated pipeline for analyzing metagenomic data from individuals with longitudinal or repeated samples. Our algorithm improves the efficiency and accuracy of strain identification by jointly modeling strain proportions and genomic variants across the combined samples from each individual. In simulation studies of a microbial community and of single species, we show that LongStrain outperforms three widely used methods in variant calling and proportion estimation. Furthermore, we illustrate potential applications of LongStrain in real data from The Environmental Determinants of Diabetes in the Young (TEDDY) study and a gastric intestinal metaplasia microbiome study. We investigate the association between dynamic changes in strain proportions and early-life events such as birth delivery mode, antibiotic treatment, and weaning. Through joint analysis of phylogeny and strain transition, we also identify a subspecies clade of Bifidobacterium longum that is significantly correlated with breastfeeding.

R package:

Download

Reference:

Zhou B, Wang C, Putzel G, Hu J, Liu M, Wu F, Chen Y, Pironti A, Li H. An integrated strain-level analytic pipeline utilizing longitudinal metagenomic data. bioRxiv. 2022 Jan 1. Under review.

9.  Microbial Risk Score (MRS) for Capturing Microbial Characteristics, Integrating Multi-omics Data, and Predicting Disease Risk

Description: Motivated by the polygenic risk score framework, we propose a microbial risk score (MRS) framework that aggregates a complex microbial profile into a single summary risk score, which can be used to measure and predict disease susceptibility. Specifically, the MRS algorithm involves two steps: 1) identifying a sub-community consisting of the signature microbial taxa associated with the disease, and 2) integrating the identified taxa into a continuous score. The first step applies existing sophisticated microbial association tests together with a pruning-and-thresholding method in the discovery samples. The second step constructs a community-based MRS by calculating the alpha diversity of the identified sub-community in the validation samples. Moreover, we propose a multi-omics data integration method that jointly models the MRS and risk scores constructed from other omics data for disease prediction.
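Step 2 above can be illustrated with a small sketch. It assumes step 1 has already produced a list of signature taxa and uses the Shannon index as the alpha-diversity measure. Shannon is only one possible choice, and the function names are hypothetical, not the R package's interface.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum p_i log p_i over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def microbial_risk_score(sample_counts, signature_taxa):
    """Hypothetical step-2 helper: alpha diversity of one validation sample
    restricted to the signature sub-community chosen in the discovery step.

    sample_counts : dict mapping taxon name -> read count for one sample
    signature_taxa: list of taxon names selected in step 1
    """
    # Taxa absent from the sample contribute a count of zero.
    sub_community = [sample_counts.get(t, 0) for t in signature_taxa]
    return shannon_diversity(sub_community)


sample = {"taxon_a": 10, "taxon_b": 10, "taxon_c": 100}
print(microbial_risk_score(sample, ["taxon_a", "taxon_b"]))  # log(2), about 0.693
```

Because only the signature taxa enter the calculation, the abundant but unselected `taxon_c` has no influence on the score.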


R package:

Download

Reference:

Wang C, Segal L, Hu J, Zhou B, Hayes R, Ahn J, and Li H. (2022) Microbial Risk Score for Capturing Microbial Characteristics, Integrating Multi-omics Data, and Predicting Disease Risk. Microbiome. 


8.  ARZIMM:  A Novel Analytic Platform for the Inference of Microbial Interactions and Community Stability from Longitudinal Microbiome Study

Description: Dynamic changes in microbial communities may play important roles in human health and disease. The recent rise in longitudinal microbiome studies calls for statistical methods that can model temporal dynamic patterns while simultaneously quantifying microbial interactions and community stability. Here, we propose a novel autoregressive zero-inflated mixed-effects model (ARZIMM) to capture sparse microbial interactions and estimate community stability. ARZIMM uses a zero-inflated Poisson autoregressive model to model excess zero abundances and non-zero abundances separately. It also includes a random effect to capture the underlying dynamic pattern shared within a group and a Lasso-type penalty to identify and estimate sparse microbial interactions. From the estimated microbial interaction matrix, we further derive an estimate of community stability and identify core dynamic patterns through network inference. We evaluated ARZIMM against other methods through extensive simulation studies and real data analyses.
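The zero-inflated Poisson (ZIP) building block, and one way an autoregressive mean could link a taxon's abundance to the previous time point, are sketched below. The log-link on `log(1 + count)` in `ar_mean` is an assumption made for illustration, not necessarily ARZIMM's exact specification. The random effect and the Lasso penalty are omitted, and all names are hypothetical.

```python
import math


def zip_logpmf(y, pi, lam):
    """Log P(Y = y) under a zero-inflated Poisson.

    pi  : probability of a structural (excess) zero, 0 <= pi < 1
    lam : mean of the Poisson count component, lam > 0
    """
    if y == 0:
        # A zero can arise structurally or from the Poisson component.
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + y * math.log(lam) - lam - math.lgamma(y + 1)


def ar_mean(prev_counts, intercept, coefs):
    """Hypothetical autoregressive log-link: the Poisson mean of one taxon at
    time t depends on log(1 + count) of every taxon at time t-1. Nonzero
    entries of `coefs` play the role of microbial interactions."""
    eta = intercept + sum(c * math.log1p(y) for c, y in zip(coefs, prev_counts))
    return math.exp(eta)


def zip_ar_loglik(series, target, intercept, coefs, pi):
    """Log-likelihood of taxon `target` over time, conditional on the previous
    time point. `series` is a list of per-time-point count vectors."""
    total = 0.0
    for prev, curr in zip(series, series[1:]):
        lam = ar_mean(prev, intercept, coefs)
        total += zip_logpmf(curr[target], pi, lam)
    return total


series = [[3, 0], [5, 1], [0, 2]]
print(zip_ar_loglik(series, target=0, intercept=0.5, coefs=[0.3, -0.2], pi=0.25))
```

In a penalized fit, the `coefs` for each taxon would be estimated by maximizing this log-likelihood minus an L1 term, which drives most interactions to exactly zero.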

R package:

Download

Reference:

He L, Wang C, Hu J, Gao Z, Falcone E, Holland SM, Blaser MJ, Li H. ARZIMM: A Novel Analytic Platform for the Inference of Microbial Interactions and Community Stability from Longitudinal Microbiome Study. Front Genet. 2022 Feb 25;13:777877. doi: 10.3389/fgene.2022.777877.


7.  Microbial trend analysis (MTA)

Description: We propose a microbial trend analysis (MTA) framework for high-dimensional, phylogenetically based longitudinal microbiome data. In particular, MTA can perform three tasks: 1) capture the common microbial dynamic trends for a group of subjects at the community level and identify the dominant taxa; 2) test whether the overall microbial dynamic trends differ significantly between groups; and 3) classify an individual subject based on its longitudinal microbial profile.

R package:

Download

Reference:

Wang C, Hu J, Blaser MJ, and Li H. (2021) Microbial trend analysis for common dynamic trend, group comparison and classification. BMC Genomics. 22, Article number: 667.

6.  Joint modeling of zero-inflated longitudinal proportions and time-to-event data  (JointMM)   

Description: We propose a novel joint modeling framework, JointMM, which comprises two sub-models: a longitudinal sub-model, called zero-inflated scaled-beta generalized linear mixed-effects regression, that depicts the temporal structure of microbial proportions among subjects; and a survival sub-model that characterizes the occurrence of an event and its relationship with the longitudinal microbial proportions. JointMM is specifically designed to handle zero-inflated and highly skewed longitudinal microbial proportion data. It examines whether the temporal pattern of microbial presence and/or the nonzero microbial proportions are associated with differences in the time to an event. The longitudinal sub-model also makes it possible to investigate how (time-varying) covariates relate to the temporal presence/absence patterns and/or the changing trend in nonzero proportions.
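The two-part idea behind the longitudinal sub-model can be shown with a plain zero-inflated beta density. One part gives the probability that a taxon is absent; the other gives a Beta density for its proportion when present. This sketch is a simplification: it uses an ordinary Beta with mean `mu` and precision `phi` rather than JointMM's scaled-beta, and it omits covariates, random effects, and the survival sub-model. All names are hypothetical.

```python
import math


def beta_logpdf(y, a, b):
    """Log density of a Beta(a, b) distribution at 0 < y < 1."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log1p(-y))


def zero_inflated_beta_logpdf(y, p_zero, mu, phi):
    """Two-part log-likelihood contribution of one proportion y in [0, 1).

    p_zero : probability that the taxon is absent (y == 0)
    mu, phi: mean and precision of the Beta part for nonzero y,
             so the Beta shape parameters are a = mu*phi, b = (1-mu)*phi
    """
    if y == 0:
        return math.log(p_zero)  # presence/absence part
    # Present with probability 1 - p_zero, then Beta-distributed.
    return math.log1p(-p_zero) + beta_logpdf(y, mu * phi, (1 - mu) * phi)


# With mu = 0.5 and phi = 2 the Beta part is uniform (density 1 on (0, 1)).
print(zero_inflated_beta_logpdf(0.3, p_zero=0.2, mu=0.5, phi=2.0))  # log(0.8)
```

In a mixed-effects version, `p_zero` and `mu` would each be tied to covariates and subject-level random effects through a logit link, and those random effects could then enter the survival sub-model.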

R package:

Download

Reference:

Hu J, Wang C, Blaser MJ, and Li H. (2021) JointMM: joint modeling of zero-inflated longitudinal proportions and time-to-event data with application to a gut microbiome study. Biometrics. Published online 2021 July 02.



5.  A rigorous Sparse Microbial Causal Mediation Model (SparseMCMM)   

Description: We propose a rigorous Sparse Microbial Causal Mediation Model (SparseMCMM) specifically designed for high-dimensional and compositional microbiome data in a typical three-factor (treatment, microbiome, and outcome) causal study design. In particular, a linear log-contrast regression model and a Dirichlet regression model are used to estimate the causal direct effect of the treatment and the causal mediation effects of the microbiome at both the community and individual-taxon levels. Regularization is used to perform variable selection within the proposed model framework and identify signature causal microbes. Two hypothesis tests of the overall mediation effect are proposed, and their statistical significance is assessed by permutation procedures. Extensive simulations show that SparseMCMM performs excellently in both estimation and hypothesis testing.
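In a linear log-contrast model, the outcome depends on the log relative abundances through coefficients constrained to sum to zero. As a result, predictions do not change if a sample's composition is rescaled, and the model can equivalently be written in centered log-ratio (clr) coordinates. The minimal sketch below shows only this constraint and its consequence. It assumes strictly positive compositions (no zeros) and omits the penalty, any treatment-by-taxon interaction, and the Dirichlet mediator model. The function names are hypothetical.

```python
import math


def clr(x):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [lv - mean_log for lv in logs]


def log_contrast_mean(x, treatment, beta0, alpha, beta):
    """Mean outcome under a minimal linear log-contrast model:
    E[Y | T, X] = beta0 + alpha*T + sum_j beta_j*log(x_j), with sum_j beta_j = 0.
    `alpha` is the direct effect of the treatment T in this simplified form."""
    if abs(sum(beta)) > 1e-9:
        raise ValueError("log-contrast coefficients must sum to zero")
    return beta0 + alpha * treatment + sum(b * math.log(v) for b, v in zip(beta, x))


x = [0.2, 0.3, 0.5]
beta = [1.0, -0.5, -0.5]
m_rel = log_contrast_mean(x, 1, 0.1, 2.0, beta)
m_abs = log_contrast_mean([10 * v for v in x], 1, 0.1, 2.0, beta)  # rescaled sample
print(math.isclose(m_rel, m_abs))  # True: the zero-sum constraint removes scale
```

The zero-sum constraint is what makes the coefficients interpretable for compositional data: only ratios between taxa, not the arbitrary total, affect the outcome.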

R package (updated on August 27, 2018):

Download

Reference:

Wang C, Hu J, Blaser MJ, Li H. (2020) Estimating and testing the microbial causal mediation effect with high-dimensional and compositional microbiome data. Bioinformatics. 2020 Jan 15;36(2):347-355.


4.  A two-stage microbial association mapping framework with advanced FDR control (massMap)

Description: We propose a two-stage microbial association mapping framework (massMap) that uses grouping information from the taxonomic tree to increase the statistical power of association tests at the target rank. massMap first screens taxonomic groups at a pre-selected higher taxonomic rank for association using OMiAT, a powerful microbial group test. It then tests the association of each candidate taxon at the target rank within the significant taxonomic groups identified in the first stage. The hierarchical BH (HBH) and selected subset testing (SST) procedures are evaluated for controlling the FDR of these two-stage structured tests.
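The two-stage structure can be sketched with the Benjamini-Hochberg (BH) step-up procedure applied at each stage. First, BH is applied across the higher-rank groups. Then, within each selected group, BH is applied across its target-rank taxa. This is a simplified hierarchical scheme for illustration only; it is not the exact HBH or SST adjustment evaluated in massMap. The stage-1 p-values (e.g., from OMiAT) and the stage-2 p-values are assumed to be given. `bh_reject` and `two_stage_mapping` are hypothetical names.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return the set of indices rejected at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under the BH line
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return set(order[:k])


def two_stage_mapping(group_pvals, taxon_pvals, q=0.05):
    """group_pvals: dict group -> stage-1 p-value (higher taxonomic rank).
    taxon_pvals: dict group -> dict taxon -> stage-2 p-value (target rank).
    Only taxa inside groups selected at stage 1 are tested at stage 2."""
    groups = list(group_pvals)
    selected = [groups[i] for i in sorted(bh_reject([group_pvals[g] for g in groups], q))]
    discoveries = {}
    for g in selected:
        taxa = list(taxon_pvals[g])
        hits = bh_reject([taxon_pvals[g][t] for t in taxa], q)
        discoveries[g] = [taxa[i] for i in sorted(hits)]
    return discoveries


group_p = {"G1": 0.001, "G2": 0.8}
taxon_p = {"G1": {"t1": 0.01, "t2": 0.04, "t3": 0.5}, "G2": {"t4": 0.0001}}
print(two_stage_mapping(group_p, taxon_p))  # {'G1': ['t1']}
```

Note that `t4` is never tested, despite its tiny p-value, because its group was screened out at stage 1. This is how the tree structure concentrates power, and it is also why a dedicated FDR procedure such as HBH or SST is needed for the combined result.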

R package (updated on August 27, 2018):

Download

Manual

Readme

Reference:

Hu J, Koh H, He L, Liu M, Blaser MJ, and Li H. (2018) A two-stage microbial association mapping framework with advanced FDR control. Microbiome. 6:131.

3. Optimal Microbiome-based Survival Analysis (OMiSA)

Description: This software package provides facilities for 1) Optimal Microbiome-based Survival Analysis (OMiSA), 2) Optimal Microbiome-based Survival Analysis using Linear and Non-linear bases of OTUs (OMiSALN), and 3) Optimal Microbiome Regression-based Kernel Association Test for Survival traits (OMiRKAT-S). OMiSA, OMiSALN, and OMiRKAT-S test the association between microbial composition and survival (time-to-event) outcomes in health and disease. For microbial composition, the entire community (e.g., kingdom) or individual upper-level taxa (e.g., phylum, class, order, family, and genus) can be surveyed.

R package (updated on Feb. 10, 2018):

Download

Manual

Reference: Koh, H., Livanos, AE, Blaser, MJ, and Li, H. A highly adaptive microbiome-based association test for survival traits. BMC Genomics. 19, Article number: 210 (2018).

2. Microbial Interdependence Association Test--a Non-parametric Microbial Interdependence Test (NMIT). 

Description: This software package performs a multivariate distance-based test for comparing microbial temporal interdependence between groups. The NMIT provides a comprehensive way to evaluate the association between key phenotypic variables and microbial interdependence.
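The general recipe for such a test can be sketched in three steps. First, summarize each subject's temporal interdependence as the matrix of pairwise taxon correlations over time (Kendall's tau here). Second, measure distances between subjects with the Frobenius norm. Third, compare groups with a PERMANOVA pseudo-F statistic calibrated by permuting group labels. This is a minimal pure-Python sketch, not the R or Python code distributed here. The function names are hypothetical. Each subject is assumed to be a list of per-taxon abundance trajectories measured at the same time points (at least two).

```python
import itertools
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length series (tied pairs count as 0)."""
    n = len(x)
    s = 0
    for i, j in itertools.combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)


def interdependence(subject):
    """Upper triangle of one subject's taxon-by-taxon Kendall correlation matrix."""
    return [kendall_tau(a, b) for a, b in itertools.combinations(subject, 2)]


def frobenius(u, v):
    """Distance between two upper triangles; proportional (factor sqrt(2))
    to the Frobenius distance between the full symmetric matrices."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F from a distance matrix and group labels."""
    n, groups = len(labels), sorted(set(labels))
    a = len(groups)
    ss_t = sum(dist[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    ss_w = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_w += sum(dist[i][j] ** 2 for i, j in itertools.combinations(idx, 2)) / len(idx)
    if ss_w == 0:
        return math.inf
    return ((ss_t - ss_w) / (a - 1)) / (ss_w / (n - a))


def nmit_sketch(subjects, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    vecs = [interdependence(s) for s in subjects]
    dist = [[frobenius(u, v) for v in vecs] for u in vecs]
    f_obs = pseudo_f(dist, labels)
    rng, perm, hits = random.Random(seed), list(labels), 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        hits += pseudo_f(dist, perm) >= f_obs
    return f_obs, (hits + 1) / (n_perm + 1)


subjects = [
    [[1, 2, 3, 4], [2, 3, 4, 5], [4, 3, 2, 1]],
    [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 1, 2]],
    [[4, 3, 2, 1], [1, 2, 3, 4], [2, 1, 4, 3]],
    [[1, 2, 3, 4], [4, 3, 2, 1], [1, 2, 4, 3]],
]
print(nmit_sketch(subjects, ["A", "A", "B", "B"], n_perm=99))
```

Using a rank correlation keeps the interdependence summary insensitive to monotone transformations of abundance, and the permutation step avoids distributional assumptions about the distances.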

R package:

Download

Manual

Python code:

Download

Reference: Zhang, Y., Han, SW, Cox, LM and Li, H. A Multivariate Distance-based Analytic Framework for Microbial Interdependence Association Test in Longitudinal Study. Genetic Epidemiology. 2017 Sep 5. doi: 10.1002/gepi.22065.

1. Optimal Microbiome-based Association Test (OMiAT) and Microbiome Comprehensive Association Mapping (MiCAM)

Description: This software package provides facilities 1) for optimal microbiome-based association test (OMiAT) to test association for a microbial group (e.g., the entire microbial community or individual upper-level taxa) and 2) for microbiome comprehensive association mapping (MiCAM) to test association for all microbial taxa through a breadth of taxonomic ranks (e.g., kingdom, phylum, class, order, family, genus, and species).

R package (updated on May 23, 2017):

Download

Manual

Reference: Koh, H., Blaser, MJ, and Li, H.  A powerful microbiome-based association test and a microbial taxa discovery framework for comprehensive association mapping.  Microbiome.  2017;5:45.

