Slides will be available during or after the workshop at www.cs.tufts.edu/~soha/MMM2019/
If you attended the workshop, please complete this form for the workshop records:
To register, either register for the full conference or choose a one-day registration (see https://acm-bcb.org/2019/index.php?page=attending#top).
The annual fee to join SIG BIO is $25. You can join here:
Motivation and rationale for the workshop
Microbiota are ecological communities of microorganisms found throughout nature. In humans and animals, microbiota can reside on or within the body, where they exist in commensal or mutualistic relationships with their host, influence physiological functions, and play critical roles in the host’s development. These microbial communities can be very complex. One example is the intestinal microbiota, which comprises hundreds of species that interact with one another as well as with their host. Recent studies have demonstrated that the microbiota affects a wide range of physiological processes, including digestion, development of the immune system, and inflammation. Further, significant alterations in intestinal microbiota composition have been shown to correlate with several diseases, including obesity, diabetes, cancer, asthma, and even autism spectrum disorder. Characterizing the microbiota and understanding its relation to health and disease stand to significantly improve human health.
Efforts to characterize microbiota have greatly benefited from technical advances in DNA sequencing. In particular, low-cost, culture-independent sequencing has made metagenomic and metatranscriptomic surveys of microbial communities practical, including the bacteria, archaea, viruses, and fungi associated with the human body, other hosts, and the environment. The resulting data have spurred the development of many new computational approaches to meta'omic sequence analysis, including metagenomic assembly, microbial identification, and gene, transcript, and metabolic pathway profiling. Further, recent advances in untargeted metabolomics have led to many new tools that enhance the functional profiling of microbial communities.
Through invited talks, this workshop will highlight recent advances in computational methods for metagenomics and metabolomics.
Workshop Schedule -- See details of talks at the bottom of the page
Soha Hassoun is Professor and past chair of the Department of Computer Science at Tufts University. She has adjunct appointments in the Departments of Chemical and Biological Engineering and Electrical and Computer Engineering. Her research interest is in using machine learning for metabolomics and its applications for analyzing the gut microbiota.
Yasser El-Manzalawy is an Assistant Professor at Geisinger Health System and an Adjunct Assistant Professor at Penn State University. His current research interests focus on the development of novel methodologies, frameworks, and algorithms for integrative analysis of heterogeneous data sources in EHR (including genomics, omics, microbiome, imaging, environmental, and wearables) relevant for precision medicine.
Georg Gerber, MD, PhD, MPH is an Assistant Professor of Pathology at Harvard Medical School. He is also Chief of Computational Pathology, Co-Director of the Massachusetts Host-Microbiome Center and a practicing Pathologist at Brigham and Women’s Hospital. His research interests involve developing novel computational and experimental methods to further the understanding of the role of the microbiota in human diseases. His lab has a particular focus on Bayesian inference methods for analyzing and predicting temporal dynamics of the microbiome.
David Koslicki is an Associate Professor at Pennsylvania State University. His current research interests include algorithm development in computational biology, with a focus on improving techniques for the inference of organisms in a microbial community. He is currently a PI on an NSF grant aimed at developing fast, efficient methods for the analysis of metagenomic communities.
Gail Rosen is an Associate Professor in Electrical and Computer Engineering at Drexel University. She heads the Center for Biological Discovery from Big Data and is chair of the Drexel University Research Computing Facility. Her group is interested in genotype-to-phenotype prediction from the microbiome.
Abstracts for Talks, alphabetically by last name of first author
Department of Biochemistry and Microbiology and Department of Genetics
Title: Importance of functional fitness: microb(iom)ial functional distances describe environmental preferences
How does the environment drive the selection of its inhabitants? The molecular functional abilities of individual microbes and microbiomes living in different conditions (e.g., different temperatures or salinities, or even a sick versus a healthy gut) are clearly different. Understanding the functions encoded in the (meta)genomes of microbiomes is thus vital for mapping their environmental preferences. The emergence of high-throughput genomic sequencing, coupled with growing computational resources, has opened new horizons. However, making sense of this deluge of data requires efficient and accurate analytical techniques. Here, I will demonstrate how adopting a functional fitness point of view in analyzing microbial genomic and metagenomic data facilitates the identification of condition- (or niche-) specific signature functions.
Michael J. Buck, Daniel McSkimming, Vijaya Murugaiyan, and Buffalo OsteoPerio Microbiome Group.
Associate Professor, State University of New York at Buffalo
Dept. of Biochemistry
Dept. of Biomedical Informatics
NY State Center of Excellence in Bioinformatics and Life Sciences
Title: Lessons learned from processing 4,302 microbiome samples
We are born consisting of our own eukaryotic human cells, but over the first few days of life our skin surface, oral cavity, and gut are colonized by a tremendous diversity of bacteria, archaea, fungi, and viruses, forming a new microbial ecosystem known as the human microbiota. Under normal circumstances, these microbes contribute to human well-being, but dysbiosis of the human microbiota has been linked to various diseases. Modern high-throughput sequencing and bioinformatics tools provide a powerful means of understanding the contribution of the human microbiome to health and its potential as a target for therapeutic interventions. To accurately assess the human microbiome from various samples, we have validated methodology and analysis approaches that allow accurate measurement across thousands of microbiome samples. We present several key steps in the experimental and computational procedure where technical variability is often introduced, and we provide quality control procedures to minimize its effects.
Department of Computer Science
Title: Creating extended metabolic models to enhance annotation of metabolomics measurements
Quantifying and characterizing the thousands of small molecules collected through untargeted metabolomics is challenging. A particular mass may be associated with multiple chemical compounds. Further, a molecule’s spectral signature is not unique and varies with the parameters used during data collection. Despite progress, metabolite annotation (assigning chemical identities to metabolomics measurements) and data interpretation remain challenging. This presentation will first describe how to create Extended Metabolic Models (EMMs) that contain not only the canonical substrates and products of enzymes already cataloged for an organism, but also metabolites that can form due to substrate promiscuity. The presentation will then describe a workflow called EMMA (EMM Annotation) for annotating metabolomics measurements. We show the effectiveness of EMMA in identifying putative metabolite identities for a microbiota data set.
Georg K. Gerber, MD, PhD, MPH, FASCP
Department of Pathology
Assistant Professor of Pathology, Harvard Medical School
Co-Director, Massachusetts Host-Microbiome Center
Member of the Harvard-MIT Health Sciences & Technology Faculty
Associate Pathologist, Center for Advanced Molecular Diagnostics, Brigham and Women’s Hospital
Title: Predictive and interpretable Bayesian machine learning models for understanding microbiome dynamics
The human microbiome is highly dynamic on multiple timescales, changing dramatically during development of the gut in childhood, with diet, or due to medical interventions. I will present two novel Bayesian machine learning methods that we have developed for gaining insight into microbiome dynamics. The first, MDSINE (Microbial Dynamical Systems INference Engine), is a method for efficiently inferring dynamical systems models from microbiome time-series data and predicting temporal behaviors of the microbiota. The second, Microbiome Interpretable Temporal Rule Engine (MITRE), is a method for predicting host status from microbiome time-series data, which achieves high accuracy while maintaining interpretability by learning predictive rules over automatically inferred time-periods and phylogenetically related microbes.
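To our understanding, MDSINE fits generalized Lotka-Volterra (gLV) models to microbiome time series. Its Bayesian inference is beyond a short example, but the forward model being fitted can be sketched. The snippet below is a minimal illustration, assuming the basic gLV form dx_i/dt = x_i (r_i + Σ_j A_ij x_j) with no perturbation terms, integrated with forward Euler. The function name `simulate_glv` and all parameter values are hypothetical and not taken from MDSINE.

```python
def simulate_glv(x0, r, A, dt, steps):
    """Integrate the generalized Lotka-Volterra equations with forward Euler.

    Hypothetical helper for illustration (not MDSINE code):
        dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)
    x0: initial abundances, r: intrinsic growth rates,
    A: interaction matrix, where A[i][j] is the effect of taxon j on taxon i.
    Returns the abundances after `steps` steps of size `dt`.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Per-capita growth rate of each taxon at the current state.
        growth = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Euler update; clamp at zero so abundances stay non-negative.
        x = [max(0.0, x[i] + dt * x[i] * growth[i]) for i in range(n)]
    return x


# Two hypothetical taxa that compete with each other (negative off-diagonal terms).
print(simulate_glv([0.1, 0.1], r=[1.0, 0.8], A=[[-1.0, -0.5], [-0.4, -1.0]],
                   dt=0.01, steps=5000))
```

With these illustrative parameters the abundances should settle near (0.75, 0.5), the point where r + A x = 0. Inferring r and A (and their uncertainty) from noisy, sparsely sampled data is the hard part that MDSINE addresses.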
Departments of Computer Science & Engineering and Biology
Pennsylvania State University
Title: Multi-resolution k-mer classification of metagenomic samples
I will describe some recent advances in the analysis of metagenomic data, detailing a new hashing-based method for the taxonomic profiling of samples. This k-mer-based technique, called “containment min hash” or CMash, uses a variety of probabilistic techniques (such as HyperLogLog, MinHash, and Bloom filters) that allow multiple k-mer sizes to be used in a single, low-memory, streaming pass over the data. After demonstrating that CMash enables the rapid detection of small, low-abundance microorganisms in a metagenomic sample, I will also describe rigorous performance guarantees for this method and compare it with other taxonomic profiling methods.
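The core quantity behind containment MinHash is the fraction of a reference genome's k-mers that also occur in a sample. The sketch below estimates it from a bottom-s MinHash sketch of the reference. It is a minimal illustration, not CMash itself: it assumes a SHA-1-based hash, ignores reverse complements, and checks sample membership against an exact Python set where CMash uses a Bloom filter. The function names and the example sequence are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq (no reverse-complement canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash: the first 8 bytes of the SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def containment(reference, sample, k, sketch_size=100):
    """Estimate |K(reference) & K(sample)| / |K(reference)|.

    The reference is summarized by a bottom-`sketch_size` MinHash sketch
    (its smallest k-mer hashes). Membership in the sample is tested against
    an exact set of hashes here; CMash uses a Bloom filter instead.
    """
    sketch = sorted(kmer_hash(x) for x in kmers(reference, k))[:sketch_size]
    if not sketch:
        raise ValueError("reference is shorter than k")
    sample_hashes = {kmer_hash(x) for x in kmers(sample, k)}
    return sum(v in sample_hashes for v in sketch) / len(sketch)


# Hypothetical reference embedded in a longer "sample"; evaluate several k-mer sizes.
ref = "ACGTTGCAAGGCTTAACGGATCCA"
sample = "TTTT" + ref + "GGGG"
for k in (5, 9, 13):
    print(k, containment(ref, sample, k))
```

Each line prints a containment of 1.0, since every reference k-mer occurs in the sample. Recomputing the estimate separately for each k, as this loop does, is the naive version of the multi-resolution idea; CMash is designed to handle several k-mer sizes in one low-memory streaming pass.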
Oregon State University
Title: Wasserstein Interpretations of the UniFrac Metric for Analysis and Efficient Computation
The UniFrac metric has been employed very successfully to measure differences between microbial communities. It has been shown that underlying this biological metric is the mathematical construct known as the Wasserstein or Kantorovich-Rubinstein metric. Viewing UniFrac from this mathematical perspective yields a simple linear algebra formulation that eases computation and allows for novel interpretations of UniFrac. We will discuss several algorithms that exploit this perspective and the results that follow.
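On a tree, the Wasserstein (earth mover's) distance between two abundance distributions reduces to a sum over edges: each branch length is multiplied by the absolute difference in the mass of the two distributions below that branch. This gives (unnormalized) weighted UniFrac in a single bottom-up pass. The sketch below is an illustration under stated assumptions, not the algorithm presented in the talk: nodes are numbered so every child has a larger index than its parent (root = 0), and `weighted_unifrac` plus the toy tree are hypothetical. Equivalently, with a 0/1 matrix L whose entry (i, j) is 1 when node j lies in the subtree of node i, the same value is ‖diag(length) · L · (p − q)‖₁, one way of writing it in linear algebra form.

```python
def weighted_unifrac(parent, length, p, q):
    """Unnormalized weighted UniFrac computed as the tree Wasserstein distance.

    Hypothetical helper for illustration.
    parent[i]: parent of node i (parent[0] == -1 for the root).
    length[i]: length of the branch from node i up to parent[i].
    p, q: relative abundances assigned to each node (each sums to 1).
    Nodes must be numbered so that every child has a larger index than its parent.
    """
    below_p, below_q = list(p), list(q)  # mass in the subtree rooted at each node
    total = 0.0
    for i in range(len(parent) - 1, 0, -1):  # visit children before parents
        # Mass that must cross the edge (i, parent[i]) to turn p into q.
        total += length[i] * abs(below_p[i] - below_q[i])
        below_p[parent[i]] += below_p[i]
        below_q[parent[i]] += below_q[i]
    return total


# Toy tree: root 0 -> internal node 1 (length 0.5) -> leaves 2, 3 (length 1.0 each);
# root 0 -> leaf 4 (length 2.0).
parent = [-1, 0, 1, 1, 0]
length = [0.0, 0.5, 1.0, 1.0, 2.0]
print(weighted_unifrac(parent, length, [0, 0, 1, 0, 0], [0, 0, 0, 1, 0]))  # 2.0
print(weighted_unifrac(parent, length, [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]))  # 3.5
```

Each result equals the length of the path the unit of mass must travel (2 → 1 → 3, then 2 → 1 → 0 → 4), which is what the Wasserstein view predicts, and the computation is linear in the number of nodes.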
University of California, San Diego
Title: Phylogeny-guided Data Augmentation for Microbiome Samples
Learning associations between traits and the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, given the high-dimensional and unbalanced nature of the training data available for microbiome studies, machine learning methods can fail. Data augmentation, which builds synthetic samples and adds them to the training data, has proved helpful for many machine learning tasks. We present a new data augmentation method, called TADA, that uses a statistical generative model to create new samples that augment existing ones. On two datasets, we show that adding these synthetic samples to the training set improves classification accuracy, especially when the classes are unevenly represented in the training data.
Title: Uncovering hidden functional patterns across diverse study populations from whole metagenome sequencing reads with Carnelian
Microbial populations exhibit functional changes in response to different ambient environments. Comparative metagenomic studies needed to understand these changes have yet to take full advantage of the increasingly available whole metagenome sequencing data. In this talk, I will introduce an end-to-end pipeline for metabolic functional profiling of metagenomic reads that is uniquely suited to finding common functional trends across data sets from diverse populations. Our software, Carnelian, combines probabilistic open reading frame finding with a new way of performing alignment-free functional profiling, and is better able to detect Enzyme Commission (EC) terms, especially from non-annotated species. I will show how this ability enables Carnelian to find concordant functional dysbiosis in geographically separated disease cohorts as well as uncover hidden functional relatedness of healthy microbiomes in populations with different subsistence strategies.
Electrical and Computer Engineering
Title: Learning Important Microbiome Taxa and Sequence Features
With microbiome surveys, thousands of species are characterized across samples. However, deciphering how these species co-occur and correlate with sample features (a.k.a. metadata) is a challenge. In this talk, we offer insight into genotype-to-"phenotype" approaches for microbiome research. We survey two approaches to this problem: (1) a topic model and (2) a deep neural network. While the former can find "guilds" and offers interesting ways to explore multi-sample studies, especially over time, the latter can be used to simultaneously find which organisms and which features may be important.
Department of Biochemistry and Microbiology
Tutorial Title: Sequencing, function and environment - a walk through of microbial (meta)genome analysis.
High-throughput sequencing offers new ways to investigate environmental microorganisms. The challenges include annotating data efficiently at large scale, ensuring the reliability of the annotation, and interpreting the results. In this tutorial, I will walk through tools and resources from our lab dedicated to addressing these issues. Participants are expected to leave the tutorial with a solid understanding of the analysis procedure as well as more confidence in using bioinformatics tools in general.
Department of Computer Science and Engineering
Department of Biomedical Informatics
University at Buffalo
Title: Towards Fully Mobile and Real-time Metagenomics
New portable and relatively inexpensive DNA sequencers, such as the Oxford Nanopore MinION, have the potential to move DNA sequencing outside the laboratory, leading to faster and more accessible DNA-based diagnostics. However, portable metagenomic DNA sequencing and analysis are challenging for mobile systems, owing to high data throughput and computationally intensive processing performed in environments with unreliable connectivity and power. In this talk, we analyze the challenges that mobile systems must address to realize the full potential of mobile, real-time metagenomics. We identify the primary bottlenecks and then present potential solutions from the perspectives of both algorithms and systems design.
If you would like to present a poster on a topic relevant to this workshop, please email email@example.com with your name, title, affiliation, poster title and abstract. If we receive a critical mass of posters, we will gladly host a poster session.