Workshop on Microbiomics, Metagenomics, and Metabolomics

@ACM BCB 2017

Boston Marriott Cambridge, Cambridge, MA

Sunday, August 20, 2017

Slides are available at

If you attended the workshop, please complete this form for the workshop records:

To register: the fee for attending on Sunday is $350 if you have a SIGBIO membership.

SIGBIO membership costs $25 per year. You can join here:

Motivation and rationale for the workshop

Microbiota are ecological communities of microorganisms found throughout nature. In humans and animals, microbiota can reside on or within the body, where they exist in commensal or mutualistic relationships with their host, influence physiological functions, and play critical roles in the host’s development. These microbial communities can be very complex. One example is the intestinal microbiota, which comprises hundreds of species that interact with one another and with their host. Recent studies have demonstrated that the microbiota affects a wide range of physiological processes, including digestion, immune system development, and inflammation. Further, significant alterations in intestinal microbiota composition have been shown to correlate with several diseases, including obesity, diabetes, cancer, asthma, and even autism spectrum disorder. Characterizing the microbiota and understanding its relation to health and disease therefore stand to significantly improve human health.

Efforts to characterize microbiota have greatly benefited from technical advances in DNA sequencing. In particular, low-cost, culture-independent sequencing has made metagenomic and metatranscriptomic surveys practical for communities of bacteria, archaea, viruses, and fungi associated with the human body, other hosts, and the environment. The resulting data have stimulated the development of many new computational approaches to meta'omic sequence analysis, including metagenomic assembly, microbial identification, and gene, transcript, and metabolic pathway profiling. Further, recent advances in untargeted metabolomics have stimulated the development of many tools that enhance the functional profiling of microbial communities.

A hands-on tutorial will provide an introduction to computational metagenomics. Through invited talks, the workshop will also highlight recent advances in computational methods for metagenomics and metabolomics. The workshop is timely and will broaden the scope of ACM-BCB to cover these pressing topics.

Workshop Schedule -- See details of talks at the bottom of the page



Soha Hassoun is Professor and past chair of the Department of Computer Science at Tufts University. She holds adjunct appointments in the Departments of Chemical and Biological Engineering and of Electrical and Computer Engineering. She is currently co-PI on NSF and NIH grants on identifying immunomodulatory microbiota metabolites and on metabolomics analysis.

Dr. Curtis Huttenhower is an Associate Professor of Computational Biology and Biostatistics at the Harvard T.H. Chan School of Public Health and an Associate Member of the Broad Institute. He led several of the analysis and publication efforts of the Human Microbiome Project, and he is currently PI of the "HMP2" Center for Characterizing the Gut Microbial Ecosystem for Diagnosis and Therapy in IBD, as well as PI of the Crohn’s and Colitis Foundation of America and Juvenile Diabetes Research Foundation Microbiome Initiative Bioinformatics Centers. His lab is a leader in computational and bioinformatic approaches for microbial community analysis and data integration. The Huttenhower group provides a variety of computational resources for the community, many of which focus on large-scale meta'omic data mining and microbial biomolecular function prediction. All of these have been released as open-source software and service implementations. The lab has delivered interactive tutorials on the bioBakery software suite at several previous venues.

About the Hands on Tutorial

The tutorial presentations on metagenomics will be led by Prof. Huttenhower and members of his lab. The tutorial will provide an introduction to reference-based computational meta’omics, highlighting the state of the art in the field as well as outstanding challenges (Fig. 1). Alternating between lecture content and hands-on activities, the tutorial presentations will walk attendees through the typical steps of a meta’omic analysis workflow: 1) quality control of raw sequencing reads, 2) profiling of microbial community features (taxa and molecular functions), and 3) strain identification and tracking.

The hands-on components of the tutorial will introduce the bioBakery suite of meta’omic analysis methods developed by the Huttenhower lab at the Harvard T.H. Chan School of Public Health. Tutorial attendees will interact with the bioBakery suite through a Google Cloud implementation of the bioBakery VM: an Ubuntu Linux-based environment preloaded with the bioBakery tools, their dependencies, and sample meta’omic datasets. Each attendee will be assigned a personal Google Cloud instance of the bioBakery VM. Attendees will work on their own laptops and interact with their instances through a web browser-based graphical interface (Fig. 1 A-C). Performing hands-on tutorials in Google Cloud removes the need to install tools and download data locally, minimizing setup time. In addition, attendees are free to experiment in their personal instances without fear of disrupting their day-to-day computing environment. Our Senior Software Developer (Lauren McIver) will be on call during tutorial sessions for troubleshooting (the vast majority of issues are resolved by having the attendee log out of their personal instance and log back in).

Each of the ~20 tools in the bioBakery suite is bundled with 1) a slide deck covering the theory behind the tool and the results of example applications (covered during the lecture portion of the tutorial) and 2) an online demo (followed during the hands-on component; Fig. 1D). Each tutorial module takes ~45-60 minutes (lecture plus hands-on component). An example half-day tutorial sequence would be 1) MetaPhlAn2 (taxonomic profiling of bacteria, archaea, and eukaryotic microbes); 2) StrainPhlAn (microbial strain identification and tracking); and 3) HUMAnN2 (species-stratified functional profiling of microbial genes and metabolic pathways). The online demo materials include discussion questions for instructors to review with attendees to deepen their understanding of the material (Fig. 1E).

Abstracts for Talks

Soha Hassoun

Department of Computer Science

Tufts University

Title: Tutorial: Advances and Challenges in Metabolite Annotation

Metabolomics is an expanding field of ‘omics’ research concerned with the characterization of small-molecule metabolites in biological systems. Quantifying and characterizing the thousands of small molecules measured through untargeted metabolomics is challenging. A particular mass may be associated with multiple chemical compounds. Further, a molecule’s spectral signature is not unique and varies with the parameters used during data collection. Despite progress, metabolite annotation and interpretation remain difficult. This tutorial will provide an overview of tandem mass spectrometry and of the available databases that catalogue spectral data. It will then cover the fundamental concepts used in recent metabolite identification techniques. The tutorial will benefit researchers in systems biology, as well as those interested in integrating metabolomics with other ‘omics’ data and in tackling challenges enabled by novel mass spectrometry collection platforms.
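As a small illustration of why a particular mass may be associated with multiple chemical compounds, the sketch below computes approximate monoisotopic masses from elemental formulas and lists every candidate within a ppm tolerance of a measured neutral mass. The candidate table, the 5 ppm tolerance, and the function names are assumptions for illustration only; real annotation pipelines also have to handle adducts, charge states, and MS/MS spectra.

```python
# Approximate monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(formula: dict[str, int]) -> float:
    """Neutral monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


# Hypothetical candidate table; glucose, fructose, and myo-inositol all share C6H12O6.
CANDIDATES = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "fructose": {"C": 6, "H": 12, "O": 6},
    "myo-inositol": {"C": 6, "H": 12, "O": 6},
    "citric acid": {"C": 6, "H": 8, "O": 7},
    "glycine": {"C": 2, "H": 5, "N": 1, "O": 2},
}


def match_mass(observed: float, tol_ppm: float = 5.0) -> list[str]:
    """Return every candidate whose theoretical mass lies within tol_ppm of the observed mass."""
    hits = []
    for name, formula in CANDIDATES.items():
        theoretical = formula_mass(formula)
        ppm_error = abs(observed - theoretical) / theoretical * 1e6
        if ppm_error <= tol_ppm:
            hits.append(name)
    return hits


print(match_mass(180.0634))  # ['glucose', 'fructose', 'myo-inositol']
```

Mass alone cannot separate these three isomers, no matter how accurate the instrument, which is why annotation relies on fragmentation spectra and spectral databases.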

Dan Knights

Biotechnology Institute

Department of Computer Science and Engineering

University of Minnesota

Title: Fast exhaustive alignment for microbiome analysis

Abstract: One of the fundamental tasks in analyzing microbiome sequencing data is genome database search, in which DNA sequences are compared to known reference genomes for identification or annotation. Although algorithms exist for optimal, exhaustive gapped alignment, they have largely been abandoned for next-generation sequencing (NGS) data in favor of faster algorithms that sacrifice alignment quality and confidence. This talk describes BURST, a high-throughput DNA short-read aligner that performs provably optimal alignment up to one million times faster than the fastest optimal alignment algorithms, relying on several key novel optimizations. BURST is guaranteed to find all equally good matches in the database and can interpolate conservative taxonomic annotations for sequences that match multiple genomes. BURST also losslessly computes the minimal set of matching references for a given set of input sequences, and it is fast enough to run on large metagenomic data sets, with applications in amplicon and shotgun DNA analysis.
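BURST's own optimizations are not described here, but the baseline it accelerates can be sketched: exhaustive dynamic-programming alignment of a read against every reference, keeping all references tied for the best score. The sketch below assumes unit-cost edit distance with free reference ends (semi-global alignment); the scoring scheme, function names, and toy sequences are illustrative assumptions, and this is not BURST's algorithm.

```python
def semi_global_distance(query: str, ref: str) -> int:
    """Fewest mismatches plus gaps needed to align all of query to any substring of ref."""
    # prev[j] = best cost of aligning query[:i] with the alignment ending after ref[:j].
    prev = [0] * (len(ref) + 1)  # row 0: the alignment may start anywhere in ref at no cost
    for i in range(1, len(query) + 1):
        curr = [i] + [0] * len(ref)  # aligning i query bases against nothing costs i gaps
        for j in range(1, len(ref) + 1):
            curr[j] = min(
                prev[j - 1] + (query[i - 1] != ref[j - 1]),  # match (0) or mismatch (1)
                prev[j] + 1,  # query base aligned to a gap
                curr[j - 1] + 1,  # reference base aligned to a gap
            )
        prev = curr
    return min(prev)  # the alignment may end anywhere in ref at no cost


def best_hits(query: str, refs: dict[str, str]) -> tuple[int, list[str]]:
    """Align query to every reference and return all references tied for the best score."""
    scores = {name: semi_global_distance(query, seq) for name, seq in refs.items()}
    best = min(scores.values())
    return best, sorted(name for name, s in scores.items() if s == best)


refs = {"refA": "ACGTACGTTT", "refB": "GGACGTACGA", "refC": "TTTTTTTTTT"}
print(best_hits("ACGTACG", refs))  # (0, ['refA', 'refB'])
```

Filling the full table costs O(|query| x |reference|) per pair, which is why naive exhaustive alignment became impractical for NGS data and why the speedups reported for BURST matter. Reporting every tied reference, rather than one arbitrary best hit, is what makes conservative annotation of multi-matching reads possible.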

Yuzhen Ye

Computer Science Department

Indiana University, Bloomington

Title: New computational tools for integrated meta-omics data analysis

Abstract: We have developed new algorithms and computational tools for the integrated analysis of meta-omics data and for the identification of microbial markers for microbiome-associated diseases. In this talk, I will focus on our recent development of graph-centric algorithms for metatranscriptomic and metaproteomic data analysis, and of subtractive assembly approaches for detecting consistent microbial marker genes that could potentially be applied to disease diagnosis.

Georg K. Gerber, MD, PhD, MPH, FASCP

Department of Pathology

Assistant Professor of Pathology, Harvard Medical School

Co-Director, Massachusetts Host-Microbiome Center

Member of the Harvard-MIT Health Sciences & Technology Faculty

Associate Pathologist, Center for Advanced Molecular Diagnostics, Brigham and Women’s Hospital

Title: Predictive and interpretable Bayesian machine learning models for understanding microbiome dynamics

Abstract: The human microbiome is highly dynamic on multiple timescales, changing dramatically during development of the gut in childhood, with diet, or due to medical interventions. I will present two novel Bayesian machine learning methods that we have developed for gaining insight into microbiome dynamics. The first, MDSINE (Microbial Dynamical Systems INference Engine), is a method for efficiently inferring dynamical systems models from microbiome time-series data and predicting temporal behaviors of the microbiota. The second, Microbiome Interpretable Temporal Rule Engine (MITRE), is a method for predicting host status from microbiome time-series data, which achieves high accuracy while maintaining interpretability by learning predictive rules over automatically inferred time-periods and phylogenetically related microbes.
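The abstract does not state MDSINE's model form. As a hedged illustration, dynamical systems models of microbiota are commonly written as generalized Lotka-Volterra (gLV) equations, shown below, where $x_i(t)$ is the abundance of taxon $i$ among $S$ taxa, $a_i$ its intrinsic growth rate, $b_{ij}$ the effect of taxon $j$ on taxon $i$, and $\gamma_{il}$ the effect of an external perturbation $u_l(t)$ such as an antibiotic or a diet change. The notation is ours, not taken from the talk.

```latex
\frac{dx_i(t)}{dt} = x_i(t)\left( a_i + \sum_{j=1}^{S} b_{ij}\, x_j(t) + \sum_{l=1}^{L} \gamma_{il}\, u_l(t) \right),
\qquad i = 1, \dots, S
```

Inferring a model of this kind from microbiome time-series data amounts to estimating $a_i$, $b_{ij}$, and $\gamma_{il}$; once these are estimated, the equations can be integrated forward in time to predict how the community will respond.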

Gail Rosen

Ecological and Evolutionary Signal-processing and Informatics lab

Electrical and Computer Engineering

Drexel University

Title: Discovering the Hidden World: High-throughput Discovery of Microbial Community Structure and Interactions


Recent advances in DNA sequencing have revealed that microbes are tightly coupled to our immune system and nutrient supply. They also regulate the planetary carbon and oxygen cycles that are vital to life, and they are found in nearly every niche on Earth. However, these microbes do not act in isolation but in communities, in relationships that may be mutualistic, parasitic, or competitive. In my talk, I will survey mathematical and computational techniques for understanding which microbes compose a community and how they interact with the host environment and with each other. I will present my lab's work on accelerating taxonomic classification using Bayesian methods and compressive sensing (which speed up analysis 10-1000x, depending on the data structure). I will also demonstrate our recent advances in applying information-theoretic feature selection to identify the species, genes, and metabolic pathways that distinguish environmental conditions, as well as our novel method for selecting the number of most relevant features. Finally, I will present how to explore co-occurring subcommunities using structural topic modeling. In each case, I will demonstrate how these methods help us understand the microbiome's role in aging, diet, bioremediation, and other microbial systems.
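One common information-theoretic criterion for this kind of feature selection is the mutual information between a feature and the condition label. The sketch below ranks presence/absence features by mutual information in bits. The feature names, the toy data, and plain mutual-information ranking are assumptions for illustration; the abstract does not specify the lab's actual selection criterion.

```python
from collections import Counter
from math import log2


def mutual_information(feature: list, labels: list) -> float:
    """Mutual information I(F; Y) in bits between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    f_counts = Counter(feature)
    y_counts = Counter(labels)
    # I(F;Y) = sum over observed (f, y) of p(f, y) * log2(p(f, y) / (p(f) * p(y)))
    return sum(
        (c / n) * log2(c * n / (f_counts[f] * y_counts[y]))
        for (f, y), c in joint.items()
    )


def rank_features(table: dict[str, list], labels: list) -> list[str]:
    """Order features from most to least informative about the labels."""
    scores = {name: mutual_information(values, labels) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = ["diet_A"] * 4 + ["diet_B"] * 4
table = {
    "taxon_1": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly tracks diet (1 bit)
    "taxon_2": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of diet (0 bits)
    "taxon_3": [1, 1, 1, 0, 0, 0, 0, 0],  # partially tracks diet
}
print(rank_features(table, labels))  # ['taxon_1', 'taxon_3', 'taxon_2']
```

Ranking alone does not say how many features to keep; that is the separate question addressed by the talk's method for selecting the number of most relevant features.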

Kyongbum Lee

Chemical and Biological Engineering

Tufts University

Title: Using Metabolomics and 16S rRNA Sequencing to Investigate the Impact of Environmental Chemical Perturbations on Gut Microbiota Community Composition and Function

Abstract: Recent findings suggest that significant alteration of the gut microbiota, or dysbiosis, contributes to disorders of the brain. Dysbiosis may result from infection, diet, or other environmental perturbations. One example of an environmental perturbation is exposure to biologically active synthetic chemicals present in household and commercial products. In recent years, exposure to these chemicals, labeled endocrine-disrupting compounds, has emerged as a significant public health concern. Interestingly, several of these chemicals, e.g. bisphenol A (BPA) and phthalate esters, have been linked to neurodevelopmental disorders, including autism spectrum disorder (ASD).

This presentation describes a study of the effects of di-ethylhexyl phthalate (DEHP), a ubiquitous plasticizer. The effects of this chemical were studied in an in vitro culture model of the gut microbiota. Using 16S rRNA sequencing and untargeted metabolomics, we found that DEHP significantly modifies both the community structure and the metabolic profile of the microbiota. Co-analyzing the 16S and metabolomics data using a metabolic model of the microbiota revealed that the chemically induced increases and decreases in specific metabolites can be attributed to the depletion or enrichment of particular groups of bacteria. Notably, DEHP exposure significantly increased the level of p-cresol as well as the abundance of Clostridium bolteae, both of which have recently been identified as potential biomarkers of ASD. Our results suggest that environmental chemicals could cause significant dysbiosis of the gut microbiota, leading to an altered milieu of bioactive metabolites in the intestine, consistent with other studies linking environmental chemical exposure to developmental disorders involving gastrointestinal conditions.

Hesham Ali

Professor of Computer Science & Director of UNO Bioinformatics Core Facility

Lee D. and Willa Seemann Distinguished Dean

College of Information Science and Technology

University of Nebraska at Omaha

Title: Analysis of biologically relevant features of metagenomics data in health and disease

Abstract: The microbiome consists of a wealth of bacterial species that together form a dynamic, complex community capable of responding to and influencing its environment. The human microbiome has become a focal point of great interest, especially for its connection to patient health and disease. Next-generation sequencing technologies provide novel insight into metagenomic communities, revealing biologically relevant features that are medically important. In this research, we examine biologically relevant features in the human microbiome. First, we introduce Focus, an assembler capable of modeling next-generation sequencing data at multiple levels of granularity using a graph multiset. This novel multilevel graph facilitates the detection and extraction of biologically relevant features from NGS data sets. The Focus algorithm is applied to discover transposase-associated genes in Crohn’s disease and healthy gut metagenomic data sets. Results demonstrate substantial differences in transposase-associated genes between healthy and Crohn’s disease gut microbiomes. In the second part of this presentation, we examine biological features of the gut microbiome as potential early biomarkers for cancer detection. Gene and bacterial abundances are used to train machine-learning models to distinguish between healthy and cancer-associated oral microbiomes. Support vector machine (SVM) models are trained for two case studies, oral carcinoma and acute lymphoblastic leukemia. Results demonstrate that machine learning successfully separates cancer-associated and healthy oral microbiomes in these two small case studies. We anticipate that as next-generation sequencing costs continue to drop, more data will become available to build even more robust and accurate machine-learning models for distinguishing cancer-associated from healthy human microbiomes for early cancer detection.


If you would like to present a poster on a topic relevant to this workshop, please email with your name, title, affiliation, poster title and abstract. If we receive a critical mass of posters, we will gladly host a poster session.