Invited Seminars

Spring 2021 (March 29, 2021 - June 07, 2021)

April 7 Machine learning approaches to enhance research and equity in computational biology

Speaker: Casey Greene

Abstract: Biomedical research disciplines are awash in data. These data, generated by new technologies as well as old approaches, provide the opportunity to systematically extract biological patterns that were previously difficult to observe. I’ll share vignettes focusing on three areas: 1) why large-scale integrative analyses can be beneficial in bioinformatics; 2) how data simulation can help us avoid rediscovering generic findings; and 3) how machine learning can be used to examine the scientists whose contributions we choose to recognize.


Bio: Casey’s lab at the University of Colorado School of Medicine is dedicated to developing computational tools that biologists can use to gain insights from other labs’ data as easily as from their own. The lab’s work is heavily motivated by interests of its members, and in recent years the lab has also examined the distribution of honors by a major computational biology society, investigated preprints as a means to study the peer review process, and developed methods to promote data sharing. In 2016, Casey established the “Research Parasite Awards” after an editorial in the New England Journal of Medicine deemed scientists who analyze other scientists’ data “research parasites.” These honors, accompanied by a cash prize, are awarded to scientists who rigorously reanalyze other people’s data to learn something new. Casey is also the director of the Center for Health AI in the University of Colorado School of Medicine. This newly created center will be made up of faculty dedicated to enhancing research, clinical practice, and education with the use of advanced analytical approaches. Initial recruits to campus made through the center include Sean Davis and Melissa Haendel.


April 14 Machine-learned molecular models for protein interactions and logic

Speaker: Mohammed AlQuraishi

Abstract: The advent of differentiable programming makes possible bespoke machine-learned models of biological phenomena that are partly learned from data and partly informed by human-derived biophysical knowledge. In this talk I will describe a new approach for predicting protein-protein interactions that provides quantitative accuracy in modeling interaction affinity and sensitivity to single-residue variation. Using this approach we have begun to uncover what appears to be a combinatorial grammar underlying signal transducing proteins in metazoa. This grammar describes how modular protein interaction domains and their cognate binding sites combine to form units of functional logic that are widely reused throughout signaling networks, providing a new conceptual paradigm for the organization of signaling networks.


Bio: Mohammed AlQuraishi is an Assistant Professor in the Department of Systems Biology and a member of Columbia’s Program for Mathematical Genomics, where he works at the intersection of machine learning, biophysics, and systems biology. The AlQuraishi Lab focuses on two biological perspectives: the molecular and systems levels. On the molecular side, the lab develops machine learning models for predicting protein structure and function, protein-ligand interactions, and learned representations of proteins and proteomes. On the systems side, the lab applies these models in a proteome-wide fashion to investigate the organization, combinatorial logic, and computational paradigms of signal transduction networks, how these networks vary in human populations, and how they are dysregulated in human diseases, particularly cancer.

Dr. AlQuraishi holds undergraduate degrees in Biology, Computer Science, and Mathematics. He earned an M.S. in Statistics and a Ph.D. in Genetics from Stanford University. He subsequently joined the Systems Biology Department at Harvard Medical School as a Departmental Fellow and a Fellow in Systems Pharmacology, where he developed the first end-to-end differentiable model for learning protein structure from data.


April 21 Machine learning-based design of proteins (and small molecules and beyond)

Speaker: Jennifer Listgarten

Abstract: Data-driven design is making headway into a number of application areas, including protein, small-molecule, and materials engineering. The design goal is to construct an object with desired properties, such as a protein that binds to a target more tightly than previously observed. To that end, costly experimental measurements are being replaced with calls to a high-capacity regression model trained on labeled data, which can be leveraged in an in silico search for promising design candidates. The aim then is to discover designs that are better than the best design in the observed data. This goal puts machine learning-based design in a much more difficult spot than traditional applications of predictive modelling, since successful design requires, by definition, some degree of extrapolation: pushing the predictive model to its unknown limits, in parts of the design space that are a priori unknown. In this talk, I will anchor this overall problem in protein engineering, and discuss our emerging computational approaches to tackle it.

Bio: Since Jan. 2018, Jennifer Listgarten has been a Professor in the Department of Electrical Engineering and Computer Science, and Center for Computational Biology, at the University of California, Berkeley. She is also a member of the steering committee for the Berkeley AI Research (BAIR) Lab, and a Chan Zuckerberg investigator. From 2007 to 2017 she was at Microsoft Research, in Cambridge, MA (2014-2017), Los Angeles (2008-2014), and Redmond, WA (2007-2008). She completed her Ph.D. in the machine learning group in the Department of Computer Science at the University of Toronto, located in her home town. She has two undergraduate degrees, one in Physics and one in Computer Science, from Queen's University in Kingston, Ontario. Jennifer's research interests are broadly at the intersection of machine learning, applied statistics, molecular biology and science.


April 28 Computer vision to phenotype human diseases across physiological and molecular scales

Speaker: James Zou

Abstract: I will present new computer vision algorithms to learn complex morphologies and phenotypes that are important for human diseases. I will illustrate this approach with examples that capture physical scales from macro to micro: 1) video-based AI to assess heart function (Ouyang et al. Nature 2020), 2) generating spatial transcriptomics from histology images (He et al. Nature BME 2020), and 3) learning morphodynamics of immune cells. Throughout the talk I'll illustrate new design principles/tools for human-compatible and robust AI that we developed to enable these technologies (Ghorbani et al. ICML 2020, Abid et al. Nature MI 2020).

Bio: James Zou is an assistant professor at Stanford University and a Chan-Zuckerberg investigator. James develops novel machine learning algorithms to study human health and diseases. He is also interested in making ML more reliable, accountable and human compatible. Several of his methods are used by tech, biotech and pharma companies. He has received several best paper awards at top CS venues, the 2019 RECOMB best paper award, NSF CAREER Award, Google Faculty Award, Tencent AI award, Amazon Research Award and the Sloan Fellowship.


May 5 Causal Inference in Single-cell Genomics

Speaker: Yongjin Park

Abstract: From a naive perspective, single-cell genomics data may look like other ordinary, existing genomics data matrices. Why are biologists so enthusiastic about this new technology? Should a statistician feel the same way? Starting from this question, my group began to investigate the unique aspects of single-cell RNA-seq data and to take advantage of its special data structure. In this talk, I will introduce one causal inference question that arises in the context of single-cell differential expression analysis. I will propose a straightforward algorithm to ascertain the effect of disease status on cell-type-specific gene expression profiles. Since the purpose of the algorithm can be better understood in causal inference contexts, I will briefly discuss several causal effect inference strategies, hoping to invigorate interest in the genomics community. Moreover, I will discuss a direct approach to assign cells to a known cell type or state, scalable enough to analyze millions of cells in a modest computing environment with a low memory footprint. If time permits, I will demonstrate that single-cell data can nicely blend with existing tissue-level, bulk data. Integrative analysis with bulk data paves a new way to provide a high-resolution, cell-type-level view of complex disease mechanisms embedded in genome-wide association studies.


Bio: Yongjin Park is a computational biologist interested in solving real-world genomics problems using classical statistics and causal inference approaches. He is currently an Assistant Professor at the University of British Columbia and a scientist at BC Cancer Research. Yongjin received a B.Sc. in Biology and Computer Science and Engineering from Seoul National University, Korea. He began to study statistics and machine learning while working with Russell Schwartz at Carnegie Mellon University, where he received an M.Sc. in Computational Biology. For his doctoral studies, he moved to Johns Hopkins University and completed his thesis work on probabilistic network analysis under the supervision of Joel Bader. During his postdoctoral training with Manolis Kellis at MIT, his research expanded to statistical genetics, causal inference, and complex disorders.


May 12 Quantifying cell type-specific changes in transcriptional state and gene co-regulation across multiple datasets

Speaker: Gerald Quon

Abstract: The generation of single cell datasets across multiple tissues, conditions and species necessitates the development of computational methods to characterize differences in cell type-specific gene regulation across conditions, tissues and species (or more generally, datasets). We have developed scAlign, a tool for performing single cell alignment and data integration to match cells of the same type across datasets. Compared to existing approaches, scAlign is unique in that it can leverage cell type labels for subsets of cells (derived from e.g. only high confidence markers), in addition to being capable of fully unsupervised (no cells are labeled) or fully supervised (all cells are labeled) alignment. We demonstrate diverse applications of scAlign, including finding conserved cell types between the human and mouse cortex, matching hematopoietic progenitor populations across control-stimulus conditions and identifying a rare population of P. falciparum cells that undergo late sexual commitment. I will also demonstrate tools we have developed for performing post-alignment analyses, such as finding differential gene modules and regulation across conditions.


Bio: Gerald Quon is an Assistant Professor in the Department of Molecular and Cellular Biology at UC Davis, and a member of the Genome Center and UC Davis Comprehensive Cancer Center. He obtained his Ph.D. in Computer Science from the University of Toronto, and completed postdoctoral training at MIT under the guidance of Manolis Kellis. His lab focuses on the development of machine learning-based approaches to building quantitative models of cell state and gene regulation. Broad areas of research he is currently pursuing include (1) integrating transcriptomic and cellular phenotypic data to better understand how gene regulation impacts cellular phenotype; (2) finding recurring spatial patterns of gene expression and cellular organization from spatial transcriptomes; and (3) identifying re-wiring of gene regulatory networks across species.


May 19 Machine Learning for Modeling the Dynamics of the Tumor Microenvironment

Speaker: Elham Azizi

Abstract: Cancer therapies succeed only in a subset of patients, partly due to the heterogeneity of cells across and within tumors. Recent genomic and imaging technologies that measure features at the resolution of single cells and in the context of the tissue present exciting opportunities to characterize unknown cell types in the complex tumor microenvironment, and elucidate their circuitry and role in driving response to therapies. However, analyzing and integrating single-cell data across patients, time-points, and data modalities involves significant statistical and computational challenges. I will present a set of machine learning methods developed to address problems such as handling sparsity and noise, distinguishing technical variation from biological heterogeneity, inferring underlying circuitry, and tackling limitations of clinical experimental design. I will also present novel biological insights obtained from applying these methods to multiple cancer systems. These results include continuous phenotypic expansion of immune cells when interfacing with breast tumors, and detecting key T cell subsets with divergent temporal dynamics that define response to immunotherapy in leukemia.

Bio: Elham Azizi is the Herbert & Florence Irving Assistant Professor of Cancer Data Research in the Irving Institute for Cancer Dynamics and Assistant Professor in the Department of Biomedical Engineering at Columbia University. She is also affiliated with the Department of Computer Science, Data Science Institute, and the Herbert Irving Comprehensive Cancer Center.

Elham completed her postdoctoral training at Memorial Sloan Kettering Cancer Center and Columbia University. She received a PhD in Bioinformatics from Boston University, an MS degree in Electrical Engineering from Boston University and a BS in Electrical Engineering from Sharif University of Technology. She is a recipient of the Tri-Institutional Breakout Prize for Junior Investigators, NIH NCI Pathway to Independence Award, and an American Cancer Society Postdoctoral Fellowship.



May 26 Probabilistic models of transcriptomic dysregulation in human genetic disease

Speaker: David Knowles

Abstract: Gene expression is tightly regulated in healthy human development but frequently dysregulated in disease. RNA-seq has become ubiquitous for assaying the transcriptome: the collection of messenger RNA molecules expressed from the genes of an organism. However, significant computational and statistical challenges remain to translate the resulting noisy, confounded RNA-seq data into meaningful understanding of the biological system or disease state under consideration. I will describe our use of probabilistic models, deep learning and convex optimization to address such challenges.

Bio: David Knowles is a Core Faculty Member at the New York Genome Center and an Assistant Professor in the Departments of Computer Science and Systems Biology at Columbia University. His research focuses on the development of novel machine learning methods and their application to data analysis challenges in genomics with the aim to better understand the role of transcriptomic dysregulation across the spectrum from rare to common genetic disease. The lab works with diverse research groups in collecting large-scale genomics datasets in the context of neurological disease and developing novel genomic technologies including single cell methods, forward genetic screens and long-read transcriptomics. Dr. Knowles obtained a PhD in Engineering (Machine Learning) from the University of Cambridge with Dr. Zoubin Ghahramani. Prior to joining NYGC, he was a postdoctoral fellow at Stanford University, working with Drs. Sylvia Plevritis (Center for Computational Systems Biology/Radiology), Jonathan Pritchard (Genetics), and Daphne Koller (Computer Science).


June 2 How to be a machine learning biologist

Speaker: Quaid Morris


Bio: Quaid Morris is a full member of the Computation and Systems Biology program at the Sloan Kettering Institute at the Memorial Sloan Kettering Cancer Center. Until last month, he was a full professor at the University of Toronto in the Donnelly Centre with cross-appointments in Molecular Genetics and Computer Science. Quaid is a faculty member at the Vector Institute for Artificial Intelligence (AI) in Toronto, where he holds a Canada CIFAR AI chair. He pursued graduate training and research in machine learning at the Gatsby Unit with Peter Dayan and Geoffrey Hinton at University College London and obtained his PhD in Computational Neuroscience from the Massachusetts Institute of Technology. Quaid's B.Sc. is in computer science, and his postdoctoral fellowship was in computational biology with Brendan Frey and Timothy Hughes, both at the University of Toronto.


The Morris lab (http://www.morrislab.ca/) uses machine learning and artificial intelligence to do biomedical research, focusing on cancer evolution, post-transcriptional regulation, and gene function prediction. The lab has published more than 100 papers in high impact journals (Nature, Science, Cell), focused field-specific journals (Nature Methods, Genome Biology, Bioinformatics), and computer science and machine learning conferences (NeurIPS). For the past two years, Quaid has been a Clarivate highly cited researcher.