Sunday, September 3, 2023

Explainable and biophysically-grounded models for genotype-to-phenotype mapping

In conjunction with ACM-BCB 2023 in Houston, Texas

Venue: Hyatt Regency Houston/Galleria -- Ballroom D 


Workshop Goals: To foster interdisciplinary discussion and collaboration at the nexus of machine learning, biology, biophysics, and bioengineering. The focus will be on developing explainable black-box (deep learning) models, as well as interpretable models grounded in biophysical principles, that offer a more transparent and meaningful understanding of the biological systems under investigation and enable their forward engineering.

Topics and questions covered:
1. New Assays and Opportunities: Discussion of recently developed assays for high-resolution mapping of the genotype-to-phenotype relationship, including GPRAs [2], CLASSIC [5], Perturb-seq [7,10], speedingCARs [1], and ECCITE-seq [3], among others.

2. Balancing Engineering and Scientific Goals: Exploring how much understanding is needed to achieve functional outcomes, and the trade-offs between engineering and scientific goals.

3. Modeling Techniques: The range of black-box and interpretable machine learning approaches applicable to data from large-scale assays, including biophysically grounded genotype-to-phenotype (GP) maps [2], convolutional neural networks (CNNs) [8], transformer networks [8], and MAVE-NN [6].

4. Methods for Explaining Black-Box Models: Overview of the techniques available for explaining black-box models, their value for biological understanding, and current gaps [4].

5. Data Quality and Data Utilization: The importance of data quality in large-scale assays, strategies for dealing with noisy experimental data, addressing the challenges of imbalanced datasets, and identifying which data points are most valuable for biological understanding.

6. Experiment Design and Application: Using machine learning models to understand cell-circuit design, and designing experimental assays that maximize the usability of the resulting data.

7. Integrating Multi-Omics Data: The benefits of combining multi-omics data with large-scale assays for a more comprehensive biological understanding.


Invited Speakers

Prof Caleb J. Bashor

Rice University

Prof Ankit B. Patel

Baylor College of Medicine

Prof Justin Kinney

Cold Spring Harbor Lab

Prof Mary Dunlop

Boston University

Dr Yue Jiang

Shape TX

Dr Ashley Mae Conard

Microsoft Research




Time (US Central)    Speaker/Activity


8:00-8:15    Introduction and Welcome

8:15-8:35    Prof. Caleb Bashor (Rice University)

8:40-9:00    Prof. Ankit Patel (BCM/Rice University)            

9:05-9:25    Dr. Yue Jiang (Shape TX)             

9:25-9:40    [ Coffee Break ]

9:40-10:00   Dr. Ashley Mae Conard (Microsoft Research)              

10:05-10:25  Prof. Mary Dunlop (Boston University)              

10:30-10:50  Prof. Justin Kinney (CSHL)             

10:50-11:05  [ Coffee Break ]

11:05-11:55  Panel discussion         

11:55-12:00  Wrap-up and conclusion   

Workshop Organizers

Dr Satpreet Singh

Postdoc, Baylor College of Medicine

Dr Yashwanth Lagisetty

MD/PhD student, UT Health Science Center

Dr Emily Mendez

MD student, UT Health/MD Anderson Cancer Center MSTP Program

Kshitij Rai

PhD student, Rice University

Ronan O'Connell

PhD student, Rice University


References

1. Castellanos-Rueda, R., DiRoberto, R. B., Bieberich, F., Schlatter, F. S., Palianina, D., Nguyen, O. T. P., ... & Hierlemann, A. (2022). speedingCARs: accelerating the engineering of CAR T cells by signaling domain shuffling and single-cell sequencing. Nature Communications, 13(1), 6555.

2. de Boer, C. G., Vaishnav, E. D., Sadeh, R., Abeyta, E. L., Friedman, N., & Regev, A. (2020). Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nature Biotechnology, 38(1), 56-65.

3. Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Ouyang, Z. (n.d.). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay.

4. Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W., & Mostafavi, S. (2023). Obtaining genetics insights from deep learning via explainable artificial intelligence. Nature Reviews Genetics, 24(2), 125-137.

5. O’Connell, R. W., Rai, K., Piepergerdes, T. C., Samra, K. D., Wilson, J. A., Lin, S., ... & Chen, D. S. (2023). Ultra-high throughput mapping of genetic design space. bioRxiv, 2023-03.

6. Tareen, A., Kooshkbaghi, M., Posfai, A., Ireland, W. T., McCandlish, D. M., & Kinney, J. B. (2022). MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biology, 23(1), 98.

7. Ursu, O., Neal, J. T., Shea, E., Thakore, P. I., Jerby-Arnon, L., Nguyen, L., ... & Izar, B. (2022). Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nature Biotechnology, 40(6), 896-905.

8. Vaishnav, E. D., de Boer, C. G., Molinet, J., Yassour, M., Fan, L., Adiconis, X., ... & Regev, A. (2022). The evolution, evolvability and engineering of gene regulatory DNA. Nature, 603(7901), 455-463.

9. Yampolskaya, M., Herriges, M., Ikonomou, L., Kotton, D., & Mehta, P. (2023). scTOP: physics-inspired order parameters for cellular identification and visualization. bioRxiv, 2023-01.

10. Yao, D., Binan, L., Bezney, J., Simonton, B., Freedman, J., Frangieh, C. J., ... & Gusev, A. (2023). Compressed Perturb-seq: highly efficient screens for regulatory circuits using random composite perturbations. bioRxiv, 202