Sunday, September 3, 2023
Explainable and biophysically-grounded models for genotype-to-phenotype mapping
In conjunction with ACM-BCB 2023 in Houston, Texas
Venue: Hyatt Regency Houston/Galleria -- Ballroom D
Workshop Goals: To engage in interdisciplinary discussions and collaborations at the nexus of machine learning, biology, biophysics, and bioengineering. The focus will be on developing explainable black-box (deep learning) models and interpretable models grounded in biophysical principles, which offer a more transparent and meaningful understanding of the biological systems under investigation and enable forward engineering of those systems.
Topics and questions covered:
1. New Assays and Opportunities: Discussion of recently developed assays for high-resolution mapping of the genotype-to-phenotype relationship, including GPRAs [1], CLASSIC [2], Perturb-Seq [3,4], SpeedingCARs [5], and ECCITE-seq [6], among others.
2. Balancing Engineering and Scientific Goals: Exploring the necessary level of understanding for functional outcomes and trade-offs between engineering and scientific goals.
3. Modeling Techniques: A range of black-box and interpretable machine learning approaches applicable to large-assay data, including biophysically grounded GP maps [1], Convolutional Neural Networks (CNNs) [7], Transformer networks [7], and MAVE-NN [8].
4. Methods for Explaining Black-Box Models: Overview of techniques available for explaining black-box models, their value for biological understanding, and current gaps [9].
5. Data Quality and Data Utilization: Importance of data quality in large assays, strategies for dealing with noisy experimental data, addressing the challenges of imbalanced datasets, and determining which data points are valuable for biological understanding.
6. Experiment Design and Application: Utilizing machine learning models to understand cell-circuit design and designing experimental assays for maximum data usability.
7. Integrating Multi-Omics Data: Benefits of using multi-omics data in conjunction with large assays for comprehensive biological understanding.
Invited Speakers
Prof Caleb J. Bashor
Rice University
(Confirmed)
Prof Ankit B. Patel
Baylor College of Medicine
(Confirmed)
Prof Justin Kinney
Cold Spring Harbor Lab
(Confirmed)
Prof Mary Dunlop
Boston University
(Confirmed)
Dr Yue Jiang
ShapeTX
(Confirmed)
Dr Ashley Mae Conard
Microsoft Research
(Confirmed)
Schedule
Time (US Central)   Speaker/Activity
----------------------------------------------------
8:00-8:15 Introduction and Welcome
8:15-8:35 Prof. Caleb Bashor (Rice University)
8:40-9:00 Prof. Ankit Patel (BCM/Rice University)
9:05-9:25 Dr. Yue Jiang (ShapeTX)
9:25-9:40 [ Coffee Break ]
9:40-10:00 Dr. Ashley Mae Conard (Microsoft Research)
10:05-10:25 Prof. Mary Dunlop (Boston University)
10:30-10:50 Prof. Justin Kinney (CSHL)
10:50-11:05 [ Coffee Break ]
11:05-11:55 Panel discussion
11:55-12:00 Wrap-up and conclusion
Workshop Organizers
Dr Satpreet Singh
Postdoc, Baylor College of Medicine
Dr Yashwanth Lagisetty
MD/PhD student, UT Health Science Center
Dr Emily Mendez
MD student, UT Health/MD Anderson Cancer Center MSTP Program
Kshitij Rai
PhD Student, Rice University
Ronan O'Connell
PhD Student, Rice University
References:
1. Castellanos-Rueda, R., DiRoberto, R. B., Bieberich, F., Schlatter, F. S., Palianina, D., Nguyen, O. T. P., ... & Hierlemann, A. (2022). speedingCARs: accelerating the engineering of CAR T cells by signaling domain shuffling and single-cell sequencing. Nature Communications, 13(1), 6555.
2. de Boer, C. G., Vaishnav, E. D., Sadeh, R., Abeyta, E. L., Friedman, N., & Regev, A. (2020). Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nature Biotechnology, 38(1), 56-65.
3. Mimitou, E., Cheng, A., Montalbano, A., Hao, S., Stoeckius, M., Legut, M., ... & Ouyang, Z. (n.d.). Expanding the CITE-seq tool-kit: Detection of proteins, transcriptomes, clonotypes and CRISPR perturbations with multiplexing, in a single assay.
4. Novakovsky, G., Dexter, N., Libbrecht, M. W., Wasserman, W. W., & Mostafavi, S. (2023). Obtaining genetics insights from deep learning via explainable artificial intelligence. Nature Reviews Genetics, 24(2), 125-137.
5. O’Connell, R. W., Rai, K., Piepergerdes, T. C., Samra, K. D., Wilson, J. A., Lin, S., ... & Chen, D. S. (2023). Ultra-high throughput mapping of genetic design space. bioRxiv, 2023-03.
6. Tareen, A., Kooshkbaghi, M., Posfai, A., Ireland, W. T., McCandlish, D. M., & Kinney, J. B. (2022). MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biology, 23(1), 98.
7. Ursu, O., Neal, J. T., Shea, E., Thakore, P. I., Jerby-Arnon, L., Nguyen, L., ... & Izar, B. (2022). Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nature Biotechnology, 40(6), 896-905.
8. Vaishnav, E. D., de Boer, C. G., Molinet, J., Yassour, M., Fan, L., Adiconis, X., ... & Regev, A. (2022). The evolution, evolvability and engineering of gene regulatory DNA. Nature, 603(7901), 455-463.
9. Yampolskaya, M., Herriges, M., Ikonomou, L., Kotton, D., & Mehta, P. (2023). scTOP: physics-inspired order parameters for cellular identification and visualization. bioRxiv, 2023-01.
10. Yao, D., Binan, L., Bezney, J., Simonton, B., Freedman, J., Frangieh, C. J., ... & Gusev, A. (2023). Compressed Perturb-seq: highly efficient screens for regulatory circuits using random composite perturbations. bioRxiv, 202