Introduction to this course
Example Applications of Automation in Biology
Course Requirements
Olsen, Kevin (2012-12-01). "The First 110 Years of Laboratory Automation: Technologies, Applications, and the Creative Scientist". Journal of Laboratory Automation. 17 (6): 469–480
Efficient discovery of responses of proteins to compounds using active learning. Kangas, Naik, Murphy. BMC Bioinformatics 15:143
Online Machine Learning
Littlestone, N.; Warmuth, M. (1994). "The Weighted Majority Algorithm". Information and Computation 108: 212–261. doi:10.1006/inco.1994.1009
Yoav Freund and Robert E. Schapire. "A decision-theoretic generalization of on-line learning and an application to boosting". Journal of Computer and System Sciences, 55(1):119–139, 1997.
S. Shalev-Shwartz, "Online Learning and Online Convex Optimization", Foundations and Trends in Machine Learning, DOI: 10.1561/2200000018
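As a rough illustration of the kind of algorithm analyzed in these readings, the sketch below runs a deterministic weighted majority vote over a fixed pool of experts. It assumes binary (0/1) predictions and uses a common textbook variant in which every expert that errs has its weight multiplied by beta each round; the function name, the default beta = 0.5, and the toy experts are illustrative choices, not taken from the papers.

```python
from typing import List, Sequence, Tuple


def weighted_majority(
    expert_preds: Sequence[Sequence[int]],
    labels: Sequence[int],
    beta: float = 0.5,
) -> Tuple[int, List[float]]:
    """Deterministic weighted majority over binary (0/1) expert predictions.

    expert_preds[t][i] is expert i's prediction at round t; labels[t] is the
    true label revealed after the combined prediction at round t is made.
    Returns (mistakes made by the combined predictor, final expert weights).
    """
    weights = [1.0] * len(expert_preds[0])
    mistakes = 0
    for preds, y in zip(expert_preds, labels):
        # Weighted vote; ties are broken in favour of predicting 1.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != y:
            mistakes += 1
        # Variant used here: shrink the weight of every expert that erred.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights


# Toy run: expert 0 is always right, expert 1 always wrong, expert 2 always says 1.
labels = [1, 0, 1, 0]
experts = [[y, 1 - y, 1] for y in labels]
mistakes, weights = weighted_majority(experts, labels)
# mistakes == 1, weights == [1.0, 0.0625, 0.25]
```

The combined predictor errs once (round 2, when the two unreliable experts briefly outvote the good one); after that, the always-correct expert keeps its full weight and dominates the vote.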
Core concepts and terminology
Heuristic query selection strategies, part 1
Sections 1-3 of "Active Learning Literature Survey", B. Settles, UW Madison CS Tech Report 1648
Heuristic query selection strategies, part 2
Sections 1-3 of "Active Learning Literature Survey", B. Settles, UW Madison CS Tech Report 1648
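Settles' survey describes several uncertainty-sampling measures for pool-based query selection. The sketch below scores each unlabeled pool example from a model's predicted class-probability vector using least confidence, margin, or entropy, and queries the highest-scoring one. The function names and the toy pool are hypothetical; in practice the probabilities would come from whatever model is being trained.

```python
import math
from typing import Callable, Dict, List, Sequence


def least_confident(probs: Sequence[float]) -> float:
    # 1 - P(most likely class): high when the model is unsure of its top guess.
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    # Negated gap between the two most likely classes (assumes >= 2 classes),
    # so that a smaller gap yields a larger score.
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


STRATEGIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "least_confident": least_confident,
    "margin": margin,
    "entropy": entropy,
}


def select_query(pool_probs: List[Sequence[float]], strategy: str = "entropy") -> int:
    """Return the index of the pool example with the highest uncertainty score."""
    score = STRATEGIES[strategy]
    return max(range(len(pool_probs)), key=lambda i: score(pool_probs[i]))


# Hypothetical predicted class probabilities for three unlabeled examples.
pool = [
    [0.50, 0.30, 0.20],
    [0.45, 0.45, 0.10],
    [0.90, 0.05, 0.05],
]
for name in STRATEGIES:
    print(name, select_query(pool, name))
```

On this toy pool, least confidence and margin both pick the second example, while entropy picks the first, because entropy looks at the whole distribution rather than only the top one or two classes.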
The two faces of active learning
Hypothesis space search methods, part 1
The CAL algorithm
S. Dasgupta, "Two Faces of Active Learning", Theoretical Computer Science, doi:10.1016/j.tcs.2010.12.054
"CAL": Cohn, Atlas, Ladner, "Improving Generalization with Active Learning", Machine Learning, May 1994, Volume 15, Issue 2, pp. 201–221
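To make the idea behind CAL concrete (request a label only when a point falls in the region where hypotheses still consistent with the labels seen so far disagree), here is a minimal sketch for the simplest hypothesis class, 1-D thresholds h_t(x) = 1 if x >= t else 0, assuming noise-free (realizable) labels. Restricting to thresholds is a simplifying assumption: the region of disagreement is then just the open interval between the largest known negative and the smallest known positive. The names cal_threshold and true_oracle are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def cal_threshold(
    stream: Iterable[float], oracle: Callable[[float], int]
) -> Tuple[List[Tuple[float, int]], int]:
    """CAL-style selective sampling for 1-D thresholds h_t(x) = 1 if x >= t else 0.

    Assumes labels are noise-free and consistent with some threshold.
    Returns the (x, label) pairs (queried or inferred) and the number of queries.
    """
    max_neg = float("-inf")  # largest point seen with label 0
    min_pos = float("inf")   # smallest point seen with label 1
    labeled: List[Tuple[float, int]] = []
    queries = 0
    for x in stream:
        if max_neg < x < min_pos:
            # Region of disagreement: consistent thresholds split on x, so ask.
            y = oracle(x)
            queries += 1
            if y == 1:
                min_pos = x
            else:
                max_neg = x
        else:
            # Every consistent threshold gives the same label; infer it for free.
            y = 1 if x >= min_pos else 0
        labeled.append((x, y))
    return labeled, queries


def true_oracle(x: float) -> int:
    # Hypothetical labeler whose true threshold is 0.5.
    return int(x >= 0.5)


stream = [0.9, 0.1, 0.7, 0.3, 0.6, 0.95, 0.05, 0.4, 0.55, 0.2]
labeled, queries = cal_threshold(stream, true_oracle)
# queries == 7; the labels of 0.95, 0.05 and 0.2 are inferred, not queried.
```

Each query shrinks the disagreement interval, so points landing outside it are labeled without asking; on this stream 3 of the 10 labels come for free, and every inferred label agrees with the oracle.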
Hypothesis space search methods, part 2
The A2 algorithm
The DHM algorithm
"DHM": Dasgupta, Hsu, Monteleoni, "A general agnostic active learning algorithm", NIPS '08
Alternatively, read the description of the DHM algorithm in the "Two Faces of Active Learning" paper.
"A2": Balcan, Beygelzimer, Langford, "Agnostic Active Learning", J Computer and System Sciences, Volume 75, Issue 1, January 2009, Pages 78–89
Hypothesis space search methods, part 3
The IWAL algorithm
Cluster exploitation methods, part 1
The ZLG algorithm
"IWAL": Beygelzimer, Dasgupta, Langford, "Importance Weighted Active Learning", ICML '09
"ZLG": Zhu, Lafferty, Ghahramani, "Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions", ICML '03
A longer version of the IWAL paper.
Cluster exploitation methods, part 2
Background & the DH algorithm
The PLAL algorithm
"DH": Dasgupta and Hsu, "Hierarchical Sampling for Active Learning", ICML '08
"PLAL": Urner, Wulff, Ben-David, "PLAL: Cluster-based Active Learning", COLT '13
Computational Learning Theory
PAC Learning
VC dimension
Disagreement Coefficient
Chapter 2 of Theoretical Foundations of Active Learning, Steve Hanneke, Ph.D. Dissertation, Machine Learning Department, Carnegie Mellon University CMU-ML-09-106
Castro and Nowak, "Minimax Bounds for Active Learning", IEEE Transactions on Information Theory, Volume 54, Issue 5, pp. 2339–2353
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
Balcan, Hanneke, Vaughan, "The True Sample Complexity of Active Learning", Machine Learning, 2010, Volume 80, Issue 2, pp. 111–139
Active Learning for Non-parametric Regression
Optimal Design of Experiments (DOE)
Read sections 1 & 2 of Optimal Design
"RDP": Faster Rates in Regression via Active Learning, Castro, Willett, Nowak UW Madison Technical Report ECE-05-3
A longer version of the RDP paper.
Smith, Kirstine (1918). "On the Standard Deviations of Adjusted and Interpolated Values of an Observed Polynomial Function and its Constants and the Guidance They Give Towards a Proper Choice of the Distribution of the Observations". Biometrika 12 (1): 1–85. doi:10.2307/2331929
Design and Analysis of Experiments, Douglas Montgomery, Wiley, 8th Edition
Convex Optimization, Boyd and Vandenberghe
"Robust Design of Biological Experiments", Flaherty, Jordan, Arkin, NIPS, 2006
Active Learning for Regression Using DOE Techniques
"ALICE": "Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error", Sugiyama, JMLR 7(Jan):141–166, 2006
"LapRDD": "Laplacian Regularized D-optimal Design for active learning and its application to image retrieval", He X, IEEE Trans Image Process. 2010 Jan;19(1):254-63. doi: 10.1109/TIP.2009.2032342
M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: A geometric framework for learning from labeled and unlabeled examples," J. Mach. Learn. Res., vol. 7, pp. 2399–2434, 2006
"Pool-based active learning in approximate linear regression", Sugiyama & Nakajima, Machine Learning, vol. 75, no. 3, pp. 249–274, 2009
Proactive Learning
Active Feature Value Acquisition
Donmez, P., Carbonell, J.G.: Proactive Learning: Cost-Sensitive Active Learning with Multiple Imperfect Oracles, in Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), 2008.
Saar-Tsechansky, et al "Active Feature-Value Acquisition" Mgmt Sci, 2009
Zhang and Chaudhuri, "Active Learning from Weak and Strong Labelers", NIPS 2015
A longer version of the Zhang and Chaudhuri paper that includes appendices.
"Learning from Weak Teachers", Urner, Ben-David, and Shamir, AISTATS, 2012, pages 1252–1260
Scientific Discovery vs Engineering Design
Sequential Model-Based Optimization & Multi-Armed Bandits
Bergstra et al "Algorithms for hyper-parameter optimization", NIPS, 2011
A tutorial on Bayesian Optimization, P. Frazier
Practical Bayesian Optimization of Machine Learning Algorithms, Snoek et al, NIPS 2012, pp. 2951–2959
Peter Auer, Nicolò Cesa-Bianchi, Paul Fischer "Finite-time Analysis of the Multiarmed Bandit Problem" Machine Learning May 2002, Volume 47, Issue 2, pp 235-256
Reid, et al "Decision-making without a brain: how an amoeboid organism solves the two-armed bandit", 2016. DOI: 10.1098/rsif.2016.0030
Associated videos
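The index policy analyzed by Auer, Cesa-Bianchi and Fischer (UCB1) can be sketched as follows: pull each arm once, then always pull the arm that maximizes its empirical mean reward plus sqrt(2 ln n / n_j), where n is the total number of pulls so far and n_j the number of pulls of arm j. The Bernoulli reward simulation, the arm means, the horizon and the seed are illustrative assumptions, not taken from the paper.

```python
import math
import random
from typing import List, Sequence


def ucb1(arm_means: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Simulate UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # n_j: number of pulls of arm j
    sums = [0.0] * k      # total reward collected from arm j
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            n = t - 1     # total pulls made so far
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j]
                + math.sqrt(2.0 * math.log(n) / counts[j]),
            )
        # Bernoulli reward in {0, 1} with the arm's hidden success probability.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

The exploration bonus shrinks as an arm is pulled more often, so poorly performing arms are revisited only occasionally while most pulls go to the arm with the highest mean.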
Cloud Laboratories
Active Learning for Drug Screening
Warmuth et al "Active Learning with Support Vector Machines in the Drug Discovery Process", J Chem Inf Comput Sci. 2003 Mar-Apr;43(2):667-73
Phenotyping
Protein localization
Smith and Horvath, "Active Learning Strategies for Phenotypic Profiling of High-Content Screens", J Biomol Screen June 2014 vol. 19 no. 5 685-695
Jarvik, et al "CD-tagging: a new approach to gene and protein discovery and analysis", Biotechniques. 1996 May;20(5):896-904
Sigal, et al "Generation of a fluorescently labeled endogenous protein library in living human cells", Nature Protocols 2, 1515–1527 (2007)
Naik et al "Active machine learning-driven experimentation to determine compound effects on protein patterns", eLife, 2016, DOI: 10.7554/eLife.10047.001
Bayesian Estimation & Ordinary Differential Equation Models
Bayesian Active Learning
A Bayesian active learning strategy for sequential experimental design in systems biology, Pauwels, et al. BMC Syst Biol, 2014; 8(1): 102
Kinetics of Influenza A Virus Infection in Humans, Baccam, et al, Journal of Virology, 2006, p. 7590–7599
Logical Inference
Logic Based Active Learning
King et al "Functional genomic hypothesis generation and experimentation by a robot scientist", Nature 427, 247-252, 2004. If you are off campus, you can also get a copy of the paper here.
Also see the supplemental information.
King et al "The Automation of Science", Science 2009: Vol. 324 no. 5923 pp. 85-89
Also see the supplemental information.
Bryant et al "Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes", Electronic Transactions on Artificial Intelligence, Vol. 5 (2001), Section B, pp. 1–36
Sparkes et al "Towards Robot Scientists for autonomous scientific discovery", Autom Exp. 2010; 2: 1
Bayesian Networks
Today's paper
Cho et al "Reconstructing Causal Biological Networks through Active Learning", PLoS One, 2016 11(3): e0150611. doi:10.1371/journal.pone.0150611
Pournara and Wernisch "Reconstruction of gene networks using Bayesian learning and manipulation experiments" Bioinformatics. 2004 Nov 22;20(17):2934-42
"Active Learning of Causal Bayes Net Structure", KP Murphy, 2001
Ness et al "A Bayesian Active Learning Experimental Design for Inferring Signaling Networks" RECOMB 2017: Research in Computational Molecular Biology pp 134-156
Active Learning of Boolean Networks
Atias et al "Experimental design schemes for learning Boolean network models", Bioinformatics. 2014 Sep 1; 30(17): i445–i452.
Active Learning for Protein Design
Danziger et al "Predicting Positive p53 Cancer Rescue Regions Using Most Informative Positive (MIP) Active Learning", PLoS Comp Bio 2009, DOI: 10.1371/journal.pcbi.1000498
Machine-Learning Guided Directed Evolution
Machine learning-assisted directed protein evolution with combinatorial libraries, Wu et al PNAS 2019, https://doi.org/10.1073/pnas.1901979116
Fold Family-Regularized Bayesian Optimization for Directed Protein Evolution. Frisby and Langmead, 20th International Workshop on Algorithms in Bioinformatics (WABI) 2020, pages 1-18.
Reinforcement learning for biological sequence design
Model-Based Reinforcement Learning for Biological Sequence Design, Angermueller et al, ICLR 2020
Deep Reinforcement Learning for Optimizing Chemical Reactions
Optimizing Chemical Reactions with Deep Reinforcement Learning, Zhou et al, ACS Cent. Sci. 2017, 3, 12, 1337-1344 https://doi.org/10.1021/acscentsci.7b00492
Tuning the molecular weight distribution from atom transfer radical polymerization using deep reinforcement learning, Li et al, MSDE, 2018 DOI: 10.1039/C7ME00131B
Controlling an organic synthesis robot with machine learning to search for new reactivity, Granda et al, Nature, 559, 377–381 (2018) doi:10.1038/s41586-018-0307-8
Symbolic Regression for learning natural laws
Distilling Free-Form Natural Laws from Experimental Data, Schmidt and Lipson, Science, 2009 Vol. 324, Issue 5923, pp. 81-85 DOI: 10.1126/science.1165893
Abstract Boolean Networks and Formal Reasoning
A method to identify and analyze biological programs through automated reasoning, Yordanov, et al npj Syst Biol Appl 2, 16010 (2016) doi:10.1038/npjsba.2016.10
Discovering PDEs
Data-driven discovery of partial differential equations, Rudy, et al Science Advances 2017 DOI: 10.1126/sciadv.1602614
Course Summary