Machine Learning for Biological Data

1. Summary

Over the past few decades, biology has undergone a data revolution as powerful high-throughput experiments have continued to fall in cost and grow in scope. This "deluge of data", however, is only as useful as the quantitative analysis methods available to make sense of it. Scientists have increasingly turned to artificial intelligence, and in particular to its subfield of machine learning, to address these gaps in data analysis.

In this module, we will introduce the two basic frameworks of machine learning, supervised and unsupervised learning, and apply them to biological problems. We will first look at historical applications: the use of gene expression data to uncover gene function and to provide predictive markers for disease. We will then survey recent trends in computational biology, including sequence analysis and structure prediction.
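As a rough preview of the two frameworks, here is a minimal sketch using only numpy and scikit-learn from the requirements below. All data are synthetic and invented for illustration; they do not come from any real experiment. The unsupervised half groups genes by expression profile without using labels. It uses Ward-linkage agglomerative clustering on z-scored profiles as a stand-in for correlation-based hierarchical clustering, not the exact procedure of any paper cited here. The supervised half trains a logistic regression classifier to predict a hypothetical disease label from sample expression values, where a few "marker" genes are shifted in disease samples by construction.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# --- Unsupervised learning: group genes by expression profile (no labels used) ---
# Synthetic data (assumption): 3 hypothetical co-regulated groups of 20 genes,
# measured across 10 conditions with rising, falling, and mid-peaking patterns plus noise.
t = np.linspace(-1.0, 1.0, 10)
profiles = np.vstack([t, -t, 1.0 - 2.0 * np.abs(t)])
true_group = np.repeat([0, 1, 2], 20)
genes = profiles[true_group] + 0.2 * rng.normal(size=(60, t.size))

# z-score each gene so that squared Euclidean distance equals 2 * n * (1 - Pearson r),
# i.e. genes with correlated profiles end up close together.
z = (genes - genes.mean(axis=1, keepdims=True)) / genes.std(axis=1, keepdims=True)
gene_clusters = AgglomerativeClustering(n_clusters=3).fit_predict(z)  # Ward linkage by default
ari = adjusted_rand_score(true_group, gene_clusters)
print(f"Agreement with hidden gene groups (ARI): {ari:.2f}")

# --- Supervised learning: predict disease status from sample expression ---
# Synthetic data (assumption): 100 samples x 20 genes; in "disease" samples
# the first 5 genes (hypothetical markers) are shifted upward.
n_samples, n_genes = 100, 20
y = rng.integers(0, 2, size=n_samples)  # 0 = healthy, 1 = disease
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] += 2.0 * y[:, None]

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=5).mean()  # held-out accuracy, averaged over 5 folds
print(f"5-fold cross-validated accuracy: {acc:.2f}")
```

The contrast is the point: the clustering step never sees `true_group` and is only scored against it afterwards, while the classifier is trained on `y` and evaluated on held-out folds so the accuracy reflects prediction on unseen samples.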

2. Presentation Materials

Click here.

3. Hands-on Exercise(s)

Click here.

4. Associated Materials/Files

Included with the Git repository that contains the hands-on instructions.

5. Program/Software requirements

Python 3

scikit-learn http://scikit-learn.org/stable/index.html

NumPy https://www.scipy.org/scipylib/download.html

6. Advanced Material

Pedro Domingos. "A Few Useful Things to Know about Machine Learning". Communications of the ACM, 2012. https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf

Michael B. Eisen, Paul T. Spellman, Patrick O. Brown, and David Botstein. "Cluster analysis and display of genome-wide expression patterns". Proceedings of the National Academy of Sciences, 1998, 95(25):14863-14868. http://www.pnas.org/content/95/25/14863.full

Michael K. K. Leung, Andrew Delong, Babak Alipanahi, and Brendan J. Frey. "Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets". Proceedings of the IEEE, 2016. https://ieeexplore.ieee.org/document/7347331

"Model Tuning and the Bias-Variance Tradeoff". R2D3, A Visual Introduction to Machine Learning, Part 2. http://www.r2d3.us/visual-intro-to-machine-learning-part-2/

7. Instructor Notes

See the README in the hands-on exercise.