WINTER 2022
MATH - BIOINF - STATS 547: Mathematics of Data
Instructor: Prof. INDIKA Rajapakse (indikar@umich.edu)
Teaching Assistant: COOPER Stansbury (cstansbu@umich.edu)
Class Time: Tuesday and Thursday, 2:30 - 4:00PM in EH B844: https://meet.google.com/dnm-zipr-tsb
Recorded Lectures: Available on Canvas only
Office Hours:
Wednesday and Friday (Indika), 4:00 - 5:00PM: https://meet.google.com/dnm-zipr-tsb or in person after class
Monday (Cooper) from 2:00 - 3:00PM: meet.google.com/ivh-fege-xby
Timeline: Link to Timeline
RESOURCES
Some ideas on Simplicity, Rigor, and Magic!
Digital Library: An unofficial digital library I maintain that contains books and papers on topics related to my research and teaching
Most Common Optimization Strategies
Great book with codes: Cleve Moler . Numerical computing with MATLAB. Society for Industrial and Applied Mathematics, (2004)
Great Blogs! I visit these from time to time, and they have many great articles
Final Class Day Resources:
PROBLEM SETS
This section includes assignments, solutions, and helpful resources.
Problem Set 1: Due 01/25/2022 on Canvas
Problem Set 2: Due 02/14/2022 on Canvas (Due date extended from 02/10)
A Tutorial on Spectral Clustering (Excellent Review!)
Helpful Examples:
Paper: Surana A, Chen C, Rajapakse I. "Hypergraph Similarity Measures." arXiv preprint arXiv:2106.08206 (2021)
Final Problem Set Ideas: Due 02/15/2022
Problem Set 3: Due 02/25/2022
Problem Set 4: Due 03/24/2022
Final Project: Due 04/28/2022
Final Project Reviews: Due 04/14/2022
Problem of the Day
LECTURE JAMBOARDS
Some recordings are incomplete, but the class notes contain all content presented, so please use them together.
NOTES BY SUBJECT
This section contains class notes as well as helpful resources, including classic papers, grouped by topic. Please browse this section, and keep it in mind in your future work beyond this course.
Singular Value Decomposition (SVD)
Guest Lecture: Dr. Cleve Moler
Direct link to YouTube: https://www.youtube.com/watch?v=R9UoFyqJca8
Additional Reading
Moler C. Professor SVD. The MathWorks News & Notes. 2006.
Lieberman-Aiden, Erez, ..., Groudine Mark, ..., Lander Eric. "Comprehensive mapping of long-range interactions reveals folding principles of the human genome." Science 326.5950 (2009): 289-293.
Martin CD, Porter MA. The extraordinary SVD. The American Mathematical Monthly. 2012 Dec 1;119(10):838-51. (Excellent Review!)
Eckhart-Young Theorem and Low Rank Approximations
Proof (Book: Golub and Van Loan, Matrix computations )
Additional Reading
Gavish, Matan, and David L. Donoho. "The optimal hard threshold for singular values is 4/sqrt(3)." IEEE Transactions on Information Theory 60.8 (2014): 5040-5053. (Amazing Paper!)
Udell, Madeleine, and Alex Townsend. "Why are big data matrices approximately low rank?." SIAM Journal on Mathematics of Data Science 1.1 (2019): 144-160.
Turk, Matthew, and Alex Pentland. "Eigenfaces for recognition." Journal of cognitive neuroscience 3.1 (1991): 71-86. (Classic! just browse)
Dimension Reduction
Linear Methods
Nonlinear Methods
PHATE: Nice visualization method and preserves global and local nonlinear structure better.
Papers
Kobak, Dmitry, and George C. Linderman. "Initialization is critical for preserving global data structure in both t-SNE and UMAP." Nature Biotechnology (2021): 1-2.
McInnes, Leland, John Healy, and James Melville. "Umap: Uniform manifold approximation and projection for dimension reduction." arXiv preprint arXiv:1802.03426 (2018).
Wang Y, Huang H, Rudin C, Shaposhnik Y. Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization. arXiv preprint arXiv:2012.04456. 2020 Dec 8.
Remark: This is a really nice review in general, but the most relevant parts are pages 9 and 10
Van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of machine learning research 9.11 (2008).
The Laplacian
Papers
Ng, Andrew Y., Michael I. Jordan, and Yair Weiss. "On spectral clustering: Analysis and an algorithm." Advances in neural information processing systems 2 (2002): 849-856.
Belkin, Mikhail, and Partha Niyogi. "Laplacian eigenmaps and spectral techniques for embedding and clustering." Advances in neural information processing systems. 2002.
Von Luxburg, Ulrike. "A tutorial on spectral clustering." Statistics and computing 17.4 (2007): 395-416. (Excellent Review!)
Cheeger J. A lower bound for the smallest eigenvalue of the Laplacian. In Problems in analysis 2015 Mar 8 (pp. 195-200). Princeton University Press.
Chen, Pin-Yu, et al. "Fast incremental von neumann graph entropy computation: Theory, algorithm, and applications." International Conference on Machine Learning. PMLR, 2019.
The Turing System
Slides: An Example from Spatial Transcriptomics
Papers
Turing, Alan. "The chemical basis of morphogenesis." Philosophical Transactions of the Royal Society of London B. 237 (1952): 641
Rajapakse I, and Smale S. "Emergence of Function from Coordinated Cells in a Tissue." Proceedings of the National Academy of Sciences 114.7 (2017): 1462-1467.
Dynamic Mode Decomposition (DMD)
Book:
Kutz, J. Nathan, et al. Dynamic mode decomposition: data-driven modeling of complex systems. Society for Industrial and Applied Mathematics, 2016.
Chapter 1: Dynamic Mode Decomposition: An Introduction
Papers
Hirsh, Seth M., et al. "Centering data improves the dynamic mode decomposition." SIAM Journal on Applied Dynamical Systems 19.3 (2020): 1920-1955.
T. Askham, P. Zheng, A. Aravkin, and J. N. Kutz, “Robust and Scalable Methods for the Dynamic Mode Decomposition,” SIAM J. Appl. Dyn. Syst., vol. 21, no. 1, pp. 60–79, Mar. 2022
P. J. Baddoo, B. Herrmann, B. J. McKeon, J. N. Kutz, and S. L. Brunton, “Physics-informed dynamic mode decomposition (piDMD),” Dec. 2021
Z. Bai, E. Kaiser, J. L. Proctor, J. N. Kutz, and S. L. Brunton, “Dynamic Mode Decomposition for Compressive System Identification,” AIAA Journal, vol. 58, no. 2, pp. 561–574, 2020
J. J. Bramburger, D. Dylewsky, and J. N. Kutz, “Sparse identification of slow timescale dynamics,” Phys. Rev. E, vol. 102, no. 2, p. 022204, Aug. 2020
S. L. Brunton and J. N. Kutz, “Methods for data-driven multiscale model discovery for materials,” J. Phys. Mater., vol. 2, no. 4, p. 044002, Jul. 2019
C. W. Curtis and D. J. Alford-Lago, “Dynamic-mode decomposition and optimal prediction,” Phys. Rev. E, vol. 103, no. 1, p. 012201, Jan. 2021
U. Fasel, E. Kaiser, J. N. Kutz, B. W. Brunton, and S. L. Brunton, “SINDy with Control: A Tutorial,” in 2021 60th IEEE Conference on Decision and Control (CDC), Dec. 2021
M. R. Jovanović, P. J. Schmid, and J. W. Nichols, “Sparsity-promoting dynamic mode decomposition,” Physics of Fluids, vol. 26, no. 2, p. 024103, Feb. 2014
N. M. M. Kalimullah, A. Shelke, and A. Habib, “Multiresolution Dynamic Mode Decomposition (mrDMD) of Elastic Waves for Damage Localisation in Piezoelectric Ceramic,” IEEE Access, vol. 9, pp. 120512–120524, 2021
S. Klus, P. Gelß, S. Peitz, and C. Schütte, “Tensor-based dynamic mode decomposition,” Nonlinearity, vol. 31, no. 7, pp. 3359–3380, Jun. 2018
J. N. Kutz, X. Fu, S. L. Brunton, and N. B. Erichson, “Multi-resolution Dynamic Mode Decomposition for Foreground/Background Separation and Object Tracking,” in 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Dec. 2015
J. N. Kutz, X. Fu, and S. L. Brunton, “Multiresolution Dynamic Mode Decomposition,” SIAM J. Appl. Dyn. Syst., vol. 15, no. 2, pp. 713–735, Jan. 2016
S. Le Clainche and J. M. Vega, “Higher Order Dynamic Mode Decomposition,” SIAM J. Appl. Dyn. Syst., vol. 16, no. 2, pp. 882–925, Jan. 2017
I. Mezić, “Spectral Properties of Dynamical Systems, Model Reduction and Decompositions,” Nonlinear Dyn, vol. 41, no. 1, pp. 309–325, Aug. 2005
P. J. Schmid, “Dynamic mode decomposition of numerical and experimental data,” Journal of Fluid Mechanics, vol. 656, pp. 5–28, Aug. 2010
J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, “On Dynamic Mode Decomposition: Theory and Applications,” Journal of Computational Dynamics, vol. 1, no. 2, pp. 391–421, 2014
Koopman Theory
Papers
Mezic I. Koopman operator, geometry, and learning. arXiv preprint arXiv:2010.05377. 2020 Oct 12.
Champion K, Lusch B, Kutz JN, Brunton SL. "Data-driven discovery of coordinates and governing equations." Proceedings of the National Academy of Sciences. 2019 Nov 5;116(45):22445-51.
Surana A. Koopman operator framework for time series modeling and analysis. Journal of Nonlinear Science. 2020 Oct;30(5):1973-2006.
Video: Koopman Operator Theory for Dynamical Systems, Control and Data Analytics by Prof. Igor Mezic
Data Guided Control (DGC)
Papers
Ronquist S, Patterson G, Muir LA, Lindsly S, Chen H, Brown M, Wicha M, Bloch A, Brockett R and Rajapakse I. "Algorithm for Cellular Reprogramming." Proceedings of the National Academy of Sciences 114.45 (2017): 11832-11837.
Moore B. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE transactions on automatic control. 1981 Feb;26(1):17-32. (Amazing Paper!)
PagRank (PR)
Papers
Bryan, Kurt, and Tanya Leise. "The $25,000,000,000 eigenvector: The linear algebra behind Google." SIAM review 48.3 (2006): 569-581.
Causal Discovery from Data: Dr. David Heckerman
Matrix Completion and CVX
Convex Optimization Tools: CVX
Papers
Candès, Emmanuel J., et al. "Robust principal component analysis?." Journal of the ACM (JACM) 58.3 (2011): 1-37.
Scherl I, Strom B, Shang JK, Williams O, Polagye BL, Brunton SL. Robust principal component analysis for modal decomposition of corrupt fluid flows. Physical Review Fluids. 2020 May 28;5(5):054401.
Becker SR, Candès EJ, Grant MC. Templates for convex cone problems with applications to sparse signal recovery. Mathematical programming computation. 2011 Sep;3(3):165-218.
Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009 Aug 7;42(8):30-7.
Van Dijk D, Sharma R, Nainys J, Yim K, Kathail P, Carr AJ, Burdziak C, Moon KR, Chaffer CL, Pattabiraman D, Bierie B. Recovering gene interactions from single-cell data using data diffusion. Cell. 2018 Jul 26;174(3):716-29.
Compressive Sensing
"Magic" Reconstruction: Compressed Sensing (MATLAB Code)
The following two chapters are good references on Compressive Sensing [2, 3].
Sparsity and Compressed Sensing
Terence Tao's summary of the current state of compressive sampling theory
Papers
Candès EJ, Wakin MB. An introduction to compressive sampling. IEEE signal processing magazine. 2008 Mar 21;25(2):21-30. (Excellent Review!)
Brunton SL, Kutz JN. Data-driven science and engineering: Machine learning, dynamical systems, and control. Cambridge University Press; 2019 Feb 28.
Kutz JN. Data-driven modeling & scientific computation: methods for complex systems & big data. Oxford University Press; 2013 Aug 8.
Tensors and Hypergraphs
Additional Slides: AFOSR 2021 slides
Software
Hypergraph Visualization: PAOHvis
Books
Eldén, Lars. Matrix methods in data mining and pattern recognition. Society for Industrial and Applied Mathematics, 2007.
Chapter 8: Tensor Decomposition
Papers
Kolda, Tamara G., and Brett W. Bader. "Tensor decompositions and applications." SIAM review 51.3 (2009): 455-500. (Excellent Review!)
Chen C, Rajapakse I. "Tensor Entropy for Uniform Hypergraphs." IEEE Transactions on Network Science and Engineering 7.4 (2020): 2889-2900.
Chen C, Surana A, Bloch A, Rajapakse I. "Controllability of Hypergraphs." IEEE Transactions on Network Science and Engineering, Accepted , 2021 Mar 23(2021).
Benson, Austin R., David F. Gleich, and Desmond J. Higham. "Higher-order Network Analysis Takes Off, Fueled by Classical Ideas and New Data." arXiv preprint arXiv:2103.05031 (2021). (Nice Read!)
Williams, Alex H., et al. "Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis." Neuron 98.6 (2018): 1099-1115.
Wolf, Michael M., Alicia M. Klinvex, and Daniel M. Dunlavy. "Advantages to modeling relational data using hypergraphs versus graphs." 2016 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2016.
Zhou Y, Rathore A, Purvine E, Wang B. Topological Simplifications of Hypergraphs. arXiv preprint arXiv:2104.11214. 2021 Apr 22.
Nonnegative Matrix Factorization
Notes
Papers
Lee D, Seung HS. Algorithms for non-negative matrix factorization. Advances in neural information processing systems. 2000;13.
Lee, Daniel D., and H. Sebastian Seung. "Learning the parts of objects by non-negative matrix factorization." Nature 401.6755 (1999): 788-791.
Book
Cichocki A, Zdunek R, Phan AH, Amari SI. Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation. John Wiley & Sons; 2009 Jul 10.
MATLAB Image Decomposition Examples
TED Talk: I am my connectome
Topological Data Analysis
A Taste of Topology: Book Chapter: Charles Pugh. Real mathematical analysis. (Excellent Exposition!)
Introduction: These slides deck includes slides from my good friend Yuan Yao
Software
WEB Resources
Papers
Wasserman L. Topological data analysis. Annual Review of Statistics and Its Application. 2018 Mar 7;5:501-32.
Tierny, J. (2017). Introduction to topological data analysis
Nicolau M, Levine AJ, Carlsson G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival Proceedings of the National Academy of Sciences. 2011 Apr 26;108(17):7265-70.
Carlsson G. Topology and Data. Bulletin of the American Mathematical Society. 2009;46(2):255-308.
Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M, Alagappan M, Carlsson J, Carlsson G. Extracting insights from the shape of complex data using topology. Scientific reports. 2013 Feb 7;3(1):1-8.
DATA Sets
Austin Benson Datasets (Hypergraph Data)
Metabolite (Hypergraph Data)
COVID data: Slides from Dr. Dennis Chao from Gates Foundation
GENERAL READING
I will add to this list throughout the semester
Donoho D. "50 years of data science." Journal of Computational and Graphical Statistics. 2017 Oct 2;26(4):745-66.
Rajapakse, Indika. "Conversation with Dr. Steve Smale and Dr. Lee Hartwell." NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY 68, no. 9.
Aksoy SG, Hagberg A, Joslyn CA, Kay B, Purvine E, Young SJ. Models and Methods for Sparse (Hyper) Network Science in Business, Industry, and Government. NOTICES OF THE AMERICAN MATHEMATICAL SOCIETY.;69(2).
Smale, Steve. "Mathematical problems for the next century." The mathematical intelligencer 20.2 (1998): 7-15.
Kolda T. Mathematics: The Tao of Data Science. (2020).