Introduces students to the fundamental mathematical principles of data science that underlie its algorithms, processes, methods, and data-centric thinking. Also introduces students to algorithms and tools based on these principles.
Recommended background: CMSE 802 or equivalent experience. Differential equations at the level of MTH 235/255H/340+442/347H+442. Linear algebra at the level of MTH 390/317H. Probability and statistics at the level of STT 231.
Offered every spring semester.
Model reduction and scientific machine-learning (CMSE 890-002, Fall 2024)
The course focuses on scientific ML methods designed to construct reduced models of multiscale systems, with an emphasis on direct connections to computational mathematics and natural-science applications. Potential topics include:
Model reduction theories such as the bottom-up Mori–Zwanzig formalism and the top-down GENERIC formalism, and the design of physics-compatible parametric forms that preserve physical constraints. Hands-on materials include continuum laws of balance equations with a connection to physics-informed learning, sparse identification of nonlinear bases, and symmetry-preserving neural networks (connected with mechanical engineering).
Variational inference approaches with applications to probabilistic modeling and uncertainty quantification in the presence of high-dimensional randomness. Hands-on materials include Hamiltonian dynamics and the Liouville equation, with a connection to popular Bayesian posterior sampling methods such as MCMC and Stein variational gradient dynamics (connected with math and physics).
Generative models such as energy-based models (EBMs) and flow-based models, and graph-embedding NNs, with applications to the reduced modeling of quasi-equilibrium multiphysics systems. Hands-on materials include their connection to Langevin dynamics (LD) and examples of training an EBM using LD for a 2D lattice with complex interactions (connected with chemistry and biology).
Dynamic models such as autoregressive models, non-parametric kernel embeddings, and recurrent NNs, with applications to learning the reduced dynamics of multiscale systems. Hands-on materials include examples of complex systems such as the Lorenz models, the Kuramoto–Sivashinsky equation, and molecular dynamics (connected with chemical engineering).
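For a concrete sense of the benchmark systems mentioned above, here is a minimal sketch that integrates the classical Lorenz-63 system with forward Euler. This is an illustration only: the parameter values are the standard textbook choices, and the step size and function names are hypothetical, not taken from the course materials.

```python
# Minimal Lorenz-63 integrator (forward Euler) -- a sketch of the kind of
# chaotic benchmark system used when studying reduced dynamics.
# sigma, rho, beta are the classical parameter values; dt is illustrative.

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 state (x, y, z) by one Euler step."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return (x + dt * dx, y + dt * dy, z + dt * dz)

def trajectory(state, n_steps, dt=0.01):
    """Return the list of states visited over n_steps Euler steps."""
    out = [state]
    for _ in range(n_steps):
        state = lorenz_step(state, dt)
        out.append(state)
    return out
```

A reduced model in the sense of this course would be trained to predict a few coarse observables of such trajectories rather than the full state.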
(3 credits) Lead Instructor: Prof. Huan Lei; Offered from Fall 2024
Computational Inverse Problems: From Regularization to Machine Learning (CMSE 890-003, Spring 2025)
In this course we will discuss the fundamentals of inverse problems encountered in science and engineering. We will explore traditional approaches to solving these problems, including linear regression, Tikhonov regularization, the Lasso, iterative methods, Fourier techniques, and Bayesian methods. We will also learn contemporary machine learning (ML) techniques, such as neural networks and generative priors, used in various reconstruction algorithms. Emphasis will be placed on understanding the theory and mathematics behind both standard and ML methods for inverse problems, as well as on practical implementation details. Our primary focus will be on imaging applications, specifically natural image processing and medical imaging.
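To make the regularization idea concrete, here is a minimal sketch of Tikhonov-regularized least squares, which minimizes ||Ax - b||^2 + lam*||x||^2, solved by plain gradient descent. This is an illustrative toy, not course material: the function name, step size, and iteration count are hypothetical choices.

```python
# Tikhonov-regularized least squares by gradient descent -- a sketch.
# Minimizes ||A x - b||^2 + lam * ||x||^2 for a small dense system
# stored as nested Python lists. Step size and iteration count are
# illustrative, not tuned.

def tikhonov_gd(A, b, lam=0.1, step=0.01, iters=5000):
    m = len(A)       # number of rows (measurements)
    n = len(A[0])    # number of columns (unknowns)
    x = [0.0] * n
    for _ in range(iters):
        # residual r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # gradient = 2 A^T r + 2 lam x, applied coordinate-wise
        for j in range(n):
            g = 2.0 * sum(A[i][j] * r[i] for i in range(m)) + 2.0 * lam * x[j]
            x[j] -= step * g
    return x
```

The penalty term biases the solution toward small norm, which is what stabilizes ill-posed reconstructions: for A = I the minimizer is exactly b / (1 + lam).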
(3 credits) Offered from Spring 2025
Topological Data Analysis (CMSE 890-001, Fall 2025)
Applied and computational topology is an extremely active area of research, bringing together ideas from algebraic topology, computational geometry, and data analysis. The addition of topological methods to the traditionally statistics-based data analysis field has opened new doors for understanding data. In this course, we will study recent advances in the field with a particular view towards persistent homology and its applications. The course will be project based, so that students will walk away with the ability to use the tools discussed on actual data sets.
List of major topics:
Basics of topology, including topological spaces, metric spaces, manifolds, homeomorphisms, and isomorphisms.
Simplicial complexes used to represent the structure of a data set, such as the Čech, Rips, Delaunay, alpha, and witness complexes.
Homology, cohomology, relative homology, Betti numbers, and methods for their computation.
Morse theory and filtrations of simplicial complexes.
Persistent homology, the persistence algorithm, the bottleneck distance, the Wasserstein distance, and the stability theorems. Persistence modules and interleavings. Reconstruction theorems. Statistics.
Variations of persistence, including extended persistence, vineyards, and zig-zag persistence.
Reeb graphs, Mapper, metrics, and sheaf theory.
Available tools for TDA in Python.
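As a small illustration of the complexes and invariants in the list above, here is a sketch that builds the edge set (1-skeleton) of a Vietoris–Rips complex at a given scale and counts its connected components, i.e., the 0th Betti number at that scale. The function names and the union-find approach are illustrative choices, not course code; production work would use a dedicated TDA library.

```python
# Sketch: Vietoris-Rips edges at scale eps, and the number of connected
# components of the resulting 1-skeleton (the 0th Betti number) computed
# with a simple union-find. Illustrative only.

from math import dist  # Euclidean distance (Python >= 3.8)

def rips_edges(points, eps):
    """All pairs of point indices within distance eps of each other."""
    n = len(points)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist(points[i], points[j]) <= eps]

def betti0(points, eps):
    """Number of connected components of the Rips 1-skeleton at scale eps."""
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i, j in rips_edges(points, eps):
        parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})
```

Sweeping eps from 0 upward and recording when components merge is exactly the degree-0 persistence computation: each merge event is the death of a component in the persistence barcode.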
(3 credits) Lead Instructor: Prof. Liz Munch
Computational Image Formation and Processing (CMSE 890-002, Fall 2025)
This course will introduce students to mathematical, statistical, and machine learning methods for image generation in computational imaging, as well as image processing and analysis methods. The imaging systems covered will include tomographic systems like CT or PET (used for medical, industrial, and scientific applications), magnetic resonance imaging, etc. The course will review foundational topics in multidimensional signals and systems, optimization, and image quality. We will cover the background physics and image formation process for several systems. The theory and application of image reconstruction methods for these modalities will be discussed. Coverage will span classical methods, more recent iterative techniques, and machine learning methods appropriate for limited and noisy data. Methods for image enhancement, classification, and segmentation will also be outlined. Students will receive hands-on experience implementing methods through programming assignments, mathematical problem solving, and review of recent papers in the field. We will have visits to imaging facilities as well as talks by industry experts on recent advancements and emerging challenges in diverse imaging and image analysis applications. Prior programming experience is required, and some previous experience in mathematical analysis, linear algebra, signal processing, and optimization is strongly encouraged.
(3 credits) Lead Instructor: Prof. Saiprasad Ravishankar, Prof. Adam Alessio
Generating and using high-fidelity data for ML/AI: practical and theoretical perspectives (TBD)
This course covers the practical and theoretical aspects of generating high-fidelity data for machine learning and artificial intelligence purposes (e.g., for multiscale and multilevel models of physical systems) via in silico high-fidelity computational modeling. Students will learn how to: efficiently sample parameter spaces; create workflows to instantiate, run, and analyze large numbers of simulations and large volumes of data; reduce that data to manageable outputs using a variety of proven techniques; train ML/AI models with these reduced data outputs; and verify and validate the results using established techniques such as the method of manufactured solutions. All of this will be done within the context of workflows that promote reproducible research in scientific computing (i.e., the FAIR research principles). In addition, students will read and discuss papers from multiple application fields as case studies that demonstrate these principles in use. All computation will be done using MSU's Institute for Cyber-Enabled Research and the MSU Data Machine, an NSF-funded, data-science-oriented supercomputer.
List of major topics:
Efficient sampling of high-dimensional model parameter spaces
Creating modern research-computing workflows for simulation ensembles (e.g., tools for workflow automation such as Snakemake)
Dimensionality reduction techniques and data reduction techniques as applied to multiphysics simulation data
Training of ML/AI models using high-performance computing tools (e.g., PyTorch, TensorFlow)
Verification and validation of ML/AI models (robustness, boundedness, out-of-distribution data detection, method of manufactured solutions)
Computation on modern HPC platforms
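As a small illustration of the verification technique named in the list above, here is a sketch of the method of manufactured solutions (MMS): pick a solution u(x) = sin x, manufacture the source f = -u'' = sin x, solve -u'' = f on [0, pi] with homogeneous Dirichlet conditions using a second-order finite-difference scheme, and check that the error shrinks at the expected rate under grid refinement. The PDE, manufactured solution, and function names are illustrative choices, not course material.

```python
# Method of manufactured solutions (MMS) sketch for solver verification.
# Exact solution u(x) = sin(x) manufactures f = -u'' = sin(x); we solve
# -u'' = f on [0, pi], u(0) = u(pi) = 0, by central differences and a
# Thomas (tridiagonal) solve, then measure the max-norm error.

import math

def solve_poisson(n):
    """Solve -u'' = sin(x) on [0, pi] with n interior grid points."""
    h = math.pi / (n + 1)
    # Tridiagonal system: (2 u_i - u_{i-1} - u_{i+1}) = h^2 f_i
    a = [-1.0] * n                                   # sub-diagonal
    b = [2.0] * n                                    # diagonal
    c = [-1.0] * n                                   # super-diagonal
    d = [h * h * math.sin((i + 1) * h) for i in range(n)]
    for i in range(1, n):                            # forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    u = [0.0] * n                                    # back substitution
    u[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (d[i] - c[i] * u[i + 1]) / b[i]
    return u, h

def max_error(n):
    """Max-norm error against the manufactured solution sin(x)."""
    u, h = solve_poisson(n)
    return max(abs(u[i] - math.sin((i + 1) * h)) for i in range(n))
```

Halving h should cut the error by roughly a factor of four for this second-order scheme; observing that ratio is the verification evidence MMS provides, independent of any physical data.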
(3 credits) Lead Instructor: Prof. Brian O'Shea; Offering time TBD
DOE National Laboratory/Facility Office of Science Graduate Student Research (SCGSR) program POCs
The Office of Science Graduate Student Research (SCGSR) Program Application Page (Due date Nov 6th, 5pm EST)
Los Alamos National Laboratory (LANL) Graduate Research Assistant Program
LANL Information Science and Technology Institute (ISTI) Summer Schools
Please note that trainees with fellowship support must acknowledge the AIDMM-NRT program in their posters, presentations, and publications and include an NSF disclaimer, using the appropriate sample text below:
For MSU funded (international students):
This work is supported in part by Michigan State University and the National Science Foundation Research Traineeship Program (DGE-2152014) to (your name).
For NRT funded (domestic students):
This work is supported in part by the National Science Foundation Research Traineeship Program (DGE-2152014) to (your name).
Disclaimer:
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding organizations.
W-9 Form (US citizen) Request for Taxpayer Identification Number and Certification
W-8BEN Form (International student) Certificate of Foreign Status of Beneficial Owner for United States Tax Withholding and Reporting (Individuals)