Fast Linear Algebra for Data Science

Project 2024-25


Project Title: Fast Linear Algebra for Data Science


Professor: Cameron Musco


Lab/Research Group: Theory Group

Members of the Theory Group use mathematical techniques to study problems throughout computer science. Our work is extremely diverse -- it includes graph and network algorithms, randomized approximation algorithms, streaming algorithms, combinatorial optimization, computational geometry, dynamic algorithms and complexity, model checking and static analysis, database theory, descriptive complexity, parallel algorithms, and computational complexity theory. Members of the Theory Group wear other hats as well and collaborate throughout the department and the world beyond. Every week we all get together for a research seminar, featuring either one of our own professors or Ph.D. students, or a visitor from another institution. You can find more about the group and our weekly seminars at https://groups.cs.umass.edu/theory/.


Project Description

Numerical linear algebra studies algorithms for problems related to matrices -- such as solving linear systems of equations, computing eigenvectors, and computing matrix decompositions like the SVD. It is one of the most important subfields of algorithms, with applications across the sciences, engineering, and data science. Turing Award winner Michael Stonebraker has commented that "If you open up the codes that are underneath [most data science applications] this is all linear algebra on arrays."

In recent years, the field has been challenged by the need to solve problems involving massive matrices arising in data science applications. To address this challenge, the field has turned to *randomized algorithms*, which offer the potential for significant running time gains, at the cost of small approximation error. 

This project will study random sampling methods for solving basic linear algebraic problems on very large matrices. We will focus on *sublinear time algorithms*, which do not even read the full input matrix, but estimate its properties from a small random sample of its entries, or of its rows/columns. We will empirically test different sampling approaches for estimating matrix eigenvalues and related properties, and when possible try to develop a theoretical analysis of our algorithms, using tools from theoretical computer science and random matrix theory.
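As one concrete illustration of this idea (a hedged sketch, not the project's specific algorithm): for a symmetric matrix with bounded entries, one can uniformly sample a small principal submatrix and rescale its spectrum by n/s to estimate the matrix's large, outlying eigenvalues, while reading only s^2 of the n^2 entries. The planted-spike test matrix and all parameter choices below are illustrative assumptions.

```python
import numpy as np

def sampled_eigenvalues(A, s, rng):
    """Estimate eigenvalues of a symmetric n x n matrix A from a random
    s x s principal submatrix, reading only s^2 of the n^2 entries.

    Rescaling the submatrix spectrum by n/s approximates A's large
    (outlying) eigenvalues; the small eigenvalues are only recovered up
    to an additive error depending on n and the entry magnitudes.
    """
    n = A.shape[0]
    idx = rng.choice(n, size=s, replace=False)   # uniform index sample
    sub = A[np.ix_(idx, idx)]                    # principal submatrix
    return (n / s) * np.linalg.eigvalsh(sub)     # ascending order

# Toy experiment: a bounded-entry matrix with one large planted eigenvalue
# (roughly n/2) buried in symmetric noise.
rng = np.random.default_rng(0)
n, s = 2000, 400
v = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)    # flat unit vector
noise = rng.uniform(-0.5, 0.5, size=(n, n))
A = (n / 2) * np.outer(v, v) + (noise + noise.T) / 2

est = sampled_eigenvalues(A, s, rng)
top_est = est[-1]                       # from s^2 sampled entries
top_true = np.linalg.eigvalsh(A)[-1]    # requires reading all of A
```

On this example the sampled estimate of the top eigenvalue lands close to the true value at a fraction of the cost; how the error behaves as s shrinks, and for other spectral properties, is the kind of question the project investigates empirically.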

Learning Objectives:

1. Ability to perform matrix computations efficiently in a common language, such as Matlab or Python. 

2. Basic understanding of different algorithmic paradigms in linear algebra (direct methods, iterative methods, randomized methods). 

3. Ability to perform rigorous empirical evaluation of algorithm performance. 
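To make the paradigms in objective 2 concrete, here is a minimal sketch contrasting them on small synthetic problems. All matrices, step sizes, and sketch dimensions below are illustrative choices, not part of the project.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# A well-conditioned symmetric positive-definite system A x = b.
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
b = A @ rng.standard_normal(n)

# 1. Direct method: factor-and-solve, exact up to rounding error.
x_direct = np.linalg.solve(A, b)

# 2. Iterative method: gradient descent on f(x) = 0.5 x^T A x - b^T x,
#    touching A only through matrix-vector products.
step = 1.0 / np.linalg.norm(A, 2)          # step size <= 1 / lambda_max(A)
x_iter = np.zeros(n)
for _ in range(500):
    x_iter -= step * (A @ x_iter - b)

# 3. Randomized method: sketch-and-solve for a tall least-squares problem,
#    compressing the rows with a Gaussian sketch before solving.
m, d, k = 5000, 50, 400                    # rows, cols, sketch size
X = rng.standard_normal((m, d))
y = X @ rng.standard_normal(d) + 0.01 * rng.standard_normal(m)
S = rng.standard_normal((k, m)) / np.sqrt(k)
w_sketch, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
w_exact, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The trade-off the sketch illustrates: the direct solve is exact but costs O(n^3); the iterative solve only needs matrix-vector products; and the randomized solve works on a compressed k x d problem, accepting a small approximation error in exchange for speed.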

Skills needed: