CS673 Discrepancy Theory and Its Applications
Time: Monday and Thursday 9am-10:20am (ARC-105)
Instructor: Peng Zhang (pz149@rutgers.edu), office hours: Monday 11am - 12pm (Hill 444) or by appointment
Course description
Discrepancy theory asks the following question: given a ground set and a family of its subsets, how can we two-color the elements of the ground set so that the coloring is nearly balanced within every subset? A naive approach is to color each element uniformly at random; our goal is to improve upon random coloring. This problem and its variants have applications in sparsification, rounding in approximation algorithms, the design of randomized controlled trials, and more.
This course is an introduction to discrepancy theory and covers many of its applications. We will discuss basic concepts and classical problems, efficient algorithms developed in the last decade, and various applications. Along the way, we will explore elegant tools from linear algebra, combinatorics, and probability theory.
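To ground the random-coloring baseline, here is a small Python sketch (an illustration, not course code): it colors a random set system with ±1 signs and measures the worst imbalance. A Chernoff bound plus a union bound shows this is O(sqrt(n log m)) with high probability; much of the course is about doing better.

```python
import random

def random_coloring_discrepancy(n, sets):
    """Color elements 0..n-1 with +-1 uniformly at random and return
    the discrepancy of that coloring: max_S |sum_{i in S} x_i|."""
    x = [random.choice([-1, 1]) for _ in range(n)]
    return max(abs(sum(x[i] for i in S)) for S in sets)

# Example: a random set system with n elements and m subsets,
# each element joining each subset independently with probability 1/2.
random.seed(0)
n, m = 200, 200
sets = [[i for i in range(n) if random.random() < 0.5] for _ in range(m)]

disc = random_coloring_discrepancy(n, sets)
# Chernoff + union bound: disc = O(sqrt(n log m)) with high probability.
# Spencer's theorem guarantees a coloring of discrepancy 6*sqrt(n) when m = n.
print(disc)
```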
A similar course taught by the instructor in Fall 2020 at Yale: https://sites.google.com/site/pengzhang27182/fall20-cpsc662
Prerequisites
The prerequisites for this course are basic knowledge of linear algebra, probability and stochastic processes (mainly random walks, martingales, and concentration inequalities), and algorithm design.
Materials
We will not have a textbook. The basics and classical bounds follow the books below, all of which are available online (the first may require a VPN).
Geometric Discrepancy by Matousek
The Discrepancy Method by Chazelle
Chapter 12 of The Probabilistic Method by Alon and Spencer
The materials on efficient algorithms and applications are from recent papers.
For the matrix part, we recommend:
Spectral Graph Theory by Daniel Spielman, and his Spectral and Algebraic Graph Theory
Spectral Graph Theory by Lap Chi Lau
Useful resources:
Nikolov's lecture notes at http://www.cs.toronto.edu/~anikolov/Prague18/ and http://www.cs.toronto.edu/~anikolov/csc2414.html
A STOC'20 workshop on Recent Advances in Discrepancy and Applications (organized by Bansal and Nikolov): https://www.win.tue.nl/~nikhil/disc-2020-stoc.html
Assignments
There will be 3 or 4 problem sets.
Lecture notes
9/8: [notes] Continuous discrepancy, combinatorial discrepancy, and connections
9/9: [notes] Upper bounds on the discrepancy of axis-parallel rectangles (the Van der Corput set; the Halton-Hammersley set for higher dimensions); a Julia notebook for the Van der Corput set.
9/13: [notes] Lower bounds on the discrepancy of axis-parallel rectangles (Roth's lower bound, Schmidt's tight lower bound)
9/16: [notes] Basic upper bounds for set systems: random coloring, derandomization; the Beck-Fiala theorem for sparse set systems
9/20: [notes] Matrix form, lower bounds for set systems: the Hadamard matrix bound, the eigenvalue bound, Roth's lower bound for arithmetic progressions
9/23: [notes] Linear discrepancy, hereditary discrepancy, the determinant lower bound
9/27: [notes] Spencer's 6*sqrt(n) upper bound for set systems
9/30: [notes] Upper bound for set systems from convex geometry
10/4 and 10/7: [notes] Polynomial-time algorithm for sqrt(n) discrepancy
10/11: [notes] Hardness for discrepancy minimization
10/14: [notes] Banaszczyk's theorem and subgaussianity
10/18 and 10/21: [notes] The Gram-Schmidt walk and the 1-subgaussianity of the signed sum of vectors
10/25: [notes] Online discrepancy
10/28: [notes] Online discrepancy (contd.)
11/1: [notes] Matrix Chernoff
11/4: [notes] Weighted sparsifiers for rank-1 matrices by sampling
11/8: [notes] Estimating sampling probabilities for graphs
11/11: [notes] Better weighted sparsifiers for rank-1 matrices by potential functions
11/15, 11/18, 11/22, 12/2, and 12/6: [notes] Unweighted sparsifiers for rank-1 matrices and interlacing polynomials
12/9: [notes] Discrepancy and the design of randomized experiments
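As a companion to the Van der Corput material from the 9/9 lecture (a Python sketch, separate from the linked Julia notebook): the base-b Van der Corput sequence mirrors the base-b digits of n across the radix point, producing a low-discrepancy sequence in [0, 1).

```python
def van_der_corput(n, base=2):
    """n-th term of the base-`base` Van der Corput sequence: write n in
    that base and mirror its digits across the radix point."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

# First few terms in base 2: 0, 1/2, 1/4, 3/4, 1/8, 5/8, ...
print([van_der_corput(i) for i in range(6)])
# -> [0.0, 0.5, 0.25, 0.75, 0.125, 0.625]
```

Pairing this sequence in one coordinate with i/N in the other gives the two-dimensional Van der Corput point set with O(log N) discrepancy for axis-parallel rectangles, matching Schmidt's lower bound.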
Topics (tentative)
Basics, lower bounds and nonconstructive upper bounds:
Concepts of combinatorial discrepancy, continuous discrepancy, their connections
Random coloring (Chernoff bound), derandomization (via conditional expectation; via potential functions)
Improving random coloring: partial coloring, Spencer's entropy method
Better bounds for special set systems:
sparse set systems (the Beck-Fiala theorem)
set systems with very small discrepancy
set systems with bounded VC dimension
Lower bounds on discrepancy: matrix form, the Hadamard matrix bound, the eigenvalue bound, and the determinant bound
Robust concepts of discrepancy: linear and hereditary discrepancy, and connections
Efficient algorithms:
An algorithm via random walk: The Lovett-Meka edge walk
An algorithm via high dimensional geometry: solving a random LP (by Eldan and Singh), Rothvoss’ algorithm (sketch)
NP-hardness of approximating the discrepancy: a reduction from 2-2 set splitting
A more general setting (balancing unit vectors):
Non-partial coloring + SDP rounding
Banaszczyk’s theorem, equivalence to subgaussianity
the Gram-Schmidt walk algorithm
Random inputs
Online algorithms
Spectral sparsification (2nd order discrepancy):
Weighted sparsification via sampling, matrix Chernoff
Better weighted sparsification via barrier functions
Unweighted sparsification via interlacing family of polynomials (the Kadison-Singer problem)
Applications:
Balancing covariates in randomized controlled trials: the Neyman-Rubin causal model, the Gram-Schmidt walk design
Approximation algorithms:
LP rounding via discrepancy, applications to scheduling and bin packing
SDP rounding via the sticky Brownian motion, applications to max cut and other Constraint Satisfaction Problems
Differential privacy: relation to hereditary discrepancy
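To make the sampling-based sparsification topic concrete, here is a minimal numpy sketch of leverage-score sampling for a sum of rank-1 matrices (my own illustration, not course code; the sizes n, d and the oversampling factor c are arbitrary choices). Each rank-1 term is kept with probability proportional to its leverage score and reweighted, so the sparsified sum equals the original in expectation; matrix Chernoff then controls the spectral error.

```python
import numpy as np

rng = np.random.default_rng(0)

# A = sum_i v_i v_i^T, written as V^T V for the rows v_i of V.
n, d = 2000, 10
V = rng.normal(size=(n, d))
A = V.T @ V

# Leverage score tau_i = v_i^T A^{-1} v_i; the scores sum to d.
tau = np.einsum('ij,jk,ik->i', V, np.linalg.inv(A), V)

# Keep term i with probability p_i = min(1, c * tau_i), reweighting the
# kept terms by 1/p_i so the sampled sum equals A in expectation.
c = 40.0  # oversampling factor, on the order of log(d)/eps^2
p = np.minimum(1.0, c * tau)
keep = rng.random(n) < p
B = (V[keep] * (1.0 / p[keep])[:, None]).T @ V[keep]

# Relative spectral error of the sparsifier.
err = np.linalg.norm(B - A, 2) / np.linalg.norm(A, 2)
print(f"kept {keep.sum()} of {n} terms, relative spectral error {err:.3f}")
```

The matrix Chernoff bound covered on 11/1 guarantees that with c = O(log(d)/eps^2) the sampled sum B satisfies (1-eps)A ⪯ B ⪯ (1+eps)A with high probability, while keeping only about c·d of the n terms.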