CS673 Discrepancy Theory and Its Applications

Time: Monday and Thursday 9am-10:20am (ARC-105)


Instructor: Peng Zhang (pz149@rutgers.edu), office hours: Monday 11am - 12pm (Hill 444) or by appointment


Course description


Discrepancy theory asks the following question: given a ground set and a family of its subsets, how do we two-color the elements of the ground set so that the coloring is nearly balanced in every subset? A naive idea is to color each element uniformly at random; our goal is to improve upon random coloring. This problem and its variants have applications in sparsification, rounding in approximation algorithms, the design of randomized controlled trials, and more.
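
For concreteness, here is a minimal Python sketch of the baseline (the random set system and all names below are our own illustration, not part of the course materials): it draws a uniformly random coloring of a random set system and measures the worst imbalance over the sets. A Chernoff bound plus a union bound shows this is O(sqrt(n log m)) with high probability; the course asks when we can do better.

    import numpy as np

    rng = np.random.default_rng(0)

    def discrepancy(A, x):
        # A is the m-by-n set/element incidence matrix, x a +/-1 coloring;
        # the discrepancy of x is the largest imbalance over all m sets.
        return int(np.max(np.abs(A @ x)))

    # A random set system: each of n elements joins each of m sets with prob. 1/2.
    n, m = 100, 100
    A = rng.integers(0, 2, size=(m, n))

    # Random coloring: each element is colored +1 or -1 uniformly at random.
    x = rng.choice([-1, 1], size=n)
    print(discrepancy(A, x))  # typically on the order of sqrt(n log m)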


This course will be an introduction to discrepancy theory and will cover many of its applications. We will discuss basic concepts and classical problems, efficient algorithms developed in the last decade, and various applications. Along the way, we will explore elegant tools from linear algebra, combinatorics, and probability theory.


A similar course taught by the instructor in Fall 2020 at Yale: https://sites.google.com/site/pengzhang27182/fall20-cpsc662


Prerequisites


The prerequisites for this course are basic knowledge of linear algebra, probability and stochastic processes (mainly random walks, martingales, and concentration inequalities), and algorithm design.


Materials


We will not have a textbook. The basics and classical bounds are from the following books. They are available online (the first book may require VPN).

The materials on efficient algorithms and applications are from recent papers.


For the matrix part, we recommend:


Useful resources:


Nikolov's lecture notes at http://www.cs.toronto.edu/~anikolov/Prague18/ and http://www.cs.toronto.edu/~anikolov/csc2414.html

A STOC'20 workshop on Recent Advances in Discrepancy and Applications (organized by Bansal and Nikolov): https://www.win.tue.nl/~nikhil/disc-2020-stoc.html


Assignments


There will be 3 or 4 problem sets.


Lecture notes

9/8: [notes] Continuous discrepancy, combinatorial discrepancy, and connections

9/9: [notes] Upper bounds on the discrepancy of axis-parallel rectangles (the Van der Corput set; the Halton-Hammersley set in higher dimensions); a Julia notebook for the Van der Corput set.
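
(The notebook linked above is in Julia; for a self-contained illustration, here is a minimal Python sketch of the base-2 construction. The function name is ours.)

    def van_der_corput(i, base=2):
        # The i-th point of the Van der Corput sequence: write i in base b
        # and mirror its digits across the radix point, yielding a point in [0, 1).
        q, denom = 0.0, 1.0
        while i > 0:
            denom *= base
            i, digit = divmod(i, base)
            q += digit / denom
        return q

    # First 8 points in base 2: 0, 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8
    print([van_der_corput(i) for i in range(8)])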

9/13: [notes] Lower bounds on the discrepancy of axis-parallel rectangles (Roth's lower bound, Schmidt's tight lower bound)

9/16: [notes] Basic upper bounds for set systems: random coloring, derandomization; the Beck-Fiala theorem for sparse set systems
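
(As a taste of the derandomization in this lecture, here is a minimal Python sketch via a hyperbolic-cosine potential; the parameter choice follows the standard Chernoff-style analysis, and the code is our illustration rather than the lecture's.)

    import numpy as np

    def derandomized_coloring(A):
        # Greedy conditional-expectation coloring: fix one element at a time,
        # choosing the sign that keeps the pessimistic estimator
        # sum_j cosh(lam * s_j) small, where s_j is the running signed sum of
        # set j. This recovers the O(sqrt(n log m)) bound deterministically.
        m, n = A.shape
        lam = np.sqrt(2.0 * np.log(2.0 * m) / n)  # balances the two error terms
        s = np.zeros(m)               # running signed sums, equal to A @ x so far
        x = np.zeros(n, dtype=int)
        for i in range(n):
            col = A[:, i]
            if np.sum(np.cosh(lam * (s + col))) <= np.sum(np.cosh(lam * (s - col))):
                x[i] = 1
            else:
                x[i] = -1
            s += x[i] * col
        return x

    # Example: on a random 100x100 incidence matrix, the deterministic coloring
    # has discrepancy at most sqrt(2 n ln(2m)) ~ 33 (usually much smaller).
    A = np.random.default_rng(1).integers(0, 2, size=(100, 100))
    print(np.max(np.abs(A @ derandomized_coloring(A))))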

9/20: [notes] Matrix form, lower bounds for set systems: the Hadamard matrix bound, the eigenvalue bound, Roth's lower bound for arithmetic progressions

9/23: [notes] Linear discrepancy, hereditary discrepancy, the determinant lower bound

9/27: [notes] Spencer's 6*sqrt(n) upper bound for set systems

9/30: [notes] Upper bound for set systems from convex geometry

10/4 and 10/7: [notes] Polynomial-time algorithm for sqrt(n) discrepancy

10/11: [notes] Hardness for discrepancy minimization

10/14: [notes] Banaszczyk's theorem and subgaussianity

10/18 and 10/21: [notes] The Gram-Schmidt walk and the 1-subgaussianity of the signed sum of vectors

10/25: [notes] Online discrepancy

10/28: [notes] Online discrepancy (contd.)

11/1: [notes] Matrix Chernoff

11/4: [notes] Weighted sparsifiers for rank-1 matrices by sampling

11/8: [notes] Estimating sampling probabilities for graphs

11/11: [notes] Better weighted sparsifiers for rank-1 matrices by potential functions

11/15, 11/18, 11/22, 12/2, and 12/6: [notes] Unweighted sparsifiers for rank-1 matrices and interlacing polynomials

12/9: [notes] Discrepancy and the design of randomized experiments


Topics (tentative)

  • Basics, lower bounds and nonconstructive upper bounds:

    • Concepts of combinatorial discrepancy and continuous discrepancy, and their connections

    • Random coloring (Chernoff bound), derandomization (via conditional expectation; via potential functions)

    • Improving random coloring: partial coloring, Spencer's entropy method

    • Better bounds for special set systems:

      • sparse set systems (the Beck-Fiala theorem)

      • set systems with very small discrepancy

      • set systems with bounded VC dimension

    • Lower bounds on discrepancy: matrix form, the Hadamard matrix bound, the eigenvalue bound, and the determinant bound

    • Robust concepts of discrepancy: linear and hereditary discrepancy, and connections

  • Efficient algorithms:

    • An algorithm via random walks: the Lovett-Meka edge walk

    • An algorithm via high dimensional geometry: solving a random LP (by Eldan and Singh), Rothvoss’ algorithm (sketch)

    • NP-hardness of approximating the discrepancy: a reduction from 2-2 set splitting

    • A more general setting (balancing unit vectors):

      • Full (non-partial) coloring via SDP rounding

      • Banaszczyk’s theorem, equivalence to subgaussianity

      • The Gram-Schmidt walk algorithm

    • Random inputs

    • Online algorithms

  • Spectral sparsification (2nd order discrepancy):

    • Weighted sparsification via sampling, matrix Chernoff

    • Better weighted sparsification via barrier functions

    • Unweighted sparsification via interlacing families of polynomials (the Kadison-Singer problem)

  • Applications:

    • Balancing covariates in randomized controlled trials: the Neyman-Rubin causal model, the Gram-Schmidt walk design

    • Approximation algorithms:

      • LP rounding via discrepancy, applications to scheduling and bin packing

      • SDP rounding via sticky Brownian motion, with applications to Max Cut and other constraint satisfaction problems

    • Differential privacy: relation to hereditary discrepancy