Marginal-based methods for differentially private synthetic data


Speaker: Ryan McKenna

Abstract:

In this talk, I will present a general approach to differentially private synthetic data generation known as the select-measure-generate paradigm. This approach consists of three steps: (1) select a collection of low-dimensional marginal queries, (2) measure those queries privately using a noise-addition mechanism, and (3) generate a synthetic dataset that closely matches the noisy measurements. In particular, I will present Private-PGM, a general-purpose solution to the generate subproblem that scales to high-dimensional settings by using probabilistic graphical models. I will also discuss considerations and principles for the select subproblem, along with specific published approaches: MST, which was used by the winning team in the 2018 NIST DP synthetic data competition, and AIM, which appeared in VLDB 2022 and significantly improves on MST.
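The measure step can be made concrete with a small sketch. The code below is a minimal Python/NumPy illustration, not Private-PGM's API. It assumes the data is a 2-D array of integer-coded categorical attributes and that the noise-addition mechanism is the Gaussian mechanism, with its privacy cost reported under zero-concentrated DP (zCDP). The function names `marginal`, `measure`, and `zcdp_cost` are hypothetical. The sketch covers only the measure step and leaves select and generate out.

```python
import numpy as np


def marginal(data, cols, domain):
    """Count the records falling into each cell of the marginal over `cols`.

    data:   2-D integer array, one row per record, one column per attribute.
    cols:   tuple of column indices defining the (low-dimensional) marginal.
    domain: number of possible values for each attribute (hypothetical layout).
    """
    shape = tuple(domain[c] for c in cols)
    # Map each record's values on `cols` to a single flat cell index.
    flat = np.ravel_multi_index(tuple(data[:, c] for c in cols), shape)
    counts = np.bincount(flat, minlength=int(np.prod(shape)))
    return counts.reshape(shape).astype(float)


def measure(data, cols, domain, sigma, rng):
    """Release a marginal with i.i.d. Gaussian noise of scale `sigma` in every cell.

    Assumes the Gaussian mechanism; choosing `sigma` from a privacy budget
    is left to the caller.
    """
    true_counts = marginal(data, cols, domain)
    return true_counts + rng.normal(loc=0.0, scale=sigma, size=true_counts.shape)


def zcdp_cost(sigma, sensitivity=1.0):
    """rho-zCDP cost of the Gaussian mechanism: rho = sensitivity^2 / (2 * sigma^2).

    sensitivity=1 assumes add/remove-one-record neighbors, where one record
    changes exactly one cell of a marginal by 1.
    """
    return sensitivity ** 2 / (2 * sigma ** 2)
```

For example, with `domain = [2, 3, 4]`, calling `measure(data, (0, 2), domain, sigma=5.0, rng=np.random.default_rng(0))` returns a noisy 2×4 table over the first and third attributes. Because each record lands in exactly one cell, the marginal's L2 sensitivity is 1 under add/remove neighbors, so several such noisy marginals can be released and their zCDP costs summed.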