The Duality of Private Synthetic Data and Reconstruction Attacks


Speaker: Steven Wu

Abstract: 

In this talk, we will demonstrate how modern techniques for non-convex optimization can be both a "cure" and a "poison" for data privacy. We will first survey some of the recent advances in differentially private algorithms for generating synthetic data. By utilizing sophisticated non-convex optimization techniques, these algorithms can efficiently produce high-quality synthetic data from noisy measurements of aggregate statistics. However, when given numerically precise statistics, the same optimization techniques can be employed to reconstruct entire rows of individual-level data. In particular, we show how one could leverage (non-private) synthetic data methods to build efficient algorithms for reconstruction attacks. Finally, we will show how such a reconstruction attack can be applied to 2010 US decennial Census data and Census-derived American Community Survey datasets, where it outperforms a hierarchy of increasingly challenging baselines. Taken together, our methods and experiments illustrate the risks of releasing numerically precise aggregate statistics of a large dataset and provide further motivation for the careful application of provably private techniques such as differential privacy.