One of the most salient features of Big Data is the dissemination of huge amounts of personal and sensitive data. Despite their enormous societal benefits, powerful new machine learning tools such as deep learning pose real threats to personal privacy. Heuristic approaches to the privacy problem, such as data anonymization, are at best unreliable: the last decade has seen a string of attacks that recover personal information from supposedly "anonymized" data.
The last decade has also witnessed the rise of a rigorous theory to deal with this challenge, centered around a meaningful and robust mathematical definition of privacy known as differential privacy. Differential privacy is a mathematical framework whose goal is to enable the design of algorithms that provide provable privacy guarantees for their sensitive input data sets while still producing highly accurate analyses and models from that data. A powerful algorithmic framework for differential privacy has been developed over the years, leading to numerous practical and efficient algorithms with strong, provable privacy guarantees for various applications in machine learning, data mining, and statistics.
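To make the guarantee concrete, here is the standard textbook definition: a randomized algorithm M is ε-differentially private if, for every pair of data sets D and D' that differ in a single individual's record, and for every set S of possible outputs,

\[
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S].
\]

As a small taste of the algorithmic framework, the sketch below implements the classic Laplace mechanism in Python; the function name and the toy counting query are illustrative choices for this write-up, not code supplied by the course.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Sketch of the classic Laplace mechanism: releasing
    true_answer + Laplace(sensitivity / epsilon) noise satisfies
    epsilon-differential privacy for a query with the given sensitivity.
    (Names here are illustrative, not from the course materials.)"""
    scale = sensitivity / epsilon
    return true_answer + np.random.default_rng().laplace(loc=0.0, scale=scale)

# Toy usage: privately release a count. A counting query has sensitivity 1,
# since adding or removing one person's record changes the count by at most 1.
records = ["alice", "bob", "carol"]
noisy_count = laplace_mechanism(len(records), sensitivity=1.0, epsilon=0.5)
```

Note the trade-off this makes explicit: smaller ε means stronger privacy but more noise, since the scale of the added noise grows like sensitivity/ε.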
Differential privacy has recently made it into industry with several successful adoptions, notably by Google, Apple, and Uber. The U.S. Census Bureau will also adopt differential privacy starting with the 2020 Census. On the other hand, the existing tools in differential privacy have several limitations that call for new, creative solutions.
This class will start by demonstrating the need for a rigorous privacy framework via several examples that led to high-profile privacy breaches. I will then introduce differential privacy, discuss the semantics of its guarantees, and present powerful algorithmic techniques from the differential privacy literature for designing efficient, scalable algorithms that satisfy this solid privacy guarantee in several important applications in machine learning and statistical data analysis. This includes a discussion of algorithms that have been deployed in industry (at Google, Apple, and Uber). We will also highlight some limitations of the existing tools and discuss several new efforts to fully extend the power of differential privacy to the area of deep learning.
The goal of this course is to introduce students to the burgeoning area of privacy-preserving machine learning and data analysis, and mainly to differential privacy. This course aims to help students take up a research career in data privacy, or pursue industry positions in privacy engineering, for which there is increasing demand, especially at big corporations such as Google, Apple, Uber, and many others. At the end of this course, students are expected to have a solid understanding of the foundational concepts of private data analysis, and a good grasp of the design principles behind practical, useful algorithms that provide strong and provable privacy guarantees for various modern data applications.
If you have not taken one of these classes but feel confident you have the equivalent knowledge from other sources or courses, you can still enroll.
(see possible topics and some relevant papers in the References section below).
– Mid-term check (discussion with groups during office hours)
– Final Report: a) Introduction: literature review + motivation, b) Problem statement, c) Proposed solution/implementation and results.
– Project presentation: ~25–30 min per group during the last 2 weeks of class.
Attacks on Privacy
Early papers: definitions, basic mechanisms, properties
More tools and algorithmic techniques
Differentially private machine learning
Local (Distributed) Model of Differential Privacy
Differential Privacy for Streaming
Lower Bounds (Limits of Differential Privacy)
Relaxations of Differential Privacy (Non-worst case definitions allowing for distributional assumptions)
Differential Privacy for Robust Adaptive Data Analysis