Privacy-preserving Data Mining in Industry (Tutorial)
Preserving the privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has received renewed attention in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and the evolution of privacy techniques leading to the definition of differential privacy and associated techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary and LinkedIn's PriPeARL framework for privacy-preserving analytics and reporting, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
- ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018 [slides])
- ACM International Conference on Web Search and Data Mining (WSDM 2019 [slides])
- The Web Conference (WWW 2019 [slides])
Tutorial Context: Fairness, Privacy, and Transparency by Design in AI/ML Systems
The presenters have extensive experience in the field of privacy-preserving data mining, including the theory and application of privacy techniques. They have published several well-cited papers on privacy. For instance, Krishnaram Kenthapadi and Ilya Mironov co-authored the second paper on differential privacy (EUROCRYPT 2006), which introduced the notion of (ε, δ)-differential privacy. Abhradeep Guha Thakurta’s research on differential privacy and machine learning has appeared in leading venues such as STOC, FOCS, NIPS, KDD, and ICML. They also have rich experience applying privacy techniques in practice (LinkedIn Salary and LinkedIn's PriPeARL framework for privacy-preserving analytics and reporting, Google’s RAPPOR, and Apple’s differential privacy deployment for iOS & macOS, respectively), and have been providing technical leadership on AI and privacy at their respective organizations.
Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the fairness, transparency, explainability, and privacy modeling efforts across different LinkedIn applications. He also serves as LinkedIn's representative on Microsoft's AI and Ethics in Engineering and Research (AETHER) Committee. He shaped the technical roadmap and led the privacy/modeling efforts for the LinkedIn Salary product, and prior to that, served as the relevance lead for the LinkedIn Careers and Talent Solutions AI team, which powers search/recommendation products at the intersection of members, recruiters, and career opportunities. He also led the development of LinkedIn's PriPeARL system for privacy-preserving analytics. Previously, he was a Researcher at Microsoft Research Silicon Valley, where his work resulted in product impact (and Gold Star / Technology Transfer awards), and several publications/patents. He received his Ph.D. degree in Computer Science from Stanford University in 2006, for his thesis, "Models and Algorithms for Data Privacy". He serves regularly on the program committees of KDD, WWW, WSDM, and related conferences, and co-chaired the 2014 ACM Symposium on Computing for Development. He received the CIKM best case studies paper award, the SODA best student paper award, and the WWW best paper award nomination for his work on privacy-preserving publishing of web search logs. He has published 35+ papers with 2500+ citations, and has filed 130+ patents.
Ilya Mironov is a Staff Research Scientist at Google Brain working on privacy in machine learning. Prior to joining Google, he was a Researcher at Microsoft Research Silicon Valley, where he co-authored seminal works on applications and implementations of differential privacy. He was a keynote speaker at several recent events, such as the Private and Secure Machine Learning workshop at ICML 2017, the Theory and Practice of Differential Privacy workshop co-located with ACM CCS 2017, and NordSec 2017. He graduated with a Ph.D. in Computer Science from Stanford in 2003.
Abhradeep Guha Thakurta is an Assistant Professor in the Department of Computer Science at the University of California, Santa Cruz. His primary research interest is at the intersection of data privacy and machine learning. He focuses on demonstrating, both theoretically and in practice, that privacy enables designing better machine learning algorithms, and vice versa. Prior to joining academia, Abhradeep was a Senior Machine Learning Scientist at Apple, where he was the primary algorithm designer in deploying local differential privacy on iOS 10 and macOS Sierra. This project is possibly one of the largest industrial deployments of differential privacy and has resulted in 200+ press articles, a follow-up paper at NIPS 2017, and four granted patents. He won the Early Career Award in 2016 from Pennsylvania State University for the same project. Before Apple, Abhradeep worked as a post-doctoral researcher at Microsoft Research Silicon Valley and Stanford University, and then as a Research Scientist at Yahoo Research. He received his Ph.D. from Pennsylvania State University.