Deidentification and Data Deletion --- A Cryptographer's Take

Speaker: Aloni Cohen

Abstract:

Seriously engaging with law and policy exposes new mathematical and technical questions. I will describe two recent works motivated by the GDPR and data protection regulation generally. The first is on deidentification / anonymization, and the second is on data deletion.

Quasi-identifier-based deidentification techniques (QI-deidentification), including k-anonymity, l-diversity, and t-closeness, are widely used in practice.

We introduce a new class of privacy attacks called downcoding attacks, and prove that every QI-deidentification scheme is vulnerable to downcoding attacks if it is minimal and hierarchical. We convert the downcoding attacks into powerful predicate singling-out (PSO) attacks, which were recently proposed as a way to demonstrate that a privacy mechanism fails to legally anonymize under Europe's General Data Protection Regulation. These attacks demonstrate that QI-deidentification may offer no protection even if every attribute is treated as a quasi-identifier.

Recent digital rights frameworks give users the right to delete their data from systems that store and process their personal information (e.g., the "right to be forgotten" in the GDPR). How should deletion be formalized for synthetic data, machine learning, and other complex systems? We propose a new formalism: deletion-as-control. It allows users' data to be freely used before deletion, while also imposing a meaningful requirement after deletion, thereby giving users more control. We relate deletion-as-control to differential privacy and to existing work on machine unlearning.
