Leveraging Synthetic Data for Machine Unlearning
Speaker: Alessandro Achille
Abstract:
Synthetic data has predominantly been explored as a means to enhance the performance of machine learning (ML) systems. In this talk, we explore the opposite direction: using synthetic data to facilitate "unlearning" in ML models. Current models require such large amounts of data and compute to train that defects in the training corpus cannot be trivially remedied by retraining the model from scratch. Machine unlearning aims to fix such defects by eliminating not only the improperly used data itself, but also its effects on every component of the ML model. However, due to the non-convex nature of deep networks, accurately estimating and removing the influence of a particular sample on the trained weights poses significant challenges. To address this, I will present recently introduced methods that enable both unlearning and differential privacy in large-scale systems by relying on a core subset of safe data. I will then explore the role that synthetic data can play in these systems, as well as new directions forward.
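The safe-core idea mentioned above can be illustrated with a toy sketch (my own illustrative assumptions, not the speaker's actual method): train a model once on a "safe" core dataset, keep the contribution of additional user data as a small separable correction, and then "unlearn" a sample by cheaply retraining only that correction without it, leaving the expensive core model untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a "safe" core set and additional user data.
X_core = rng.normal(size=(100, 5))
X_user = rng.normal(size=(20, 5))
true_w = rng.normal(size=5)
y_core = X_core @ true_w + 0.1 * rng.normal(size=100)
y_user = X_user @ true_w + 0.1 * rng.normal(size=20)

def fit(X, y):
    # Least-squares fit, standing in for full model training.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Expensive step, done once: train the core model on safe data only.
w_core = fit(X_core, y_core)

def fit_correction(Xu, yu):
    # Cheap step: fit a small correction on the residuals of the
    # core model over the user data.
    return fit(Xu, yu - Xu @ w_core)

delta = fit_correction(X_user, y_user)
w_full = w_core + delta

# "Unlearning" user sample 0: retrain only the cheap correction
# without it; the core weights never need to change.
keep = np.arange(len(X_user)) != 0
w_unlearned = w_core + fit_correction(X_user[keep], y_user[keep])
```

The design choice here mirrors the compartmentalized structure in the abstract: because the influence of user data is confined to the correction term, forgetting a sample costs only a small refit rather than retraining from scratch.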