[ICML 2025] "Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning,"
Rongzhe Wei, Mufei Li, Mohsen Ghassemi, Eleonora Kreačić, Yifan Li, Xiang Yue, Bo Li, Vamsi K. Potluru, Pan Li, Eli Chien^.
[arXiv, ICML 2025]
Existing LLM privacy unlearning evaluation underestimates the privacy risk of rare/minority data: Large Language Models (LLMs) are trained on extensive datasets that often contain sensitive, human-generated information, raising significant concerns about privacy breaches. While certified unlearning approaches offer strong privacy guarantees, they rely on restrictive model assumptions that are not applicable to LLMs. As a result, various unlearning heuristics have been proposed, with their associated privacy risks assessed only empirically. The standard evaluation pipeline typically selects data to remove uniformly at random from the training set, applies the unlearning technique, and uses membership inference attacks (MIAs) to compare the unlearned model against a model retrained without the to-be-unlearned data. However, since every data point is subject to the right to be forgotten, unlearning should be evaluated under the worst case from the privacy perspective. Prior work shows that data outliers may exhibit stronger memorization effects. Intuitively, they are harder to unlearn, so the privacy risk of unlearning them is overlooked and underestimated in the current evaluation. In this paper, we leverage minority data to expose this critical flaw in previously widely adopted evaluations. We substantiate this claim through carefully designed experiments, including unlearning canaries related to minority groups, inspired by the privacy-auditing literature. Using personally identifiable information (PII) as a representative minority identifier, we demonstrate that minority groups experience at least 20% more privacy leakage in most cases across six unlearning approaches, three MIAs, three benchmark datasets, and two LLMs of different scales. Given that the right to be forgotten should be upheld for every individual, we advocate for a more rigorous evaluation of LLM unlearning methods. Our minority-aware evaluation framework represents an initial step toward more equitable and thorough assessments of LLM unlearning efficacy.
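Below is a minimal, hypothetical sketch of the minority-aware evaluation idea (not the paper's exact pipeline or attacks): after unlearning, compare how well a placeholder MIA score separates a uniformly sampled forget set versus a minority/PII forget set from true non-members. All scores here are synthetic, and the names (attack_auc, the score arrays) are assumptions for illustration only.

```python
import numpy as np

def attack_auc(member_scores, nonmember_scores):
    # Mann-Whitney AUC: probability that a forget-set record scores above a
    # non-member. AUC near 0.5 means little residual leakage; near 1.0, a lot.
    scores = np.concatenate([member_scores, nonmember_scores])
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos, n_neg = len(member_scores), len(nonmember_scores)
    return (ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
# Synthetic placeholder MIA scores on the unlearned model; higher means "looks
# like a training member". The gap below merely mimics the qualitative finding
# that minority/PII records remain more distinguishable after unlearning.
random_forget   = rng.normal(0.1, 1.0, 500)   # forget set sampled uniformly
minority_forget = rng.normal(0.6, 1.0, 500)   # forget set of minority/PII records
nonmembers      = rng.normal(0.0, 1.0, 500)   # held-out non-member records

print("MIA AUC, random forget set:  ", attack_auc(random_forget, nonmembers))
print("MIA AUC, minority forget set:", attack_auc(minority_forget, nonmembers))
```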
[ICLR 2025] “Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness”
Eli Chien, Pan Li.
[arXiv]
Hidden-state DP analysis for Noisy-(S)GD: Hidden-state DP analysis aims to provide a tight DP bound for the case where only the last iterate is revealed. Standard DP analysis for Noisy-SGD, such as the (standard) composition theorem, actually bounds the case where all intermediate states are made public. This leads to a privacy bound that diverges with the number of training iterations, which may be sub-optimal. Prior works on hidden-state DP analysis often require strict assumptions on the loss, such as (strong) convexity and smoothness, and leave whether these assumptions can be relaxed as an open problem. In our work, we provide a positive answer: we give not only a convergent DP bound for non-smooth non-convex problems, but also a tighter bound for smooth strongly convex problems. Our analysis improves the shifted-divergence analysis in multiple aspects, including forward Wasserstein-distance tracking, identifying the optimal shift allocation, and a Hölder reduction lemma. Our results further elucidate the benefit of hidden-state analysis for DP and its applicability.
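For context, here is a minimal sketch of the threat model this analysis targets: Noisy-(S)GD is run with all intermediate iterates kept private, and only the final iterate is released. The toy quadratic loss, batch size, step size, and projection radius below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def noisy_sgd_last_iterate(data, steps=1000, lr=0.1, sigma=1.0, radius=1.0, seed=0):
    # Projected Noisy-SGD on a toy quadratic loss (1/2)||w - x_i||^2.
    rng = np.random.default_rng(seed)
    w = np.zeros(data.shape[1])
    for _ in range(steps):
        batch = data[rng.choice(len(data), size=32)]
        grad = (w - batch).mean(axis=0)                             # mini-batch gradient
        w = w - lr * grad + lr * sigma * rng.normal(size=w.shape)   # Gaussian noise injection
        norm = np.linalg.norm(w)
        if norm > radius:                                           # projection onto a bounded domain
            w *= radius / norm
    # Intermediate iterates stay hidden; only this last iterate is published,
    # which is exactly the setting a hidden-state analysis bounds.
    return w

w_released = noisy_sgd_last_iterate(np.random.default_rng(1).normal(size=(200, 5)))
```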
[NeurIPS 2024 (spotlight)] “Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning”
Eli Chien, Haoyu Wang, Ziang Chen, Pan Li.
[arXiv, ICLR 2024 PrivML workshop (spotlight), TPDP 2024]
[NeurIPS 2024] “Certified Machine Unlearning via Noisy Stochastic Gradient Descent”
Eli Chien, Haoyu Wang, Ziang Chen, Pan Li
[arXiv]
Unlearning via Noisy-(S)GD: We propose a new perspective on the machine unlearning problem that unifies the DP learning process and the privacy-certified unlearning process, with many algorithmic benefits. Each dataset (D) corresponds to a unique stationary model-weight distribution (\nu_D). Learning with Noisy-(S)GD provides an initial privacy loss/DP guarantee (\epsilon_0). Unlearning with the same Noisy-(S)GD reduces it monotonically, and we may stop as soon as the privacy loss is no larger than \epsilon. First, this argument does not rely on strong convexity; we leverage Langevin dynamics analysis to prove the desired privacy bound. Second, our approach provides a unique privacy-utility-efficiency trade-off: smaller noise gives a larger \epsilon_0, which can later be reduced to \epsilon by additional unlearning iterations (at the cost of unlearning efficiency). Our second work allows the mini-batch setting and provides the state-of-the-art privacy-utility-efficiency trade-off for unlearning under the strongly convex assumption; its analysis relies on shifted-divergence analysis instead of Langevin dynamics analysis.
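A minimal sketch of this learn-then-unlearn recipe is given below, assuming a toy quadratic loss and a placeholder geometric decay standing in for the tracked privacy loss; the actual privacy-loss recurrence and constants come from the papers' analyses and are not reproduced here.

```python
import numpy as np

def noisy_gd(w, data, steps, lr=0.05, sigma=0.5, rng=None):
    # Full-batch Noisy-GD on a toy quadratic loss; the same procedure is used
    # for both learning and unlearning in this sketch.
    rng = rng if rng is not None else np.random.default_rng(0)
    for _ in range(steps):
        grad = (w - data).mean(axis=0)
        w = w - lr * grad + sigma * np.sqrt(2 * lr) * rng.normal(size=w.shape)
    return w

def unlearn(w, data, delete_idx, eps0, eps_target, decay=0.9, seed=1):
    # eps0: privacy loss right after learning. Each extra Noisy-GD iteration on
    # the post-deletion dataset shrinks it; the geometric decay here is only a
    # stand-in for the actual privacy-loss recurrence.
    data_prime = np.delete(data, delete_idx, axis=0)
    rng = np.random.default_rng(seed)
    eps, extra_iters = eps0, 0
    while eps > eps_target:
        w = noisy_gd(w, data_prime, steps=1, rng=rng)
        eps *= decay
        extra_iters += 1
    return w, extra_iters   # unlearned weights and compute spent on unlearning

data = np.random.default_rng(1).normal(size=(100, 5))
w_learned = noisy_gd(np.zeros(5), data, steps=500)   # learning phase (incurs eps0)
w_unlearned, k = unlearn(w_learned, data, delete_idx=0, eps0=4.0, eps_target=1.0)
```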
[ICLR 2023] “Efficient Model Updates for Approximate Unlearning of Graph-Structured Data” (a.k.a. Certified Graph Unlearning)
Eli Chien*, Chao Pan*, Olgica Milenkovic.
[ICLR 2023, NeurIPS 2022 GLFrontiers Workshop, arXiv (workshop version), code]
[TheWebConf 2023] “Unlearning graph classifiers with limited data resources”
Chao Pan*, Eli Chien*, Olgica Milenkovic.
[TheWebConf 2023, arXiv, code]
Certified Graph Unlearning: We propose a series of works on graph unlearning with differential-privacy-type guarantees: an adversary cannot distinguish the model parameters obtained by training on a dataset before versus after an unlearning request, since their distributions are approximately the same. Our work is the first to tackle the approximate graph unlearning problem in various settings. We study three scenarios, node feature, edge, and node unlearning, meaning that one or a few node features, edges, or nodes are removed according to the unlearning request. Our studied downstream tasks include node (ICLR'23) and graph (TheWebConf'23) classification problems. We show that extending existing machine unlearning methods to graphs is non-trivial in theory and essential in practice. Our methods demonstrate superior privacy-accuracy-complexity trade-offs compared to retraining from scratch and to prior unlearning approaches designed for unstructured data.
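To convey the flavor of such certified updates (a generic certified-removal-style sketch, not the exact method of either paper): train a linear model on graph-propagated features, and on an unlearning request apply a single Newton-style correction on the post-removal features followed by Gaussian noise injection. The propagation scheme, ridge objective, and noise level below are illustrative assumptions.

```python
import numpy as np

def propagate(adj, feats, hops=2):
    # Simple row-normalized feature propagation: X <- D^-1 A X, repeated `hops` times.
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(hops):
        feats = (adj / deg) @ feats
    return feats

def certified_unlearn_step(w, X_new, y, lam=1e-2, noise_std=0.1, rng=None):
    # One Newton step of a ridge-regression objective on the post-removal
    # propagated features X_new, starting from the previously trained weights w,
    # followed by Gaussian noise injection, which is what underwrites an
    # (eps, delta)-style guarantee in a full certification argument.
    rng = rng if rng is not None else np.random.default_rng(0)
    grad = X_new.T @ (X_new @ w - y) + lam * w
    hess = X_new.T @ X_new + lam * np.eye(len(w))
    w_new = w - np.linalg.solve(hess, grad)
    return w_new + noise_std * rng.normal(size=w.shape)

rng = np.random.default_rng(0)
adj = (rng.random((50, 50)) < 0.1).astype(float)
feats, y = rng.normal(size=(50, 8)), rng.normal(size=50)
w0 = np.linalg.solve(propagate(adj, feats).T @ propagate(adj, feats) + 1e-2 * np.eye(8),
                     propagate(adj, feats).T @ y)        # initial ridge training
adj[0, :] = 0.0                                          # unlearning request: drop node 0's edges
adj[:, 0] = 0.0
w_unlearned = certified_unlearn_step(w0, propagate(adj, feats), y)
```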
A ^ after my name indicates that the students (in bold font) were mainly advised by me on the corresponding project.
“Exploring the Opportunities and Challenges of Graph Neural Networks in Electrical Engineering”
Eli Chien, Mufei Li, Anthony Aportela, Kerr Ding, Shuyi Jia, Supriyo Maji, Zhongyuan Zhao, Javier Duarte, Victor Fung, Callie Hao, Yunan Luo, Olgica Milenkovic, David Pan, Santiago Segarra, Pan Li
[Nature Reviews Electrical Engineering 2024]
"Underestimated Privacy Risks for Minority Populations in Large Language Model Unlearning,"
Rongzhe Wei, Mufei Li, Mohsen Ghassemi, Eleonora Kreačić, Yifan Li, Xiang Yue, Bo Li, Vamsi K. Potluru, Pan Li, Eli Chien^.
[arXiv, ICML 2025]
“Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness,”
Eli Chien, Pan Li.
[arXiv, ICLR 2025]
“Langevin Unlearning: A New Perspective of Noisy Gradient Descent for Machine Unlearning”
Eli Chien, Haoyu Wang, Ziang Chen, Pan Li.
[arXiv, NeurIPS 2024 (spotlight), ICLR 2024 PrivML workshop (spotlight), TPDP 2024]
“Certified Machine Unlearning via Noisy Stochastic Gradient Descent”
Eli Chien, Haoyu Wang, Ziang Chen, Pan Li
[arXiv, NeurIPS 2024]
“Differentially Private Graph Diffusion with Applications in Personalized PageRanks”
Rongzhe Wei, Eli Chien, Pan Li
[arXiv, NeurIPS 2024]
“On the Inherent Privacy Properties of Discrete Denoising Diffusion Models”
Rongzhe Wei, Eleonora Kreačić, Haoyu Peter Wang, Haoteng Yin, Eli Chien, Vamsi K. Potluru, Pan Li
[TMLR]
“Machine Unlearning of Pre-trained Large Language Models”
Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue
“Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning”
Zheyuan Liu*, Guangyao Dou*, Yijun Tian, Chunhui Zhang, Eli Chien^, Ziwei Zhu^
[TheWebConf 2024, arXiv]
“Differentially Private Decoupled Graph Convolutions for Multigranular Topology Protection”
Eli Chien*, Wei-Ning Chen*, Chao Pan*, Pan Li, Ayfer Özgür, Olgica Milenkovic.
“Efficient Model Updates for Approximate Unlearning of Graph-Structured Data” (a.k.a. Certified Graph Unlearning)
Eli Chien*, Chao Pan*, Olgica Milenkovic.
[ICLR 2023, NeurIPS 2022 GLFrontiers Workshop, arXiv (workshop version), code]
“Unlearning Nonlinear Graph Classifiers in the Limited Training Data Regime”
Chao Pan*, Eli Chien*, Olgica Milenkovic.
[TheWebConf 2023, arXiv, code]
"LayerDAG: A Layerwise Autoregressive Diffusion Model for Directed Acyclic Graph Generation,"
Mufei Li, Viraj Shitole, Eli Chien, Changhai Man, Zhaodong Wang, Srinivas, Ying Zhang, Tushar Krishna, Pan Li
[ICLR 2025 (Spotlight), arXiv, code]
"PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation,"
Eli Chien, Jiong Zhang, Cho-Jui Hsieh, Jyun-Yu Jiang, Wei-Cheng Chang, Olgica Milenkovic, Hsiang-Fu Yu
“Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction,”
Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, Inderjit S Dhillon.
[ICLR 2022, arXiv, code, OGB leaderboard]
“You are AllSet: A Multiset Function Framework for Hypergraph Neural Networks,”
Eli Chien*, Jianhao Peng*, Chao Pan*, Olgica Milenkovic.
“Adaptive Universal Generalized PageRank Graph Neural Network,”
Eli Chien*, Jianhao Peng*, Pan Li, Olgica Milenkovic.
[ICLR 2021, arXiv, code, slides]
"Multi-MotifGAN (MMGAN): Motif-targeted Graph Generation and Prediction,''
Anuththari Gamage, Eli Chien, Jianhao Peng, Olgica Milenkovic.
[ICASSP 2020, arXiv]
"Federated Classification in Hyperbolic Spaces via Secure Aggregation of Convex Hulls,"
Saurav Prakash, Jin Sima, Chao Pan, Eli Chien, Olgica Milenkovic
[TMLR]
"Provably Accurate and Scalable Linear Classifiers in Hyperbolic Spaces,"
Chao Pan*, Eli Chien*, Puoya Tabaghi, Jianhao Peng, Olgica Milenkovic.
[KAIS 2022, arXiv, ICDM 2021 conference version]
"HyperAid: Denoising in hyperbolic spaces for tree-fitting and hierarchical clustering,"
Eli Chien, Puoya Tabaghi, Olgica Milenkovic.
"Highly Scalable and Provably Accurate Classification in Poincare Balls,"
Eli Chien*, Chao Pan*, Puoya Tabaghi, Olgica Milenkovic.
[ICDM 2021 (Regular), arXiv (long version), code]
"Landing Probabilities of Random Walks for Seed-Set Expansion in Hypergraphs,''
Eli Chien*, Pan Li*, Olgica Milenkovic.
[ITW 2021, arXiv(Long version)]
"Active learning in the geometric block model,''
Eli Chien, Antonia Maria Tulino, Jaime Llorca.
[AAAI 2020, arXiv]
"Optimizing Generalized PageRank Methods for Seed-Expansion Community Detection,''
Pan Li*, Eli Chien*, Olgica Milenkovic.
"HS^2: Active Learning over Hypergraphs with pointwise and pairwise queries,''
I (Eli) Chien, Huozhi Zhou, Pan Li.
"On the Minimax Misclassification Ratio of Hypergraph Community Detection,''
I Chien*, Chung-Yi Lin*, I-Hsiang Wang.
[Transactions on Information Theory 2019, arxiv]
"Community detection in hypergraphs: Optimal statistical limit and efficient algorithms,''
I Chien, Chung-Yi Lin, I-Hsiang Wang.
"On the fundamental statistical limit of community detection in random hypergraphs,''
Chung-Yi Lin, I (Eli) Chien, I-Hsiang Wang.
[ISIT 2017, online access]
“Support Estimation with Sampling Artifacts and Errors,”
Eli Chien, Olgica Milenkovic, Angelia Nedich.
[ISIT 2021, arXiv (full)]
"Query K-means Clustering and the Double Dixie Cup Problem,''
I (Eli) Chien, Chao Pan, Olgica Milenkovic.
"Small-sample estimation of the mutational support and distribution of SARS-CoV-2,"
Vishal Rana, Eli Chien, Jianhao Peng and Olgica Milenkovic
[TCBB 2022, Earlier version: medRxiv]
"Representer Point Selection for Explaining Regularized High-dimensional Models,"
Che-Ping Tsai, Jiong Zhang, Hsiang-Fu Yu, Eli Chien, Cho-Jui Hsieh, Pradeep Kumar Ravikumar
[ICML 2023, arXiv]