Accepted as a NeurIPS 2025 Spotlight
Shristi Das Biswas, Arani Roy, Kaushik Roy
Purdue University
As text-to-image models continue to evolve, so does the risk of generating unsafe, copyrighted, or privacy-violating content. Existing safety interventions, ranging from training-data curation and model fine-tuning to inference-time filtering and guidance, often suffer from incomplete concept removal, susceptibility to jailbreaking, computational inefficiency, or collateral damage to unrelated capabilities. In this paper, we introduce CURE, a training-free concept-unlearning framework that operates directly in the weight space of pre-trained diffusion models, enabling fast, interpretable, and highly specific suppression of undesired concepts. At the core of our method is the Spectral Eraser, a closed-form orthogonal-projection module that identifies discriminative subspaces via Singular Value Decomposition over token embeddings associated with the concepts to forget and to retain. Intuitively, the Spectral Eraser isolates features unique to the undesired concept while preserving safe attributes. The resulting operator is applied in a single-step update, yielding an edited model in which the target concept is effectively unlearned, without retraining, supervision, or iterative optimization. To balance the trade-off between filtering toxicity and preserving unrelated concepts, we further introduce an Expansion Mechanism for spectral regularization that selectively modulates singular vectors according to their relative significance, controlling the strength of forgetting. All of these operations are closed-form, enabling erasure in only 2 seconds. Benchmarked against prior approaches, CURE achieves more efficient and thorough removal of targeted artistic styles, objects, identities, and explicit content, with minimal damage to the model's original generative ability, and demonstrates enhanced robustness against red-teaming.
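To ground the description above, the following PyTorch sketch illustrates one plausible instantiation of an SVD-based spectral eraser. The energy weighting s_i^2 / (s_i^2 + alpha), the composition rule for P_dis, and all names and shapes are illustrative assumptions for exposition, not the paper's exact formulas.

```python
# A minimal, illustrative sketch of an SVD-based spectral eraser (assumptions noted).
import torch

def energy_scaled_projector(E: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Build a soft projector onto the column space of token embeddings E (d x n).

    Singular directions are weighted by relative energy s_i^2 / (s_i^2 + alpha),
    an assumed form of the expansion mechanism: dominant directions are kept
    near full strength while weak ones are attenuated.
    """
    U, S, _ = torch.linalg.svd(E, full_matrices=False)
    w = S**2 / (S**2 + alpha)   # hypothetical energy-based weighting
    return (U * w) @ U.T        # d x d soft projector, U diag(w) U^T

def spectral_eraser(E_f: torch.Tensor, E_r: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Return a d x d operator that suppresses forget-specific directions.

    P_f captures the forget subspace and P_r the retain subspace; here we
    suppress only the part of P_f not shared with P_r, one plausible reading
    of the composite projection P_dis.
    """
    d = E_f.shape[0]
    I = torch.eye(d)
    P_f = energy_scaled_projector(E_f, alpha)
    P_r = energy_scaled_projector(E_r, alpha)
    P_dis = P_f @ (I - P_r)     # discriminative (forget-only) component
    return I - P_dis            # closed-form, single-step edit operator

# Usage with placeholder shapes: d = text-embedding dim, n_f / n_r = token counts.
d, n_f, n_r = 768, 8, 32
E_f, E_r = torch.randn(d, n_f), torch.randn(d, n_r)
eraser = spectral_eraser(E_f, E_r)
W_k = torch.randn(320, d)       # stand-in for a cross-attention key matrix
W_k_edited = W_k @ eraser       # projected inputs lose forget-specific directions
```

Because every step is a fixed linear-algebra operation (one SVD per concept set plus matrix products), the whole edit runs in a single pass with no gradients, which is consistent with the closed-form, seconds-scale erasure claimed above.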
Overview of CURE. Given forget (F) and retain (R) sets, CURE constructs energy-scaled projectors (P_f, P_r) over their respective subspaces (Part 1), and derives a composite projection P_dis to suppress erasable components while preserving shared content (Part 2). The action of P_f, P_r, and P_f P_r on unit vectors yields directions aligned with the forget (red), retain (green), and shared (blue) subspaces. This operator is applied to cross-attention (Part 3), enabling lightweight unlearning.
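To make Part 3 concrete, here is a hedged sketch of how such an operator could be applied to cross-attention weights in a diffusers-style UNet. The module filter (`attn2`, `to_k`, `to_v`) reflects common diffusers naming conventions; `unet` and `eraser` are placeholders, and this is one possible application pattern rather than the paper's verbatim procedure.

```python
# Assumed application step: right-multiply the text-conditioned cross-attention
# projections by the eraser so forget directions are nulled before attention.
import torch

@torch.no_grad()
def apply_eraser_to_cross_attention(unet, eraser: torch.Tensor) -> None:
    for name, module in unet.named_modules():
        # In diffusers-style UNets, `attn2` blocks hold cross-attention; their
        # to_k / to_v layers consume text embeddings, so only they are edited.
        if "attn2" in name and hasattr(module, "to_k"):
            module.to_k.weight.copy_(module.to_k.weight @ eraser)
            module.to_v.weight.copy_(module.to_v.weight @ eraser)
```

Editing only the key/value projections leaves the rest of the network untouched, which matches the caption's point that the edit is lightweight and localized to cross-attention.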
Our method, CURE, enables robust and efficient erasure of any target concept from text-to-image models through orthogonal, closed-form editing of cross-attention weights, ensuring that untargeted concepts remain intact even when they share common terms with the target concept (bottom-left sample). This safeguards celebrity portrait rights, respects copyrights on artworks, and prevents explicit or unwanted content creation, all in a training-free manner with high efficacy.
CURE removes object and identity concepts, unlearning both direct and synonymous forms (top), while preserving unrelated concepts that may share common words with the target (bottom).
Comparison of unlearning methods on removing target artist styles and NSFW content. CURE more effectively suppresses the intended concept (blue arrows). ⋆ masks any unsafe outputs.
(a) CURE achieves stronger erasure with lower unwanted interference than baselines. Images with red borders are the target erasure, while off-diagonal images show the impact on untargeted styles. (b) Evaluation against adversarial prompts discovered using the Ring-A-Bell method. Our method effectively eliminates Van Gogh's style, unlike baselines that remain vulnerable to leakage.