Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

Bui, Anh; Vu, Trang; Vuong, Long; Le, Trung; Montague, Paul; Abraham, Tamas; Kim, Junae; Phung, Dinh

arXiv:2501.18950 (cs)

[Submitted on 31 Jan 2025 (v1), last revised 23 May 2025 (this version, v3)]

Abstract: Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle in previous works is to remove a specific concept by mapping it to a fixed generic concept, such as a neutral concept or simply an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods in preserving unrelated concepts while maintaining effective erasure performance. Our code is published at {this https URL}.
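To make the contrast between a fixed target and a concept-specific target more concrete, here is a minimal, hypothetical sketch in Python. It assumes each concept is represented by a unit-normalized embedding (toy hand-picked 3-D vectors here, not real text-encoder outputs), builds a cosine-similarity graph over the concepts, and, for the concept to erase, picks a target that is close enough to keep the change local but not a near-synonym. The concept labels, the helpers `similarity_graph`, `local_neighborhood` and `select_target`, and the similarity thresholds are all illustrative assumptions; this is not the paper's AGE procedure.

```python
import numpy as np

# Toy concept embeddings (hypothetical hand-picked 3-D vectors, NOT real
# text-encoder outputs). Rows are L2-normalized so dot products are cosines.
CONCEPTS = ["erase_me", "near_synonym", "related", "unrelated_1", "unrelated_2"]
EMB = np.array([
    [1.0, 0.2, 0.0],
    [0.8, 0.6, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
])
EMB = EMB / np.linalg.norm(EMB, axis=1, keepdims=True)


def similarity_graph(emb: np.ndarray, threshold: float):
    """Return the cosine-similarity matrix and a boolean adjacency matrix
    with an edge (i, j) whenever sim[i, j] >= threshold (no self-loops)."""
    sim = emb @ emb.T
    adj = sim >= threshold
    np.fill_diagonal(adj, False)
    return sim, adj


def local_neighborhood(adj: np.ndarray, idx: int) -> list[int]:
    """Indices of concepts directly connected to concept `idx`."""
    return [int(j) for j in np.flatnonzero(adj[idx])]


def select_target(sim: np.ndarray, idx: int, low: float, high: float):
    """Hypothetical selection rule: among the other concepts whose similarity
    to `idx` lies in [low, high), return the most similar one. The upper
    bound excludes near-synonyms that would still carry the erased concept.
    Returns None if no candidate qualifies."""
    best, best_sim = None, -np.inf
    for j in range(sim.shape[0]):
        s = sim[idx, j]
        if j != idx and low <= s < high and s > best_sim:
            best, best_sim = j, s
    return best


sim, adj = similarity_graph(EMB, threshold=0.5)
print([CONCEPTS[j] for j in local_neighborhood(adj, 0)])  # ['near_synonym', 'related']
target = select_target(sim, 0, low=0.5, high=0.85)
print(CONCEPTS[target])  # related
```

Here the unrelated concepts fall outside the neighborhood of `erase_me`, loosely mirroring the locality observation in the abstract, while a fixed target would be the same vector for every erased concept regardless of where it sits in the graph.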

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2501.18950 [cs.LG]
	(or arXiv:2501.18950v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.18950
Journal reference:	International Conference on Learning Representations 2025

Submission history

From: Tuan Anh Bui
[v1] Fri, 31 Jan 2025 08:17:23 UTC (26,811 KB)
[v2] Thu, 27 Feb 2025 23:36:38 UTC (26,789 KB)
[v3] Fri, 23 May 2025 13:23:21 UTC (11,983 KB)