Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models

Wang, Lingzhi; Zeng, Xingshan; Guo, Jinsong; Wong, Kam-Fai; Gottlob, Georg

arXiv:2402.05813 (cs)

[Submitted on 8 Feb 2024 (v1), last revised 16 Dec 2024 (this version, v2)]

Abstract: This paper explores Machine Unlearning (MU), an emerging field that is gaining increased attention due to concerns about neural models unintentionally remembering personal or sensitive information. We present SeUL, a novel method that enables selective and fine-grained unlearning for language models. Unlike previous work that employs a fully reversed training objective in unlearning, SeUL minimizes the negative impact on the capability of language models, particularly in terms of generation. Furthermore, we introduce two innovative evaluation metrics, sensitive extraction likelihood (S-EL) and sensitive memorization accuracy (S-MA), specifically designed to assess the effectiveness of forgetting sensitive information. In support of the unlearning framework, we propose efficient automatic online and offline sensitive span annotation methods. The online selection method, based on language probability scores, ensures computational efficiency, while the offline annotation involves a two-stage LLM-based process for robust verification. In summary, this paper contributes a novel selective unlearning method (SeUL), introduces specialized evaluation metrics (S-EL and S-MA) for assessing sensitive information forgetting, and proposes automatic online and offline sensitive span annotation methods to support the overall unlearning framework and evaluation process.

Comments:	Accepted to AAAI 2025
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.05813 [cs.CL]
	(or arXiv:2402.05813v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.05813

Submission history

From: Lingzhi Wang
[v1] Thu, 8 Feb 2024 16:50:01 UTC (1,782 KB)
[v2] Mon, 16 Dec 2024 12:44:07 UTC (3,325 KB)
