FLARE: Towards Universal Dataset Purification against Backdoor Attacks

Hou, Linshan; Luo, Wei; Hua, Zhongyun; Chen, Songhua; Zhang, Leo Yu; Li, Yiming

arXiv:2411.19479 (cs)

[Submitted on 29 Nov 2024 (v1), last revised 18 Jun 2025 (this version, v2)]

Abstract: Deep neural networks (DNNs) are susceptible to backdoor attacks, in which adversaries poison datasets with adversary-specified triggers to implant hidden backdoors, enabling malicious manipulation of model predictions. Dataset purification serves as a proactive defense by removing malicious training samples to prevent backdoor injection at its source. We first reveal that current advanced purification methods rely on a latent assumption: that the backdoor connections between triggers and target labels are simpler to learn than the benign features. We demonstrate that this assumption does not always hold, especially in all-to-all (A2A) and untargeted (UT) attacks. As a result, purification methods that analyze the separation between poisoned and benign samples in the input-output space or in the final hidden layer space are less effective. We observe that this separability is not confined to a single layer but varies across different hidden layers. Motivated by this understanding, we propose FLARE, a universal purification method to counter various backdoor attacks. FLARE aggregates abnormal activations from all hidden layers to construct representations for clustering. To enhance separation, FLARE uses an adaptive subspace selection algorithm to isolate the optimal space for dividing the entire dataset into two clusters. FLARE then assesses the stability of each cluster and identifies the cluster with higher stability as poisoned. Extensive evaluations on benchmark datasets demonstrate the effectiveness of FLARE against 22 representative backdoor attacks, including all-to-one (A2O), all-to-all (A2A), and untargeted (UT) attacks, and its robustness to adaptive attacks. Code is available in BackdoorBox and backdoor-toolbox.
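
The pipeline outlined in the abstract (aggregate per-layer activations, split the dataset into two clusters, flag the more stable cluster) can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each hidden layer has already been reduced to one scalar summary per sample, it omits the adaptive subspace selection step, it substitutes plain 2-means for the paper's clustering, and it uses cluster compactness as a hypothetical stand-in for the paper's stability measure. All function names and the toy data are invented for illustration.

```python
import math
from statistics import mean, pstdev


def layer_zscores(values):
    """Standardize one layer's per-sample activation summaries (assumed scalar)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]


def aggregate(layers):
    """Concatenate z-scores from every layer into one representation per sample."""
    per_layer = [layer_zscores(layer) for layer in layers]
    return [list(sample) for sample in zip(*per_layer)]


def two_means(points, iters=20):
    """Plain 2-means, initialised from point 0 and the point farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centroids = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [mean(dim) for dim in zip(*members)]
    return labels, centroids


def compactness(points, labels, centroids, k):
    """Hypothetical stability proxy (an assumption): tighter clusters score higher."""
    dists = [math.dist(p, centroids[k]) for p, lab in zip(points, labels) if lab == k]
    return 1.0 / (1.0 + mean(dists)) if dists else 0.0


def flag_suspected(layers):
    """Return indices of samples in the cluster with the higher proxy score."""
    points = aggregate(layers)
    labels, centroids = two_means(points)
    scores = [compactness(points, labels, centroids, k) for k in (0, 1)]
    suspect = 0 if scores[0] > scores[1] else 1
    return [i for i, lab in enumerate(labels) if lab == suspect]


def toy_layers():
    """Invented data: 16 spread-out 'benign' samples, then 4 tight 'poisoned' ones."""
    layers = []
    for j in range(3):
        benign = [float((i * (j + 3)) % 7 - 3) for i in range(16)]
        poisoned = [20.0 + 0.05 * k for k in range(4)]
        layers.append(benign + poisoned)
    return layers


print(flag_suspected(toy_layers()))  # prints [16, 17, 18, 19]
```

In this toy data the last four samples have large, nearly identical activation summaries in every layer, so they form the tighter cluster and are the ones flagged. Real activations are high-dimensional, and the separation between poisoned and benign samples varies by layer, which is why the abstract describes selecting a subspace before clustering.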

Comments:	15 pages. This paper has been accepted and will appear in TIFS (CCF-A)
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2411.19479 [cs.CR]
	(or arXiv:2411.19479v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2411.19479

Submission history

From: Linshan Hou
[v1] Fri, 29 Nov 2024 05:34:21 UTC (2,783 KB)
[v2] Wed, 18 Jun 2025 08:32:27 UTC (3,172 KB)
