-
On the Credibility of Backdoor Attacks Against Object Detectors in the Physical World
Authors:
Bao Gia Doan,
Dang Quang Nguyen,
Callum Lindquist,
Paul Montague,
Tamas Abraham,
Olivier De Vel,
Seyit Camtepe,
Salil S. Kanhere,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
Object detectors are vulnerable to backdoor attacks. In contrast to classifiers, detectors possess unique characteristics, architecturally and in task execution; often operating in challenging conditions, for instance, detecting traffic signs in autonomous cars. But, our knowledge dominates attacks against classifiers and tests in the "digital domain".
To address this critical gap, we conducted…
▽ More
Object detectors are vulnerable to backdoor attacks. In contrast to classifiers, detectors possess unique characteristics, architecturally and in task execution; often operating in challenging conditions, for instance, detecting traffic signs in autonomous cars. But, our knowledge dominates attacks against classifiers and tests in the "digital domain".
To address this critical gap, we conducted an extensive empirical study targeting multiple detector architectures and two challenging detection tasks in real-world settings: traffic signs and vehicles. Using the diverse, methodically collected videos captured from driving cars and flying drones, incorporating physical object trigger deployments in authentic scenes, we investigated the viability of physical object-triggered backdoor attacks in application settings.
Our findings revealed 8 key insights. Importantly, the prevalent "digital" data poisoning method for injecting backdoors into models does not lead to effective attacks against detectors in the real world, although proven effective in classification tasks. We construct a new, cost-efficient attack method, dubbed MORPHING, incorporating the unique nature of detection tasks; ours is remarkably successful in injecting physical object-triggered backdoors, even capable of poisoning triggers with clean label annotations or invisible triggers without diminishing the success of physical object triggered backdoors. We discovered that the defenses curated are ill-equipped to safeguard detectors against such attacks. To underscore the severity of the threat and foster further research, we, for the first time, release an extensive video test set of real-world backdoor attacks. Our study not only establishes the credibility and seriousness of this threat but also serves as a clarion call to the research community to advance backdoor defenses in the context of object detection.
△ Less
Submitted 30 October, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks
Authors:
Bao Gia Doan,
Afshar Shamsi,
Xiao-Yu Guo,
Arash Mohammadi,
Hamid Alinejad-Rokny,
Dino Sejdinovic,
Damien Teney,
Damith C. Ranasinghe,
Ehsan Abbasnejad
Abstract:
Computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non- Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computa…
▽ More
Computational complexity of Bayesian learning is impeding its adoption in practical, large-scale tasks. Despite demonstrations of significant merits such as improved robustness and resilience to unseen or out-of-distribution inputs over their non- Bayesian counterparts, their practical use has faded to near insignificance. In this study, we introduce an innovative framework to mitigate the computational burden of Bayesian neural networks (BNNs). Our approach follows the principle of Bayesian techniques based on deep ensembles, but significantly reduces their cost via multiple low-rank perturbations of parameters arising from a pre-trained neural network. Both vanilla version of ensembles as well as more sophisticated schemes such as Bayesian learning with Stein Variational Gradient Descent (SVGD), previously deemed impractical for large models, can be seamlessly implemented within the proposed framework, called Bayesian Low-Rank LeArning (Bella). In a nutshell, i) Bella achieves a dramatic reduction in the number of trainable parameters required to approximate a Bayesian posterior; and ii) it not only maintains, but in some instances, surpasses the performance of conventional Bayesian learning methods and non-Bayesian baselines. Our results with large-scale tasks such as ImageNet, CAMELYON17, DomainNet, VQA with CLIP, LLaVA demonstrate the effectiveness and versatility of Bella in building highly scalable and practical Bayesian deep models for real-world applications.
△ Less
Submitted 18 February, 2025; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Bayesian Learned Models Can Detect Adversarial Malware For Free
Authors:
Bao Gia Doan,
Dang Quang Nguyen,
Paul Montague,
Tamas Abraham,
Olivier De Vel,
Seyit Camtepe,
Salil S. Kanhere,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can…
▽ More
The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is an effective method but is computationally expensive to scale up to large datasets and comes at the cost of sacrificing model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can be identified using epistemic uncertainty of ML approaches -- epistemic uncertainty in a machine learning-based malware detector is a result of a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the model parameters' distribution and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in Android, Windows domains and PDF malware. We found, quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) with a diversity-promoting approach (or better posterior approximations) lead to parameter instances from the posterior to significantly enhance a detectors' ability.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Feature-Space Bayesian Adversarial Learning Improved Malware Detector Robustness
Authors:
Bao Gia Doan,
Shuiqiao Yang,
Paul Montague,
Olivier De Vel,
Tamas Abraham,
Seyit Camtepe,
Salil S. Kanhere,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
We present a new algorithm to train a robust malware detector. Modern malware detectors rely on machine learning algorithms. Now, the adversarial objective is to devise alterations to the malware code to decrease the chance of being detected whilst preserving the functionality and realism of the malware. Adversarial learning is effective in improving robustness but generating functional and realis…
▽ More
We present a new algorithm to train a robust malware detector. Modern malware detectors rely on machine learning algorithms. Now, the adversarial objective is to devise alterations to the malware code to decrease the chance of being detected whilst preserving the functionality and realism of the malware. Adversarial learning is effective in improving robustness but generating functional and realistic adversarial malware samples is non-trivial. Because: i) in contrast to tasks capable of using gradient-based feedback, adversarial learning in a domain without a differentiable mapping function from the problem space (malware code inputs) to the feature space is hard; and ii) it is difficult to ensure the adversarial malware is realistic and functional. This presents a challenge for developing scalable adversarial machine learning algorithms for large datasets at a production or commercial scale to realize robust malware detectors. We propose an alternative; perform adversarial learning in the feature space in contrast to the problem space. We prove the projection of perturbed, yet valid malware, in the problem space into feature space will always be a subset of adversarials generated in the feature space. Hence, by generating a robust network against feature-space adversarial examples, we inherently achieve robustness against problem-space adversarial examples. We formulate a Bayesian adversarial learning objective that captures the distribution of models for improved robustness. We prove that our learning method bounds the difference between the adversarial risk and empirical risk explaining the improved robustness. We show that adversarially trained BNNs achieve state-of-the-art robustness. Notably, adversarially trained BNNs are robust against stronger attacks with larger attack budgets by a margin of up to 15% on a recent production-scale malware dataset of more than 20 million samples.
△ Less
Submitted 30 January, 2023;
originally announced January 2023.
-
Bayesian Learning with Information Gain Provably Bounds Risk for a Robust Adversarial Defense
Authors:
Bao Gia Doan,
Ehsan Abbasnejad,
Javen Qinfeng Shi,
Damith C. Ranasinghe
Abstract:
We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achiev…
▽ More
We present a new algorithm to learn a deep neural network model robust against adversarial attacks. Previous algorithms demonstrate an adversarially trained Bayesian Neural Network (BNN) provides improved robustness. We recognize the adversarial learning approach for approximating the multi-modal posterior distribution of a Bayesian model can lead to mode collapse; consequently, the model's achievements in robustness and performance are sub-optimal. Instead, we first propose preventing mode collapse to better approximate the multi-modal posterior distribution. Second, based on the intuition that a robust model should ignore perturbations and only consider the informative content of the input, we conceptualize and formulate an information gain objective to measure and force the information learned from both benign and adversarial training instances to be similar. Importantly. we prove and demonstrate that minimizing the information gain objective allows the adversarial risk to approach the conventional empirical risk. We believe our efforts provide a step toward a basis for a principled method of adversarially training BNNs. Our model demonstrate significantly improved robustness--up to 20%--compared with adversarial training and Adv-BNN under PGD attacks with 0.035 distortion on both CIFAR-10 and STL-10 datasets.
△ Less
Submitted 1 December, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Transferable Graph Backdoor Attack
Authors:
Shuiqiao Yang,
Bao Gia Doan,
Paul Montague,
Olivier De Vel,
Tamas Abraham,
Seyit Camtepe,
Damith C. Ranasinghe,
Salil S. Kanhere
Abstract:
Graph Neural Networks (GNNs) have achieved tremendous success in many graph mining tasks benefitting from the message passing strategy that fuses the local structure and node features for better graph representation learning. Despite the success of GNNs, and similar to other types of deep neural networks, GNNs are found to be vulnerable to unnoticeable perturbations on both graph structure and nod…
▽ More
Graph Neural Networks (GNNs) have achieved tremendous success in many graph mining tasks benefitting from the message passing strategy that fuses the local structure and node features for better graph representation learning. Despite the success of GNNs, and similar to other types of deep neural networks, GNNs are found to be vulnerable to unnoticeable perturbations on both graph structure and node features. Many adversarial attacks have been proposed to disclose the fragility of GNNs under different perturbation strategies to create adversarial examples. However, vulnerability of GNNs to successful backdoor attacks was only shown recently. In this paper, we disclose the TRAP attack, a Transferable GRAPh backdoor attack. The core attack principle is to poison the training dataset with perturbation-based triggers that can lead to an effective and transferable backdoor attack. The perturbation trigger for a graph is generated by performing the perturbation actions on the graph structure via a gradient based score matrix from a surrogate model. Compared with prior works, TRAP attack is different in several ways: i) it exploits a surrogate Graph Convolutional Network (GCN) model to generate perturbation triggers for a blackbox based backdoor attack; ii) it generates sample-specific perturbation triggers which do not have a fixed pattern; and iii) the attack transfers, for the first time in the context of GNNs, to different GNN models when trained with the forged poisoned training dataset. Through extensive evaluations on four real-world datasets, we demonstrate the effectiveness of the TRAP attack to build transferable backdoors in four different popular GNNs using four real-world datasets.
△ Less
Submitted 4 July, 2022; v1 submitted 21 June, 2022;
originally announced July 2022.
-
TnT Attacks! Universal Naturalistic Adversarial Patches Against Deep Neural Network Systems
Authors:
Bao Gia Doan,
Minhui Xue,
Shiqing Ma,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
Deep neural networks are vulnerable to attacks from adversarial inputs and, more recently, Trojans to misguide or hijack the model's decision. We expose the existence of an intriguing class of spatially bounded, physically realizable, adversarial examples -- Universal NaTuralistic adversarial paTches -- we call TnTs, by exploring the superset of the spatially bounded adversarial example space and…
▽ More
Deep neural networks are vulnerable to attacks from adversarial inputs and, more recently, Trojans to misguide or hijack the model's decision. We expose the existence of an intriguing class of spatially bounded, physically realizable, adversarial examples -- Universal NaTuralistic adversarial paTches -- we call TnTs, by exploring the superset of the spatially bounded adversarial example space and the natural input space within generative adversarial networks. Now, an adversary can arm themselves with a patch that is naturalistic, less malicious-looking, physically realizable, highly effective achieving high attack success rates, and universal. A TnT is universal because any input image captured with a TnT in the scene will: i) misguide a network (untargeted attack); or ii) force the network to make a malicious decision (targeted attack). Interestingly, now, an adversarial patch attacker has the potential to exert a greater level of control -- the ability to choose a location-independent, natural-looking patch as a trigger in contrast to being constrained to noisy perturbations -- an ability is thus far shown to be only possible with Trojan attack methods needing to interfere with the model building processes to embed a backdoor at the risk discovery; but, still realize a patch deployable in the physical world. Through extensive experiments on the large-scale visual classification task, ImageNet with evaluations across its entire validation set of 50,000 images, we demonstrate the realistic threat from TnTs and the robustness of the attack. We show a generalization of the attack to create patches achieving higher attack success rates than existing state-of-the-art methods. Our results show the generalizability of the attack to different visual classification tasks (CIFAR-10, GTSRB, PubFig) and multiple state-of-the-art deep neural networks such as WideResnet50, Inception-V3 and VGG-16.
△ Less
Submitted 25 July, 2022; v1 submitted 18 November, 2021;
originally announced November 2021.
-
Backdoor Attacks and Countermeasures on Deep Learning: A Comprehensive Review
Authors:
Yansong Gao,
Bao Gia Doan,
Zhi Zhang,
Siqi Ma,
Jiliang Zhang,
Anmin Fu,
Surya Nepal,
Hyoungshick Kim
Abstract:
This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and then formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning and post-…
▽ More
This work provides the community with a timely comprehensive review of backdoor attacks and countermeasures on deep learning. According to the attacker's capability and affected stage of the machine learning pipeline, the attack surfaces are recognized to be wide and then formalized into six categorizations: code poisoning, outsourcing, pretrained, data collection, collaborative learning and post-deployment. Accordingly, attacks under each categorization are combed. The countermeasures are categorized into four general classes: blind backdoor removal, offline backdoor inspection, online backdoor inspection, and post backdoor removal. Accordingly, we review countermeasures, and compare and analyze their advantages and disadvantages. We have also reviewed the flip side of backdoor attacks, which are explored for i) protecting intellectual property of deep learning models, ii) acting as a honeypot to catch adversarial example attacks, and iii) verifying data deletion requested by the data contributor.Overall, the research on defense is far behind the attack, and there is no single defense that can prevent all types of backdoor attacks. In some cases, an attacker can intelligently bypass existing defenses with an adaptive attack. Drawing the insights from the systematic review, we also present key areas for future research on the backdoor, such as empirical security evaluations from physical trigger attacks, and in particular, more efficient and practical countermeasures are solicited.
△ Less
Submitted 2 August, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Design and Evaluation of a Multi-Domain Trojan Detection Method on Deep Neural Networks
Authors:
Yansong Gao,
Yeonjae Kim,
Bao Gia Doan,
Zhi Zhang,
Gongxuan Zhang,
Surya Nepal,
Damith C. Ranasinghe,
Hyoungshick Kim
Abstract:
This work corroborates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs, is a multi-domain Trojan detection defence across Vision, Text and Audio domains---thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is the first confirmed Trojan detection method that is demonstratively independent of both the task domain and model architectures. We have extensively e…
▽ More
This work corroborates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs, is a multi-domain Trojan detection defence across Vision, Text and Audio domains---thus termed as STRIP-ViTA. Specifically, STRIP-ViTA is the first confirmed Trojan detection method that is demonstratively independent of both the task domain and model architectures. We have extensively evaluated the performance of STRIP-ViTA over: i) CIFAR10 and GTSRB datasets using 2D CNNs, and a public third party Trojaned model for vision tasks; ii) IMDB and consumer complaint datasets using both LSTM and 1D CNNs for text tasks; and speech command dataset using both 1D CNNs and 2D CNNs for audio tasks. Experimental results based on 28 tested Trojaned models demonstrate that STRIP-ViTA performs well across all nine architectures and five datasets. In general, STRIP-ViTA can effectively detect Trojan inputs with small false acceptance rate (FAR) with an acceptable preset false rejection rate (FRR). In particular, for vision tasks, we can always achieve a 0% FRR and FAR. By setting FRR to be 3%, average FAR of 1.1% and 3.55% are achieved for text and audio tasks, respectively. Moreover, we have evaluated and shown the effectiveness of STRIP-ViTA against a number of advanced backdoor attacks whilst other state-of-the-art methods lose effectiveness in front of one or all of these advanced backdoor attacks.
△ Less
Submitted 22 November, 2019;
originally announced November 2019.
-
Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems
Authors:
Bao Gia Doan,
Ehsan Abbasnejad,
Damith C. Ranasinghe
Abstract:
We propose Februus; a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction---a target determined by and only known to the attacker. Febru…
▽ More
We propose Februus; a new idea to neutralize highly potent and insidious Trojan attacks on Deep Neural Network (DNN) systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction---a target determined by and only known to the attacker. Februus sanitizes the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitizing inputs with no loss of performance for sanitized inputs, Trojaned or benign. Our extensive evaluations on multiple infected models based on four popular datasets across three contrasting vision applications and trigger types demonstrate the high efficacy of Februus. We dramatically reduced attack success rates from 100% to near 0% for all cases (achieving 0% on multiple cases) and evaluated the generalizability of Februus to defend against complex adaptive attacks; notably, we realized the first defense against the advanced partial Trojan attack. To the best of our knowledge, Februus is the first backdoor defense method for operation at run-time capable of sanitizing Trojaned inputs without requiring anomaly detection methods, model retraining or costly labeled data.
△ Less
Submitted 28 September, 2020; v1 submitted 9 August, 2019;
originally announced August 2019.