-
GenFighter: A Generative and Evolutive Textual Attack Removal
Authors:
Md Athikul Islam,
Edoardo Serra,
Sushil Jajodia
Abstract:
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP). This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution. GenFighter identifies potentially malicious instances deviating from the distribution, transforms them into semantically equivalent instances aligned with the training data, and employs ensemble techniques for a unified and robust response. By conducting extensive experiments, we show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics. Additionally, it requires a high number of queries per attack, making the attack more challenging in real scenarios. The ablation study shows that our approach integrates transfer learning, a generative/evolutive procedure, and an ensemble method, providing an effective defense against NLP adversarial attacks.
Submitted 17 April, 2024;
originally announced April 2024.
-
An Empirical Study on the Membership Inference Attack against Tabular Data Synthesis Models
Authors:
Jihyeon Hyeong,
Jayoung Kim,
Noseong Park,
Sushil Jajodia
Abstract:
Tabular data typically contains private and important information; thus, precautions must be taken before it is shared with others. Although several methods (e.g., differential privacy and k-anonymity) have been proposed to prevent information leakage, in recent years, tabular data synthesis models have become popular because they offer a good trade-off between data utility and privacy. However, recent research has shown that generative models for image data are susceptible to the membership inference attack, which can determine whether a given record was used to train a victim synthesis model. In this paper, we investigate the membership inference attack in the context of tabular data synthesis. We conduct experiments on 4 state-of-the-art tabular data synthesis models under two attack scenarios (i.e., one black-box and one white-box attack), and find that the membership inference attack can seriously jeopardize these models. We next conduct experiments to evaluate how well two popular differentially-private deep learning training algorithms, DP-SGD and DP-GAN, can protect the models against the attack. Our key finding is that both algorithms can largely alleviate this threat at the cost of generation quality.
Submitted 25 August, 2022; v1 submitted 17 August, 2022;
originally announced August 2022.
-
PatchRNN: A Deep Learning-Based System for Security Patch Identification
Authors:
Xinda Wang,
Shu Wang,
Pengbin Feng,
Kun Sun,
Sushil Jajodia,
Sanae Benchaaboun,
Frank Geck
Abstract:
With the increasing usage of open-source software (OSS) components, vulnerabilities embedded within them are propagated to a huge number of underlying applications. In practice, the timely application of security patches in downstream software is challenging. The main reason is that such patches do not explicitly indicate their security impacts in the documentation, making them difficult for software maintainers and users to recognize. However, attackers can still identify these "secret" security patches by analyzing the source code and generate corresponding exploits to compromise not only unpatched versions of the current software, but also other similar software packages that may contain the same vulnerability due to code cloning or similar design/implementation logic. Therefore, it is critical to identify these secret security patches to enable timely fixes. To this end, we propose a deep learning-based defense system called PatchRNN to automatically identify secret security patches in OSS. Besides considering descriptive keywords in the commit message (i.e., at the text level), we leverage both syntactic and semantic features at the source-code level. To evaluate the performance of our system, we apply it to a large-scale real-world patch dataset and conduct a case study on a popular open-source web server, NGINX. Experimental results show that PatchRNN can successfully detect secret security patches with a low false positive rate.
Submitted 5 January, 2023; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Capture the Bot: Using Adversarial Examples to Improve CAPTCHA Robustness to Bot Attacks
Authors:
Dorjan Hitaj,
Briland Hitaj,
Sushil Jajodia,
Luigi V. Mancini
Abstract:
To date, CAPTCHAs have served as the first line of defense preventing unauthorized access by (malicious) bots to web-based services, while at the same time maintaining a trouble-free experience for human visitors. However, recent work in the literature has provided evidence of sophisticated bots that make use of advancements in machine learning (ML) to easily bypass existing CAPTCHA-based defenses. In this work, we take the first step to address this problem. We introduce CAPTURE, a novel CAPTCHA scheme based on adversarial examples. While adversarial examples are typically used to lead an ML model astray, with CAPTURE we attempt to make "good use" of such mechanisms. Our empirical evaluations show that CAPTURE can produce CAPTCHAs that are easy for humans to solve while, at the same time, effectively thwarting ML-based bot solvers.
Submitted 4 November, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Two Can Play That Game: An Adversarial Evaluation of a Cyber-alert Inspection System
Authors:
Ankit Shah,
Arunesh Sinha,
Rajesh Ganesan,
Sushil Jajodia,
Hasan Cam
Abstract:
Cyber-security is an important societal concern. Cyber-attacks have increased in number as well as in the extent of damage caused by each attack. Large organizations operate a Cyber Security Operation Center (CSOC), which forms the first line of cyber-defense. The inspection of cyber-alerts is a critical part of CSOC operations. A recent work, in collaboration with Army Research Lab, USA, proposed a reinforcement learning (RL) based approach to prevent the cyber-alert queue length from growing large and overwhelming the defender. Given the potential deployment of this approach to CSOCs run by US defense agencies, we perform a red team (adversarial) evaluation of this approach. Further, with the recent attacks on learning systems, it is even more important to test the limits of this RL approach. Towards that end, we learn an adversarial alert generation policy that is a best response to the defender inspection policy. Surprisingly, we find the defender policy to be quite robust to the best response of the attacker. In order to explain this observation, we extend the earlier RL model to a game model and show that there exist defender policies that can be robust against any adversarial policy. We also derive a competitive baseline from the game theory model and compare it to the RL approach. However, we go further to exploit assumptions made in the MDP in the RL model and discover an attacker policy that overwhelms the defender. We use a double oracle approach to retrain the defender with episodes from this discovered attacker policy. This made the defender robust to the discovered attacker policy, and no further harmful attacker policies were discovered. Overall, the adversarial RL and double oracle approaches are general techniques that are applicable to other uses of RL in adversarial environments.
Submitted 13 October, 2018;
originally announced October 2018.
-
Data Synthesis based on Generative Adversarial Networks
Authors:
Noseong Park,
Mahmoud Mohammadi,
Kshitij Gorde,
Sushil Jajodia,
Hongkyu Park,
Youngmin Kim
Abstract:
Privacy is an important concern for our society where sharing data with partners or releasing data to the public is a frequent occurrence. Some of the techniques that are being used to achieve privacy are to remove identifiers, alter quasi-identifiers, and perturb values. Unfortunately, these approaches suffer from two limitations. First, it has been shown that private information can still be leaked if attackers possess some background knowledge or other information sources. Second, they do not take into account the adverse impact these methods will have on the utility of the released data. In this paper, we propose a method that meets both requirements. Our method, called table-GAN, uses generative adversarial networks (GANs) to synthesize fake tables that are statistically similar to the original table yet do not incur information leakage. We show that the machine learning models trained using our synthetic tables exhibit performance that is similar to that of models trained using the original table for unknown testing cases. We call this property model compatibility. We believe that anonymization/perturbation/synthesis methods without model compatibility are of little value. We used four real-world datasets from four different domains for our experiments and conducted in-depth comparisons with state-of-the-art anonymization, perturbation, and generation techniques. Throughout our experiments, only our method consistently shows a balance between privacy level and model compatibility.
Submitted 2 July, 2018; v1 submitted 8 June, 2018;
originally announced June 2018.
-
On-the fly AES Decryption/Encryption for Cloud SQL Databases
Authors:
Sushil Jajodia,
Witold Litwin,
Thomas Schwarz
Abstract:
We propose client-side AES256 encryption for a cloud SQL DB. A column ciphertext is deterministic or probabilistic. We trust the cloud DBMS for the security of its run-time values, e.g., through a moving target defense. The client may send AES key(s) with the query. These serve the on-the-fly decryption of selected ciphertext into plaintext for query evaluation. The DBMS clears the key(s) and the plaintext by the end of the query at the latest. It may deliver ciphertext to decryption-enabled clients or plaintext otherwise, e.g., to browsers/navigators. Functionally, the scheme gives a cloud DBMS the capabilities of a plaintext SQL DBMS. AES processing overhead appears negligible for a modern CPU, e.g., a popular Intel i5. Deterministic encryption may have no storage overhead, whereas probabilistic encryption doubles the DB storage. The scheme seems to be the first that is generally practical for an outsourced encrypted SQL DB. An implementation sufficient for experimentation appears easy: an existing cloud SQL DBMS with UDF support should suffice.
Submitted 20 December, 2015;
originally announced December 2015.
-
Privacy in geo-social networks: proximity notification with untrusted service providers and curious buddies
Authors:
Sergio Mascetti,
Dario Freni,
Claudio Bettini,
X. Sean Wang,
Sushil Jajodia
Abstract:
A major feature of the emerging geo-social networks is the ability to notify a user when one of his friends (also called buddies) happens to be geographically close to the user. This proximity service is usually offered by the network itself or by a third-party service provider (SP) using location data acquired from the users. This paper provides a rigorous theoretical and experimental analysis of the existing solutions for the location privacy problem in proximity services. This is a serious problem for users who do not trust the SP to handle their location data, and would only like to release their location information in a generalized form to participating buddies. The paper presents two new protocols providing complete privacy with respect to the SP, and controllable privacy with respect to the buddies. The analytical and experimental analysis of the protocols takes into account privacy, service precision, and computation and communication costs, showing the superiority of the new protocols over those that have appeared in the literature to date. The proposed protocols have also been tested in a full system implementation of the proximity service.
Submitted 6 November, 2010; v1 submitted 2 July, 2010;
originally announced July 2010.
-
The Role of Quasi-identifiers in k-Anonymity Revisited
Authors:
Claudio Bettini,
X. Sean Wang,
Sushil Jajodia
Abstract:
The concept of k-anonymity, used in the recent literature to formally evaluate the privacy preservation of published tables, was introduced based on the notion of quasi-identifiers (or QI for short). The process of obtaining k-anonymity for a given private table is first to recognize the QIs in the table, and then to anonymize the QI values, the latter being called k-anonymization. While k-anonymization is usually rigorously validated by the authors, the definition of QI remains mostly informal, and different authors seem to have different interpretations of the concept of QI. The purpose of this paper is to provide a formal underpinning of QI and examine the correctness and incorrectness of various interpretations of QI in our formal framework. We observe that in cases where the concept has been used correctly, its application has been conservative; this note provides a formal understanding of the conservative nature in such cases.
Submitted 8 November, 2006;
originally announced November 2006.
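
The k-anonymity property referred to in the last abstract above (and mentioned in the tabular data synthesis entry) can be illustrated with a short check: a table is k-anonymous with respect to a chosen set of quasi-identifier columns if every combination of their values occurs in at least k rows. The sketch below is only an illustration of that standard definition, not code from any of the listed papers; the function name `is_k_anonymous`, the column names, and the toy table are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    that appears in `rows` appears in at least `k` rows."""
    if not rows:
        return True  # an empty table trivially satisfies the property
    # Count how many rows share each combination of QI values.
    groups = Counter(
        tuple(row[qi] for qi in quasi_identifiers) for row in rows
    )
    return min(groups.values()) >= k


# Hypothetical, already-generalized toy table (values are made up).
table = [
    {"zip": "130**", "age": "20-29", "disease": "flu"},
    {"zip": "130**", "age": "20-29", "disease": "cold"},
    {"zip": "148**", "age": "30-39", "disease": "flu"},
    {"zip": "148**", "age": "30-39", "disease": "asthma"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))
print(is_k_anonymous(table, ["zip", "age", "disease"], 2))
```

The outcome depends entirely on which columns are treated as quasi-identifiers: the same table passes for one column set and fails for another, which is why an informal definition of QI can lead to different conclusions.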