-
Single Word Change is All You Need: Designing Attacks and Defenses for Text Classifiers
Authors:
Lei Xu,
Sarah Alnegheimish,
Laure Berti-Equille,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing it to be misclassified by a classifier. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric ρ to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present SP-Attack, designed to exploit the single-word perturbation vulnerability, which achieves a higher attack success rate and better preserves sentence meaning while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defense, which aims to improve ρ by applying data augmentation in learning. Experimental results on 4 datasets with BERT and distilBERT classifiers show that SP-Defense improves ρ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on the two classifiers, respectively; it also decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
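To make single-word perturbation concrete, the sketch below enumerates every one-word substitution of a sentence and reports the fraction of sentences that no such substitution can flip. This is a simplified illustration, not the paper's definition of ρ or its SP-Attack; the keyword classifier `toy_classifier`, the `substitutes` table, and both helper functions are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical keyword lists for a toy sentiment classifier.
POSITIVE = {"good", "great", "fine"}
NEGATIVE = {"bad", "awful", "dull"}


def toy_classifier(sentence: str) -> int:
    """Hypothetical classifier: 1 (positive) if positive keywords outnumber negative ones."""
    words = sentence.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def single_word_flips(sentence: str, classify: Callable[[str], int],
                      substitutes: Dict[str, List[str]]) -> List[str]:
    """Return every single-word substitution of `sentence` that changes the predicted label."""
    words = sentence.split()
    original = classify(sentence)
    flips = []
    for i, word in enumerate(words):
        for candidate in substitutes.get(word, []):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            if classify(perturbed) != original:
                flips.append(perturbed)
    return flips


def single_word_robust_fraction(sentences: List[str], classify: Callable[[str], int],
                                substitutes: Dict[str, List[str]]) -> float:
    """Fraction of sentences that no single-word substitution can flip (illustrative only)."""
    robust = sum(1 for s in sentences if not single_word_flips(s, classify, substitutes))
    return robust / len(sentences)


substitutes = {"good": ["fine", "decent"], "great": ["good", "grand"], "movie": ["film"]}
sentences = ["a good movie", "a great great movie", "a dull movie"]
print(single_word_flips("a good movie", toy_classifier, substitutes))  # ['a decent movie']
print(single_word_robust_fraction(sentences, toy_classifier, substitutes))  # 0.6666666666666666
```

Here only "a good movie" is fragile: swapping "good" for "decent" removes its only positive keyword, so a single word is enough to change the prediction.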
Submitted 30 January, 2024;
originally announced January 2024.
-
Modeling glycemia in humans by means of Grammatical Evolution
Authors:
J. Ignacio Hidalgo,
J. Manuel Colmenar,
José L. Risco-Martín,
Alfredo Cuesta-Infante,
Esther Maqueda,
Marta Botella,
José Antonio Rubio
Abstract:
Diabetes mellitus is a disease that affects hundreds of millions of people worldwide. Maintaining good control of the disease is critical to avoid severe long-term complications. In recent years, several increasingly advanced artificial pancreas systems have been proposed and developed. However, much research remains to be done. One of the main problems in the (semi-)automatic control of diabetes is obtaining a model that explains how glycemia (the glucose level in blood) varies with insulin, food intake, and other factors, fitted to the characteristics of each individual patient. This paper proposes applying evolutionary computation techniques to obtain customized patient models, unlike most previous approaches, which obtain averaged models. The proposal is based on a grammar-based form of genetic programming known as Grammatical Evolution (GE). The proposal has been tested with in-silico patient data, and the results are clearly positive. We also present a study of four different grammars and five objective functions. In the test phase, the models characterized glucose with a mean percentage average error of 13.69%, also modeling both hyper- and hypoglycemic situations well.
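Grammatical Evolution maps an integer genome (a list of codons) to a program by repeatedly expanding the leftmost non-terminal of a grammar, choosing the production given by `codon mod (number of productions)` and wrapping around the genome if it runs out. The sketch below shows that standard mapping on a small, hypothetical expression grammar over made-up variables `G` (glucose), `I` (insulin) and `C` (carbohydrate intake); it does not reproduce the four grammars or the objective functions studied in the paper.

```python
from typing import Dict, List, Optional

# Hypothetical grammar: G = glucose, I = insulin, C = carbohydrate intake.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"], ["<const>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<var>": [["G"], ["I"], ["C"]],
    "<const>": [["0.5"], ["1.0"], ["2.0"]],
}


def ge_map(codons: List[int], grammar: Dict[str, List[List[str]]] = GRAMMAR,
           start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Map a GE genome to an expression string, or return None if mapping never finishes."""
    symbols = [start]
    used = 0
    limit = len(codons) * (max_wraps + 1)  # total codon reads allowed, including wraps
    while True:
        # Locate the leftmost non-terminal still to be expanded.
        idx = next((i for i, s in enumerate(symbols) if s in grammar), None)
        if idx is None:
            return " ".join(symbols)  # only terminals left: a valid phenotype
        if used >= limit:
            return None  # out of codons even after wrapping: invalid individual
        rules = grammar[symbols[idx]]
        codon = codons[used % len(codons)]  # wrap around the genome
        symbols[idx:idx + 1] = rules[codon % len(rules)]  # the GE "mod" rule
        used += 1


print(ge_map([0, 1, 0, 1, 2, 0]))  # G - 0.5
print(ge_map([0]))                 # None (keeps expanding <expr> forever)
```

Every non-terminal in this toy grammar has three productions, so each expansion consumes one codon; GE implementations commonly skip the codon read when a non-terminal has a single production.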
Submitted 27 April, 2023;
originally announced May 2023.
-
Boosting the 3D thermal-aware floorplanning problem through a master-worker parallel MOEA
Authors:
Ignacio Arnaldo,
Alfredo Cuesta-Infante,
J. Manuel Colmenar,
José L. Risco-Martín,
José L. Ayala
Abstract:
The increasing scale of transistor integration poses, among other problems, the thermal-aware floorplanning problem, which consists of how to place the hardware components in order to reduce overheating by dissipation. Due to the huge number of feasible floorplans, most of the solutions found in the literature include an evolutionary algorithm that carries out the floorplanning task either partially or completely. Evolutionary algorithms usually have a bottleneck in fitness evaluation. In thermal-aware floorplanning, the layout evaluation by the thermal model takes 99.5% of the computational time for the best floorplanning algorithm proposed so far. The contribution of this paper is a parallelization of this evaluation phase in a master-worker model to achieve a dramatic speed-up of the thermal-aware floorplanning process. Exhaustive experimentation was carried out on three-dimensional integrated circuits with 48 and 128 cores, outperforming previously published works.
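The master-worker idea can be sketched as follows: the master holds the population, hands each individual's fitness evaluation to a pool of workers, and collects the results in order. `thermal_fitness` is a hypothetical placeholder for the expensive thermal model, and a thread pool is used only to keep the example self-contained; a CPU-bound evaluation in Python would need processes (for example `ProcessPoolExecutor`) or a distributed setup, and the paper's implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def thermal_fitness(floorplan: Sequence[int]) -> int:
    """Hypothetical stand-in for the thermal model: penalise hot blocks placed side by side.

    `floorplan` is a toy 1-D ordering of block power values (lower fitness is better).
    """
    return sum(a * b for a, b in zip(floorplan, floorplan[1:]))


def evaluate_population(population: List[Sequence[int]], n_workers: int = 4) -> List[int]:
    """Master: dispatch one evaluation per individual to the workers, gather results in order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(thermal_fitness, population))


population = [[3, 1, 2], [1, 3, 2], [2, 1, 3]]
print(evaluate_population(population))  # [5, 9, 5]
```

Because evaluation dominates the run time (99.5% in the case quoted above), parallelizing only this step while the master keeps selection and variation sequential is where most of the speed-up can come from.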
Submitted 7 March, 2023;
originally announced March 2023.
-
R&R: Metric-guided Adversarial Sentence Generation
Authors:
Lei Xu,
Alfredo Cuesta-Infante,
Laure Berti-Equille,
Kalyan Veeramachaneni
Abstract:
Adversarial examples are helpful for analyzing and improving the robustness of text classifiers. Generating high-quality adversarial examples is a challenging task as it requires generating fluent adversarial sentences that are semantically similar to the original sentences and preserve the original labels, while causing the classifier to misclassify them. Existing methods prioritize misclassification by maximizing each perturbation's effectiveness at misleading a text classifier; thus, the generated adversarial examples fall short in terms of fluency and similarity. In this paper, we propose a rewrite and rollback (R&R) framework for adversarial attack. It improves the quality of adversarial examples by optimizing a critique score that combines the fluency, similarity, and misclassification metrics. R&R generates high-quality adversarial examples by allowing exploration of perturbations that do not have an immediate impact on the misclassification metric but can improve the fluency and similarity metrics. We evaluate our method on 5 representative datasets and 3 classifier architectures. Our method outperforms the current state of the art in attack success rate by +16.2%, +12.8%, and +14.0% on the three classifiers, respectively. Code is available at https://github.com/DAI-Lab/fibber
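A critique score that combines several metrics can be sketched as a weighted sum. The `Candidate` fields, the metric ranges, and the equal weights below are assumptions for illustration, not the scoring used in R&R; the point is only that a combined score can prefer a candidate whose misclassification term has not improved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    text: str
    fluency: float     # assumed: normalised language-model score in [0, 1]
    similarity: float  # assumed: semantic similarity to the original in [0, 1]
    misclf: float      # assumed: classifier probability on wrong labels in [0, 1]


def critique(c: Candidate, weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hypothetical critique score: a weighted sum of the three metrics."""
    w_flu, w_sim, w_mis = weights
    return w_flu * c.fluency + w_sim * c.similarity + w_mis * c.misclf


def best_candidate(candidates: List[Candidate],
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> Candidate:
    """Keep whichever candidate scores highest, even if its misclassification term is unchanged."""
    return max(candidates, key=lambda c: critique(c, weights))


current = Candidate("the film was dull", fluency=0.6, similarity=0.9, misclf=0.40)
rewrite = Candidate("the movie was dull", fluency=0.8, similarity=0.95, misclf=0.40)
greedy = Candidate("the flick was dull", fluency=0.3, similarity=0.7, misclf=0.45)
print(best_candidate([current, rewrite, greedy]).text)  # the movie was dull
```

A misclassification-only objective would pick `greedy`; the combined score keeps `rewrite`, which leaves the misclassification term unchanged but improves fluency and similarity.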
Submitted 19 October, 2022; v1 submitted 17 April, 2021;
originally announced April 2021.
-
TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks
Authors:
Alexander Geiger,
Dongyu Liu,
Sarah Alnegheimish,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
Time series anomalies can offer information relevant to critical situations facing various fields, from finance and aerospace to the IT, security, and medical domains. However, detecting anomalies in time series data is particularly challenging due to the vague definition of anomalies, the frequent lack of labels, and the highly complex temporal correlations in such data. Current state-of-the-art unsupervised machine learning methods for anomaly detection suffer from scalability and portability issues, and may have high false positive rates. In this paper, we propose TadGAN, an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs). To capture the temporal correlations of time series distributions, we use LSTM Recurrent Neural Networks as base models for Generators and Critics. TadGAN is trained with cycle consistency loss to allow for effective time-series data reconstruction. We further propose several novel methods to compute reconstruction errors, as well as different approaches to combine reconstruction errors and Critic outputs to compute anomaly scores. To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one. We compare our approach to 8 baseline anomaly detection methods on 11 datasets from multiple reputable sources such as NASA, Yahoo, Numenta, Amazon, and Twitter. The results show that our approach can effectively detect anomalies and outperforms baseline methods in most cases (6 out of 11). Notably, our method has the highest average F1 score across all the datasets. Our code is open source and is available as a benchmarking tool.
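One simple way to combine reconstruction errors and Critic outputs into a single anomaly score is to standardize both series and take a convex combination. The sketch below illustrates that idea under stated assumptions: the weight `alpha` and the convention that the Critic outputs higher values for more realistic windows are hypothetical, and this is not necessarily the combination reported as best-suited in the paper.

```python
import statistics
from typing import List


def zscore(xs: List[float]) -> List[float]:
    """Standardise a series to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in xs]


def anomaly_scores(recon_errors: List[float], critic_outputs: List[float],
                   alpha: float = 0.5) -> List[float]:
    """Convex combination of standardised reconstruction error and negated critic output.

    Assumes the critic outputs higher values for more realistic windows, so it is negated
    to make "less realistic" mean "more anomalous".
    """
    z_rec = zscore(recon_errors)
    z_crit = zscore([-c for c in critic_outputs])
    return [alpha * r + (1 - alpha) * c for r, c in zip(z_rec, z_crit)]


scores = anomaly_scores([0.1, 0.1, 0.1, 0.9], [1.0, 1.0, 1.0, -1.0])
print(scores.index(max(scores)))  # 3
```

Standardizing first keeps the two signals on a comparable scale, so `alpha` expresses their relative importance rather than compensating for different units.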
Submitted 14 November, 2020; v1 submitted 16 September, 2020;
originally announced September 2020.
-
Robust Invisible Video Watermarking with Attention
Authors:
Kevin Alex Zhang,
Lei Xu,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
The goal of video watermarking is to embed a message within a video file in such a way that it minimally impacts the viewing experience but can be recovered even if the video is redistributed and modified, allowing media producers to assert ownership over their content. This paper presents RivaGAN, a novel architecture for robust video watermarking which features a custom attention-based mechanism for embedding arbitrary data as well as two independent adversarial networks which critique the video quality and optimize for robustness. Using this technique, we are able to achieve state-of-the-art results in deep learning-based video watermarking and produce watermarked videos which have minimal visual distortion and are robust against common video processing operations.
Submitted 3 September, 2019;
originally announced September 2019.
-
Towards Reducing Biases in Combining Multiple Experts Online
Authors:
Yi Sun,
Ivan Ramirez,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
In many real-life situations, including job and loan applications, gatekeepers must make justified and fair real-time decisions about a person's fitness for a particular opportunity. In this paper, we aim to accomplish approximate group fairness in an online stochastic decision-making process, where the fairness metric we consider is equalized odds. Our work follows from the classical learning-from-experts scheme, assuming a finite set of classifiers (human experts, rules, options, etc.) that cannot be modified. We run separate instances of the algorithm for each label class and each sensitive group, where the probability of choosing each instance is optimized for both fairness and regret. Our theoretical results show that approximately equalized odds can be achieved without incurring much additional regret. We also demonstrate the performance of the algorithm on real datasets commonly used by the fairness community.
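Equalized odds asks that the true-positive and false-positive rates be equal across sensitive groups. The sketch below measures how far a batch of binary decisions is from that condition; it is an offline check with hypothetical helper names (`tpr_fpr`, `equalized_odds_gap`), not the paper's online learning-from-experts algorithm.

```python
from typing import Hashable, List, Tuple


def tpr_fpr(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """True- and false-positive rates for binary labels (0/1); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def equalized_odds_gap(y_true: List[int], y_pred: List[int], groups: List[Hashable]) -> float:
    """Largest between-group difference in TPR or FPR (0 means equalized odds holds exactly)."""
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(tpr_fpr([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    tprs = [r[0] for r in rates]
    fprs = [r[1] for r in rates]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
print(equalized_odds_gap(y_true, y_pred, groups))  # 0.5
```

Group "a" is classified perfectly (TPR 1.0, FPR 0.0) while group "b" has TPR 0.5 and FPR 0.5, so both rates differ by 0.5; an "approximately equalized odds" guarantee bounds this kind of gap.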
Submitted 24 May, 2021; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Modeling Tabular data using Conditional GAN
Authors:
Lei Xu,
Maria Skoularidou,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes, whereas discrete columns are sometimes imbalanced, making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets, whereas other deep learning methods do not.
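One way to keep the rare categories of an imbalanced discrete column from being ignored is to sample the category to condition on with probability proportional to the logarithm of its frequency, and to pass it to the generator as a one-hot condition vector. The sketch below shows only that sampling step; the log-frequency rule and the helper names are assumptions for illustration, since the abstract does not specify how TGAN builds its conditions.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def log_frequency_probs(column: List[str]) -> Dict[str, float]:
    """Sampling probability per category, proportional to log(1 + count)."""
    counts = Counter(column)
    cats = sorted(counts)
    weights = [math.log1p(counts[c]) for c in cats]
    total = sum(weights)
    return {c: w / total for c, w in zip(cats, weights)}


def sample_condition(column: List[str], rng: random.Random) -> Tuple[str, List[int]]:
    """Draw a category to condition on and return it with its one-hot condition vector."""
    probs = log_frequency_probs(column)
    cats = list(probs)
    chosen = rng.choices(cats, weights=[probs[c] for c in cats], k=1)[0]
    return chosen, [1 if c == chosen else 0 for c in cats]


column = ["A"] * 90 + ["B"] * 9 + ["C"] * 1  # heavily imbalanced toy column
probs = log_frequency_probs(column)
print({c: round(p, 3) for c, p in probs.items()})  # approx. {'A': 0.601, 'B': 0.307, 'C': 0.092}
print(sample_condition(column, random.Random(0)))
```

Under this rule the rare category "C" is conditioned on about 9% of the time instead of its raw 1%, so the generator sees it often enough to learn it.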
Submitted 27 October, 2019; v1 submitted 30 June, 2019;
originally announced July 2019.
-
SteganoGAN: High Capacity Image Steganography with GANs
Authors:
Kevin Alex Zhang,
Alfredo Cuesta-Infante,
Lei Xu,
Kalyan Veeramachaneni
Abstract:
Image steganography is a procedure for hiding messages inside pictures. While other techniques such as cryptography aim to prevent adversaries from reading the secret message, steganography aims to hide the presence of the message itself. In this paper, we propose a novel technique for hiding arbitrary binary data in images using generative adversarial networks which allow us to optimize the perceptual quality of the images produced by our model. We show that our approach achieves state-of-the-art payloads of 4.4 bits per pixel, evades detection by steganalysis tools, and is effective on images from multiple datasets. To enable fair comparisons, we have released an open source library that is available online at https://github.com/DAI-Lab/SteganoGAN.
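For a rough sense of scale, taking the quoted payload of 4.4 bits per pixel at face value for a hypothetical 256 × 256 image gives the capacity below. The image size is an assumption, and the paper may measure payload differently (for example, after error correction).

```latex
\underbrace{256 \times 256}_{65{,}536\ \text{pixels}} \times 4.4\ \tfrac{\text{bits}}{\text{pixel}}
  = 288{,}358.4\ \text{bits}
  = 36{,}044.8\ \text{bytes}
  = 35.2\ \text{KiB}
```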
Submitted 29 January, 2019; v1 submitted 12 January, 2019;
originally announced January 2019.
-
Learning Vine Copula Models For Synthetic Data Generation
Authors:
Yi Sun,
Alfredo Cuesta-Infante,
Kalyan Veeramachaneni
Abstract:
A vine copula model is a flexible high-dimensional dependence model which uses only bivariate building blocks. However, the number of possible configurations of a vine copula grows exponentially as the number of variables increases, making model selection a major challenge in development. In this work, we formulate a vine structure learning problem with both vector and reinforcement learning representations. We use a neural network to find the embeddings for the best possible vine model and generate a structure. Through experiments on synthetic and real-world datasets, we show that our proposed approach fits the data better in terms of log-likelihood. Moreover, we demonstrate that the model is able to generate high-quality samples in a variety of applications, making it a good candidate for synthetic data generation.
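The growth in configurations can be quantified with a known closed-form count from the vine-copula literature, quoted here as background rather than taken from this paper: there are n!/2 · 2^((n-2)(n-3)/2) labelled regular-vine structures on n variables, before a bivariate copula family is chosen for each of the n(n-1)/2 edges. The helper name `num_regular_vines` is hypothetical.

```python
from math import comb, factorial


def num_regular_vines(n: int) -> int:
    """Number of labelled regular-vine structures on n >= 2 variables: n!/2 * 2^C(n-2, 2)."""
    if n < 2:
        raise ValueError("a vine needs at least two variables")
    return factorial(n) // 2 * 2 ** comb(n - 2, 2)


for n in range(3, 9):
    print(n, num_regular_vines(n))
# 3 3
# 4 24
# 5 480
# 6 23040
# 7 2580480
# 8 660602880
```

Already at eight variables there are hundreds of millions of candidate structures, which is why exhaustive model selection is impractical and a learned search over structures is attractive.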
Submitted 4 December, 2018;
originally announced December 2018.