-
Creating High Resolution Images with a Latent Adversarial Generator
Authors:
David Berthelot,
Peyman Milanfar,
Ian Goodfellow
Abstract:
Generating realistic images is difficult, and many formulations for this task have been proposed recently. If we restrict the task to that of generating a particular class of images, however, the task becomes more tractable. That is to say, instead of generating an arbitrary image as a sample from the manifold of natural images, we propose to sample images from a particular "subspace" of natural i…
▽ More
Generating realistic images is difficult, and many formulations for this task have been proposed recently. If we restrict the task to that of generating a particular class of images, however, the task becomes more tractable. That is to say, instead of generating an arbitrary image as a sample from the manifold of natural images, we propose to sample images from a particular "subspace" of natural images, directed by a low-resolution image from the same subspace. The problem we address, while close to the formulation of the single-image super-resolution problem, is in fact rather different. Single image super-resolution is the task of predicting the image closest to the ground truth from a relatively low resolution image. We propose to produce samples of high resolution images given extremely small inputs with a new method called Latent Adversarial Generator (LAG). In our generative sampling framework, we only use the input (possibly of very low-resolution) to direct what class of samples the network should produce. As such, the output of our algorithm is not a unique image that relates to the input, but rather a possible se} of related images sampled from the manifold of natural images. Our method learns exclusively in the latent space of the adversary using perceptual loss -- it does not have a pixel loss.
△ Less
Submitted 4 March, 2020;
originally announced March 2020.
-
MixMatch: A Holistic Approach to Semi-Supervised Learning
Authors:
David Berthelot,
Nicholas Carlini,
Ian Goodfellow,
Nicolas Papernot,
Avital Oliver,
Colin Raffel
Abstract:
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. We…
▽ More
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
△ Less
Submitted 23 October, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
Authors:
Yao Qin,
Nicholas Carlini,
Ian Goodfellow,
Garrison Cottrell,
Colin Raffel
Abstract:
Adversarial examples are inputs to machine learning models designed by an adversary to cause an incorrect output. So far, adversarial examples have been studied most extensively in the image domain. In this domain, adversarial examples can be constructed by imperceptibly modifying images to cause misclassification, and are practical in the physical world. In contrast, current targeted adversarial…
▽ More
Adversarial examples are inputs to machine learning models designed by an adversary to cause an incorrect output. So far, adversarial examples have been studied most extensively in the image domain. In this domain, adversarial examples can be constructed by imperceptibly modifying images to cause misclassification, and are practical in the physical world. In contrast, current targeted adversarial examples applied to speech recognition systems have neither of these properties: humans can easily identify the adversarial perturbations, and they are not effective when played over-the-air. This paper makes advances on both of these fronts. First, we develop effectively imperceptible audio adversarial examples (verified through a human study) by leveraging the psychoacoustic principle of auditory masking, while retaining 100% targeted success rate on arbitrary full-sentence targets. Next, we make progress towards physical-world over-the-air audio adversarial examples by constructing perturbations which remain effective even after applying realistic simulated environmental distortions.
△ Less
Submitted 7 June, 2019; v1 submitted 22 March, 2019;
originally announced March 2019.
-
A Research Agenda: Dynamic Models to Defend Against Correlated Attacks
Authors:
Ian Goodfellow
Abstract:
In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn {\em independently} and for which the distribution generati…
▽ More
In this article I describe a research agenda for securing machine learning models against adversarial inputs at test time. This article does not present results but instead shares some of my thoughts about where I think that the field needs to go. Modern machine learning works very well on I.I.D. data: data for which each example is drawn {\em independently} and for which the distribution generating each example is {\em identical}. When these assumptions are relaxed, modern machine learning can perform very poorly. When machine learning is used in contexts where security is a concern, it is desirable to design models that perform well even when the input is designed by a malicious adversary. So far most research in this direction has focused on an adversary who violates the {\em identical} assumption, and imposes some kind of restricted worst-case distribution shift. I argue that machine learning security researchers should also address the problem of relaxing the {\em independence} assumption and that current strategies designed for robustness to distribution shift will not do so. I recommend {\em dynamic models} that change each time they are run as a potential solution path to this problem, and show an example of a simple attack using correlated data that can be mitigated by a simple dynamic defense. This is not intended as a real-world security measure, but as a recommendation to explore this research direction and develop more realistic defenses.
△ Less
Submitted 14 March, 2019;
originally announced March 2019.
-
On Evaluating Adversarial Robustness
Authors:
Nicholas Carlini,
Anish Athalye,
Nicolas Papernot,
Wieland Brendel,
Jonas Rauber,
Dimitris Tsipras,
Ian Goodfellow,
Aleksander Madry,
Alexey Kurakin
Abstract:
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose defenses are quickly shown to be incorrect.
We believe a large contributing factor is the difficulty of performing security evaluations. In this pa…
▽ More
Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose defenses are quickly shown to be incorrect.
We believe a large contributing factor is the difficulty of performing security evaluations. In this paper, we discuss the methodological foundations, review commonly accepted best practices, and suggest new methods for evaluating defenses to adversarial examples. We hope that both researchers developing defenses as well as readers and reviewers who wish to understand the completeness of an evaluation consider our advice in order to avoid common pitfalls.
△ Less
Submitted 20 February, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
A domain agnostic measure for monitoring and evaluating GANs
Authors:
Paulina Grnarova,
Kfir Y Levy,
Aurelien Lucchi,
Nathanael Perraudin,
Ian Goodfellow,
Thomas Hofmann,
Andreas Krause
Abstract:
Generative Adversarial Networks (GANs) have shown remarkable results in modeling complex distributions, but their evaluation remains an unsettled issue. Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training. The latter cannot be determined by simply inspecting the generator and discriminator loss curves as…
▽ More
Generative Adversarial Networks (GANs) have shown remarkable results in modeling complex distributions, but their evaluation remains an unsettled issue. Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training. The latter cannot be determined by simply inspecting the generator and discriminator loss curves as they behave non-intuitively. We leverage the notion of duality gap from game theory to propose a measure that addresses both (i) and (ii) at a low computational cost. Extensive experiments show the effectiveness of this measure to rank different GAN models and capture the typical GAN failure scenarios, including mode collapse and non-convergent behaviours. This evaluation metric also provides meaningful monitoring on the progression of the loss during training. It highly correlates with FID on natural image datasets, and with domain specific scores for text, sound and cosmology data where FID is not directly suitable. In particular, our proposed metric requires no labels or a pretrained classifier, making it domain agnostic.
△ Less
Submitted 15 July, 2020; v1 submitted 13 November, 2018;
originally announced November 2018.
-
Discriminator Rejection Sampling
Authors:
Samaneh Azadi,
Catherine Olsson,
Trevor Darrell,
Ian Goodfellow,
Augustus Odena
Abstract:
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be us…
▽ More
We propose a rejection sampling scheme using the discriminator of a GAN to approximately correct errors in the GAN generator distribution. We show that under quite strict assumptions, this will allow us to recover the data distribution exactly. We then examine where those strict assumptions break down and design a practical algorithm - called Discriminator Rejection Sampling (DRS) - that can be used on real data-sets. Finally, we demonstrate the efficacy of DRS on a mixture of Gaussians and on the SAGAN model, state-of-the-art in the image generation task at the time of developing this work. On ImageNet, we train an improved baseline that increases the Inception Score from 52.52 to 62.36 and reduces the Frechet Inception Distance from 18.65 to 14.79. We then use DRS to further improve on this baseline, improving the Inception Score to 76.08 and the FID to 13.75.
△ Less
Submitted 26 February, 2019; v1 submitted 15 October, 2018;
originally announced October 2018.
-
Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values
Authors:
Julius Adebayo,
Justin Gilmer,
Ian Goodfellow,
Been Kim
Abstract:
Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning. Several proposed local explanation methods address this issue by identifying what dimensions of a single input are most responsible for a DNN's output. The goal of this work is to assess the sensitivity of local explanations to DNN parameter values. Somewhat su…
▽ More
Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning. Several proposed local explanation methods address this issue by identifying what dimensions of a single input are most responsible for a DNN's output. The goal of this work is to assess the sensitivity of local explanations to DNN parameter values. Somewhat surprisingly, we find that DNNs with randomly-initialized weights produce explanations that are both visually and quantitatively similar to those produced by DNNs with learned weights. Our conjecture is that this phenomenon occurs because these explanations are dominated by the lower level features of a DNN, and that a DNN's architecture provides a strong prior which significantly affects the representations learned at these lower layers. NOTE: This work is now subsumed by our recent manuscript, Sanity Checks for Saliency Maps (to appear NIPS 2018), where we expand on findings and address concerns raised in Sundararajan et. al. (2018).
△ Less
Submitted 8 October, 2018;
originally announced October 2018.
-
Sanity Checks for Saliency Maps
Authors:
Julius Adebayo,
Justin Gilmer,
Michael Muelly,
Ian Goodfellow,
Moritz Hardt,
Been Kim
Abstract:
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual ass…
▽ More
Saliency methods have emerged as a popular tool to highlight features in an input deemed relevant for the prediction of a learned model. Several saliency methods have been proposed, often guided by visual appeal on image data. In this work, we propose an actionable methodology to evaluate what kinds of explanations a given method can and cannot provide. We find that reliance, solely, on visual assessment can be misleading. Through extensive experiments we show that some existing saliency methods are independent both of the model and of the data generating process. Consequently, methods that fail the proposed tests are inadequate for tasks that are sensitive to either data or model, such as, finding outliers in the data, explaining the relationship between inputs and outputs that the model learned, and debugging the model. We interpret our findings through an analogy with edge detection in images, a technique that requires neither training data nor model. Theory in the case of a linear model and a single-layer convolutional neural network supports our experimental findings.
△ Less
Submitted 6 November, 2020; v1 submitted 8 October, 2018;
originally announced October 2018.
-
Unrestricted Adversarial Examples
Authors:
Tom B. Brown,
Nicholas Carlini,
Chiyuan Zhang,
Catherine Olsson,
Paul Christiano,
Ian Goodfellow
Abstract:
We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models, and try to achieve high accuracy and coverage on non-adversarial data while making no confident…
▽ More
We introduce a two-player contest for evaluating the safety and robustness of machine learning systems, with a large prize pool. Unlike most prior work in ML robustness, which studies norm-constrained adversaries, we shift our focus to unconstrained adversaries. Defenders submit machine learning models, and try to achieve high accuracy and coverage on non-adversarial data while making no confident mistakes on adversarial inputs. Attackers try to subvert defenses by finding arbitrary unambiguous inputs where the model assigns an incorrect label with high confidence. We propose a simple unambiguous dataset ("bird-or- bicycle") to use as part of this contest. We hope this contest will help to more comprehensively evaluate the worst-case adversarial risk of machine learning models.
△ Less
Submitted 21 September, 2018;
originally announced September 2018.
-
Skill Rating for Generative Models
Authors:
Catherine Olsson,
Surya Bhupatiraju,
Tom Brown,
Augustus Odena,
Ian Goodfellow
Abstract:
We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and discriminators provide an effective way to evaluate generative models. We introduce two methods for summarizing tournament outcomes: tournament win rate and skill rating. Evaluations are useful in different cont…
▽ More
We explore a new way to evaluate generative models using insights from evaluation of competitive games between human players. We show experimentally that tournaments between generators and discriminators provide an effective way to evaluate generative models. We introduce two methods for summarizing tournament outcomes: tournament win rate and skill rating. Evaluations are useful in different contexts, including monitoring the progress of a single model as it learns during the training process, and comparing the capabilities of two different fully trained models. We show that a tournament consisting of a single model playing against past and future versions of itself produces a useful measure of training progress. A tournament containing multiple separate models (using different seeds, hyperparameters, and architectures) provides a useful relative comparison between different trained GANs. Tournament-based rating methods are conceptually distinct from numerous previous categories of approaches to evaluation of generative models, and have complementary advantages and disadvantages.
△ Less
Submitted 14 August, 2018;
originally announced August 2018.
-
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Authors:
Augustus Odena,
Ian Goodfellow
Abstract:
Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs…
▽ More
Machine learning models are notoriously difficult to interpret and debug. This is particularly true of neural networks. In this work, we introduce automated software testing techniques for neural networks that are well-suited to discovering errors which occur only for rare inputs. Specifically, we develop coverage-guided fuzzing (CGF) methods for neural networks. In CGF, random mutations of inputs to a neural network are guided by a coverage metric toward the goal of satisfying user-specified constraints. We describe how fast approximate nearest neighbor algorithms can provide this coverage metric. We then discuss the application of CGF to the following goals: finding numerical errors in trained neural networks, generating disagreements between neural networks and quantized versions of those networks, and surfacing undesirable behavior in character level language models. Finally, we release an open source library called TensorFuzz that implements the described techniques.
△ Less
Submitted 27 July, 2018;
originally announced July 2018.
-
Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer
Authors:
David Berthelot,
Colin Raffel,
Aurko Roy,
Ian Goodfellow
Abstract:
Autoencoders provide a powerful framework for learning compressed representations by encoding all of the information needed to reconstruct a data point in a latent code. In some cases, autoencoders can "interpolate": By decoding the convex combination of the latent codes for two datapoints, the autoencoder can produce an output which semantically mixes characteristics from the datapoints. In this…
▽ More
Autoencoders provide a powerful framework for learning compressed representations by encoding all of the information needed to reconstruct a data point in a latent code. In some cases, autoencoders can "interpolate": By decoding the convex combination of the latent codes for two datapoints, the autoencoder can produce an output which semantically mixes characteristics from the datapoints. In this paper, we propose a regularization procedure which encourages interpolated outputs to appear more realistic by fooling a critic network which has been trained to recover the mixing coefficient from interpolated data. We then develop a simple benchmark task where we can quantitatively measure the extent to which various autoencoders can interpolate and show that our regularizer dramatically improves interpolation in this setting. We also demonstrate empirically that our regularizer produces latent codes which are more effective on downstream tasks, suggesting a possible link between interpolation abilities and learning useful representations.
△ Less
Submitted 23 July, 2018; v1 submitted 19 July, 2018;
originally announced July 2018.
-
Motivating the Rules of the Game for Adversarial Example Research
Authors:
Justin Gilmer,
Ryan P. Adams,
Ian Goodfellow,
David Andersen,
George E. Dahl
Abstract:
Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such ex…
▽ More
Advances in machine learning have led to broad deployment of systems with impressive performance on important problems. Nonetheless, these systems can be induced to make errors on data that are surprisingly similar to examples the learned system handles correctly. The existence of these errors raises a variety of questions about out-of-sample generalization and whether bad actors might use such examples to abuse deployed systems. As a result of these security concerns, there has been a flurry of recent papers proposing algorithms to defend against such malicious perturbations of correctly handled examples. It is unclear how such misclassifications represent a different kind of security problem than other errors, or even other attacker-produced examples that have no specific relationship to an uncorrupted input. In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security. Towards this end, we establish a taxonomy of motivations, constraints, and abilities for more plausible adversaries. Finally, we provide a series of recommendations outlining a path forward for future work to more clearly articulate the threat model and perform more meaningful evaluation.
△ Less
Submitted 19 July, 2018; v1 submitted 17 July, 2018;
originally announced July 2018.
-
Adversarial Reprogramming of Neural Networks
Authors:
Gamaleldin F. Elsayed,
Ian Goodfellow,
Jascha Sohl-Dickstein
Abstract:
Deep neural networks are susceptible to \emph{adversarial} attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduc…
▽ More
Deep neural networks are susceptible to \emph{adversarial} attacks. In computer vision, well-crafted perturbations to images can cause neural networks to make mistakes such as confusing a cat with a computer. Previous adversarial attacks have been designed to degrade performance of models or cause machine learning models to produce specific outputs chosen ahead of time by the attacker. We introduce attacks that instead {\em reprogram} the target model to perform a task chosen by the attacker---without the attacker needing to specify or compute the desired output for each test-time input. This attack finds a single adversarial perturbation, that can be added to all test-time inputs to a machine learning model in order to cause the model to perform a task chosen by the adversary---even if the model was not trained to do this task. These perturbations can thus be considered a program for the new task. We demonstrate adversarial reprogramming on six ImageNet classification models, repurposing these models to perform a counting task, as well as classification tasks: classification of MNIST and CIFAR-10 examples presented as inputs to the ImageNet model.
△ Less
Submitted 29 November, 2018; v1 submitted 28 June, 2018;
originally announced June 2018.
-
Defense Against the Dark Arts: An overview of adversarial example security research and future research directions
Authors:
Ian Goodfellow
Abstract:
This article presents a summary of a keynote lecture at the Deep Learning Security workshop at IEEE Security and Privacy 2018. This lecture summarizes the state of the art in defenses against adversarial examples and provides recommendations for future research directions on this topic.
This article presents a summary of a keynote lecture at the Deep Learning Security workshop at IEEE Security and Privacy 2018. This lecture summarizes the state of the art in defenses against adversarial examples and provides recommendations for future research directions on this topic.
△ Less
Submitted 11 June, 2018;
originally announced June 2018.
-
Self-Attention Generative Adversarial Networks
Authors:
Han Zhang,
Ian Goodfellow,
Dimitris Metaxas,
Augustus Odena
Abstract:
In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover,…
▽ More
In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Frechet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
△ Less
Submitted 14 June, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
Authors:
Avital Oliver,
Augustus Odena,
Colin Raffel,
Ekin D. Cubuk,
Ian J. Goodfellow
Abstract:
Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified r…
▽ More
Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that these algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, that SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and that performance can degrade substantially when the unlabeled dataset contains out-of-class examples. To help guide SSL research towards real-world applicability, we make our unified reimplemention and evaluation platform publicly available.
△ Less
Submitted 17 June, 2019; v1 submitted 24 April, 2018;
originally announced April 2018.
-
Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
Authors:
Ian Goodfellow
Abstract:
A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms. Because the attack algorithms are not optimal, the attack algorithms are prone to overestimating the size of perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper-bound on the size of a perturbation…
▽ More
A key problem in research on adversarial examples is that vulnerability to adversarial examples is usually measured by running attack algorithms. Because the attack algorithms are not optimal, the attack algorithms are prone to overestimating the size of perturbation needed to fool the target model. In other words, the attack-based methodology provides an upper-bound on the size of a perturbation that will fool the model, but security guarantees require a lower bound. CLEVER is a proposed scoring method to estimate a lower bound. Unfortunately, an estimate of a bound is not a bound. In this report, we show that gradient masking, a common problem that causes attack methodologies to provide only a very loose upper bound, causes CLEVER to overestimate the size of perturbation needed to fool the model. In other words, CLEVER does not resolve the key problem with the attack-based methodology, because it fails to provide a lower bound.
△ Less
Submitted 20 April, 2018;
originally announced April 2018.
-
Adversarial Attacks and Defences Competition
Authors:
Alexey Kurakin,
Ian Goodfellow,
Samy Bengio,
Yinpeng Dong,
Fangzhou Liao,
Ming Liang,
Tianyu Pang,
Jun Zhu,
Xiaolin Hu,
Cihang Xie,
Jianyu Wang,
Zhishuai Zhang,
Zhou Ren,
Alan Yuille,
Sangxia Huang,
Yao Zhao,
Yuzhe Zhao,
Zhonglin Han,
Junjiajia Long,
Yerkebulan Berdibekov,
Takuya Akiba,
Seiya Tokui,
Motoki Abe
Abstract:
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several o…
▽ More
To accelerate research on adversarial examples and robustness of machine learning classifiers, Google Brain organized a NIPS 2017 competition that encouraged researchers to develop new methods to generate adversarial examples as well as to develop new ways to defend against them. In this chapter, we describe the structure and organization of the competition and the solutions developed by several of the top-placing teams.
△ Less
Submitted 30 March, 2018;
originally announced April 2018.
-
Adversarial Logit Pairing
Authors:
Harini Kannan,
Alexey Kurakin,
Ian Goodfellow
Abstract:
In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit…
▽ More
In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting - an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks, with an accuracy improvement from 1.5% to 27.9%. Adversarial logit pairing also successfully damages the current state of the art defense against black box attacks on ImageNet (Tramer et al., 2018), dropping its accuracy from 66.6% to 47.1%. With this new accuracy drop, adversarial logit pairing ties with Tramer et al.(2018) for the state of the art on black box attacks on ImageNet.
△ Less
Submitted 16 March, 2018;
originally announced March 2018.
-
Is Generator Conditioning Causally Related to GAN Performance?
Authors:
Augustus Odena,
Jacob Buckman,
Catherine Olsson,
Tom B. Brown,
Christopher Olah,
Colin Raffel,
Ian Goodfellow
Abstract:
Recent work (Pennington et al, 2017) suggests that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning. Motivated by this, we study the distribution of singular values of the Jacobian of the generator in Generative Adversarial Networks (GANs). We find that this Jacobian generally becomes ill-conditioned at the beginning of training.…
▽ More
Recent work (Pennington et al, 2017) suggests that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning. Motivated by this, we study the distribution of singular values of the Jacobian of the generator in Generative Adversarial Networks (GANs). We find that this Jacobian generally becomes ill-conditioned at the beginning of training. Moreover, we find that the average (with z from p(z)) conditioning of the generator is highly predictive of two other ad-hoc metrics for measuring the 'quality' of trained GANs: the Inception Score and the Frechet Inception Distance (FID). We test the hypothesis that this relationship is causal by proposing a 'regularization' technique (called Jacobian Clamping) that softly penalizes the condition number of the generator Jacobian. Jacobian Clamping improves the mean Inception Score and the mean FID for GANs trained on several datasets. It also greatly reduces inter-run variance of the aforementioned scores, addressing (at least partially) one of the main criticisms of GANs.
△ Less
Submitted 18 June, 2018; v1 submitted 23 February, 2018;
originally announced February 2018.
-
Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
Authors:
Gamaleldin F. Elsayed,
Shreya Shankar,
Brian Cheung,
Nicolas Papernot,
Alex Kurakin,
Ian Goodfellow,
Jascha Sohl-Dickstein
Abstract:
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with…
▽ More
Machine learning models are vulnerable to adversarial examples: small changes to images can cause computer vision models to make mistakes such as identifying a school bus as an ostrich. However, it is still an open question whether humans are prone to similar mistakes. Here, we address this question by leveraging recent techniques that transfer adversarial examples from computer vision models with known parameters and architecture to other models with unknown parameters and architecture, and by matching the initial processing of the human visual system. We find that adversarial examples that strongly transfer across computer vision models influence the classifications made by time-limited human observers.
△ Less
Submitted 21 May, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.
-
MaskGAN: Better Text Generation via Filling in the______
Authors:
William Fedus,
Ian Goodfellow,
Andrew M. Dai
Abstract:
Neural text generation models are often autoregressive language models or seq2seq models. These models generate text by sampling words sequentially, with each word conditioned on the previous word, and are state-of-the-art for several machine translation and summarization benchmarks. These benchmarks are often defined by validation perplexity even though this is not a direct measure of the quality…
▽ More
Neural text generation models are often autoregressive language models or seq2seq models. These models generate text by sampling words sequentially, with each word conditioned on the previous word, and are state-of-the-art for several machine translation and summarization benchmarks. These benchmarks are often defined by validation perplexity even though this is not a direct measure of the quality of the generated text. Additionally, these models are typically trained via maxi- mum likelihood and teacher forcing. These methods are well-suited to optimizing perplexity but can result in poor sample quality since generating text requires conditioning on sequences of words that may have never been observed at training time. We propose to improve sample quality using Generative Adversarial Networks (GANs), which explicitly train the generator to produce high quality samples and have shown a lot of success in image generation. GANs were originally designed to output differentiable values, so discrete language generation is challenging for them. We claim that validation perplexity alone is not indicative of the quality of text generated by a model. We introduce an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context. We show qualitatively and quantitatively, evidence that this produces more realistic conditional and unconditional text samples compared to a maximum likelihood trained model.
△ Less
Submitted 1 March, 2018; v1 submitted 23 January, 2018;
originally announced January 2018.
-
Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step
Authors:
William Fedus,
Mihaela Rosca,
Balaji Lakshminarayanan,
Andrew M. Dai,
Shakir Mohamed,
Ian Goodfellow
Abstract:
Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each playe…
▽ More
Generative adversarial networks (GANs) are a family of generative models that do not minimize a single training criterion. Unlike other generative models, the data distribution is learned via a game between a generator (the generative model) and a discriminator (a teacher providing training signal) that each minimize their own cost. GANs are designed to reach a Nash equilibrium at which each player cannot reduce their cost without changing the other players' parameters. One useful approach for the theory of GANs is to show that a divergence between the training distribution and the model distribution obtains its minimum value at equilibrium. Several recent research directions have been motivated by the idea that this divergence is the primary guide for the learning process and that every step of learning should decrease the divergence. We show that this view is overly restrictive. During GAN training, the discriminator provides learning signal in situations where the gradients of the divergences between distributions would not be useful. We provide empirical counterexamples to the view of GAN training as divergence minimization. Specifically, we demonstrate that GANs are able to learn distributions in situations where the divergence minimization point of view predicts they would fail. We also show that gradient penalties motivated from the divergence minimization perspective are equally helpful when applied in other contexts in which the divergence minimization perspective does not predict they would be helpful. This contributes to a growing body of evidence that GAN training may be more usefully viewed as approaching Nash equilibria via trajectories that do not necessarily minimize a specific divergence at each step.
△ Less
Submitted 20 February, 2018; v1 submitted 23 October, 2017;
originally announced October 2017.
-
On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches
Authors:
Martín Abadi,
Úlfar Erlingsson,
Ian Goodfellow,
H. Brendan McMahan,
Ilya Mironov,
Nicolas Papernot,
Kunal Talwar,
Li Zhang
Abstract:
The recent, remarkable growth of machine learning has led to intense interest in the privacy of the data on which machine learning relies, and to new techniques for preserving privacy. However, older ideas about privacy may well remain valid and useful. This note reviews two recent works on privacy in the light of the wisdom of some of the early literature, in particular the principles distilled b…
▽ More
The recent, remarkable growth of machine learning has led to intense interest in the privacy of the data on which machine learning relies, and to new techniques for preserving privacy. However, older ideas about privacy may well remain valid and useful. This note reviews two recent works on privacy in the light of the wisdom of some of the early literature, in particular the principles distilled by Saltzer and Schroeder in the 1970s.
△ Less
Submitted 26 August, 2017;
originally announced August 2017.
-
Ensemble Adversarial Training: Attacks and Defenses
Authors:
Florian Tramèr,
Alexey Kurakin,
Nicolas Papernot,
Ian Goodfellow,
Dan Boneh,
Patrick McDaniel
Abstract:
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate…
▽ More
Adversarial examples are perturbed inputs designed to fool machine learning models. Adversarial training injects such examples into training data to increase robustness. To scale this technique to large datasets, perturbations are crafted using fast single-step methods that maximize a linear approximation of the model's loss. We show that this form of adversarial training converges to a degenerate global minimum, wherein small curvature artifacts near the data points obfuscate a linear approximation of the loss. The model thus learns to generate weak perturbations, rather than defend against strong ones. As a result, we find that adversarial training remains vulnerable to black-box attacks, where we transfer perturbations computed on undefended models, as well as to a powerful novel single-step attack that escapes the non-smooth vicinity of the input data via a small random step. We further introduce Ensemble Adversarial Training, a technique that augments training data with perturbations transferred from other models. On ImageNet, Ensemble Adversarial Training yields models with strong robustness to black-box attacks. In particular, our most robust model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. However, subsequent work found that more elaborate black-box attacks could significantly enhance transferability and reduce the accuracy of our models.
△ Less
Submitted 26 April, 2020; v1 submitted 19 May, 2017;
originally announced May 2017.
-
The Space of Transferable Adversarial Examples
Authors:
Florian Tramèr,
Nicolas Papernot,
Ian Goodfellow,
Dan Boneh,
Patrick McDaniel
Abstract:
Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time. They often transfer: the same adversarial example fools more than one model.
In this work, we propose novel methods for estimating the previously unknown dimensionality of the space of adversarial inputs. We find that adversarial examples span a contiguous subspace of large (~25)…
▽ More
Adversarial examples are maliciously perturbed inputs designed to mislead machine learning (ML) models at test-time. They often transfer: the same adversarial example fools more than one model.
In this work, we propose novel methods for estimating the previously unknown dimensionality of the space of adversarial inputs. We find that adversarial examples span a contiguous subspace of large (~25) dimensionality. Adversarial subspaces with higher dimensionality are more likely to intersect. We find that for two different models, a significant fraction of their subspaces is shared, thus enabling transferability.
In the first quantitative analysis of the similarity of different models' decision boundaries, we show that these boundaries are actually close in arbitrary directions, whether adversarial or benign. We conclude by formally studying the limits of transferability. We derive (1) sufficient conditions on the data distribution that imply transferability for simple model classes and (2) examples of scenarios in which transfer does not occur. These findings indicate that it may be possible to design defenses against transfer-based attacks, even for models that are vulnerable to direct attacks.
△ Less
Submitted 23 May, 2017; v1 submitted 11 April, 2017;
originally announced April 2017.
-
Adversarial Attacks on Neural Network Policies
Authors:
Sandy Huang,
Nicolas Papernot,
Ian Goodfellow,
Yan Duan,
Pieter Abbeel
Abstract:
Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adver…
▽ More
Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies. Our threat model considers adversaries capable of introducing small perturbations to the raw input of the policy. We characterize the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. Regardless of the learned task or training algorithm, we observe a significant drop in performance, even with small adversarial perturbations that do not interfere with human perception. Videos are available at http://rll.berkeley.edu/adversarial.
△ Less
Submitted 7 February, 2017;
originally announced February 2017.
-
Adversarial Machine Learning at Scale
Authors:
Alexey Kurakin,
Ian Goodfellow,
Samy Bengio
Abstract:
Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean…
▽ More
Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research, we apply adversarial training to ImageNet. Our contributions include: (1) recommendations for how to succesfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, so single-step attacks are the best for mounting black-box attacks, and (4) resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process.
△ Less
Submitted 10 February, 2017; v1 submitted 3 November, 2016;
originally announced November 2016.
-
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
Authors:
Nicolas Papernot,
Martín Abadi,
Úlfar Erlingsson,
Ian Goodfellow,
Kunal Talwar
Abstract:
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information.
To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees…
▽ More
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information.
To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data: Private Aggregation of Teacher Ensembles (PATE). The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as "teachers" for a "student" model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student's privacy properties can be understood both intuitively (since no single teacher and thus no single dataset dictates the student's training) and formally, in terms of differential privacy. These properties hold even if an adversary can not only query the student but also inspect its internal workings.
Compared with previous work, the approach imposes only weak assumptions on how teachers are trained: it applies to any model, including non-convex models like DNNs. We achieve state-of-the-art privacy/utility trade-offs on MNIST and SVHN thanks to an improved privacy analysis and semi-supervised learning.
△ Less
Submitted 3 March, 2017; v1 submitted 18 October, 2016;
originally announced October 2016.
-
Technical Report on the CleverHans v2.1.0 Adversarial Examples Library
Authors:
Nicolas Papernot,
Fartash Faghri,
Nicholas Carlini,
Ian Goodfellow,
Reuben Feinman,
Alexey Kurakin,
Cihang Xie,
Yash Sharma,
Tom Brown,
Aurko Roy,
Alexander Matyasko,
Vahid Behzadan,
Karen Hambardzumyan,
Zhishuai Zhang,
Yi-Lin Juang,
Zhi Li,
Ryan Sheatsley,
Abhibhav Garg,
Jonathan Uesato,
Willi Gierke,
Yinpeng Dong,
David Berthelot,
Paul Hendricks,
Jonas Rauber,
Rujun Long
, et al. (1 additional authors not shown)
Abstract:
CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial exam…
▽ More
CleverHans is a software library that provides standardized reference implementations of adversarial example construction techniques and adversarial training. The library may be used to develop more robust machine learning models and to provide standardized benchmarks of models' performance in the adversarial setting. Benchmarks constructed without a standardized implementation of adversarial example construction are not comparable to each other, because a good result may indicate a robust model or it may merely indicate a weak implementation of the adversarial example construction procedure.
This technical report is structured as follows. Section 1 provides an overview of adversarial examples in machine learning and of the CleverHans software. Section 2 presents the core functionalities of the library: namely the attacks based on adversarial examples and defenses to improve the robustness of machine learning models to these attacks. Section 3 describes how to report benchmark results using the library. Section 4 describes the versioning system.
△ Less
Submitted 27 June, 2018; v1 submitted 3 October, 2016;
originally announced October 2016.
-
Adversarial examples in the physical world
Authors:
Alexey Kurakin,
Ian Goodfellow,
Samy Bengio
Abstract:
Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier…
▽ More
Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work have assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as an input. This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.
△ Less
Submitted 10 February, 2017; v1 submitted 8 July, 2016;
originally announced July 2016.
-
Deep Learning with Differential Privacy
Authors:
Martín Abadi,
Andy Chu,
Ian Goodfellow,
H. Brendan McMahan,
Ilya Mironov,
Kunal Talwar,
Li Zhang
Abstract:
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refin…
▽ More
Machine learning techniques based on neural networks are achieving remarkable results in a wide variety of domains. Often, the training of models requires large, representative datasets, which may be crowdsourced and contain sensitive information. The models should not expose private information in these datasets. Addressing this goal, we develop new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy. Our implementation and experiments demonstrate that we can train deep neural networks with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
△ Less
Submitted 24 October, 2016; v1 submitted 1 July, 2016;
originally announced July 2016.
-
Adversarial Training Methods for Semi-Supervised Text Classification
Authors:
Takeru Miyato,
Andrew M. Dai,
Ian Goodfellow
Abstract:
Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We ex…
▽ More
Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting. Code is available at https://github.com/tensorflow/models/tree/master/research/adversarial_text.
△ Less
Submitted 16 November, 2021; v1 submitted 25 May, 2016;
originally announced May 2016.
-
Efficient Per-Example Gradient Computations
Authors:
Ian Goodfellow
Abstract:
This technical report describes an efficient technique for computing the norm of the gradient of the loss function for a neural network with respect to its parameters. This gradient norm can be computed efficiently for every example.
This technical report describes an efficient technique for computing the norm of the gradient of the loss function for a neural network with respect to its parameters. This gradient norm can be computed efficiently for every example.
△ Less
Submitted 9 October, 2015; v1 submitted 6 October, 2015;
originally announced October 2015.
-
Explaining and Harnessing Adversarial Examples
Authors:
Ian J. Goodfellow,
Jonathon Shlens,
Christian Szegedy
Abstract:
Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfittin…
▽ More
Several machine learning models, including neural networks, consistently misclassify adversarial examples---inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence. Early attempts at explaining this phenomenon focused on nonlinearity and overfitting. We argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature. This explanation is supported by new quantitative results while giving the first explanation of the most intriguing fact about them: their generalization across architectures and training sets. Moreover, this view yields a simple and fast method of generating adversarial examples. Using this approach to provide examples for adversarial training, we reduce the test set error of a maxout network on the MNIST dataset.
△ Less
Submitted 20 March, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
Qualitatively characterizing neural network optimization problems
Authors:
Ian J. Goodfellow,
Oriol Vinyals,
Andrew M. Saxe
Abstract:
Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct t…
▽ More
Training neural networks involves solving large-scale non-convex optimization problems. This task has long been believed to be extremely difficult, with fear of local minima and other obstacles motivating a variety of schemes to improve optimization, such as unsupervised pretraining. However, modern neural networks are able to achieve negligible training error on complex tasks, using only direct training with stochastic gradient descent. We introduce a simple analysis technique to look for evidence that such networks are overcoming local optima. We find that, in fact, on a straight path from initialization to solution, a variety of state of the art neural networks never encounter any significant obstacles.
△ Less
Submitted 21 May, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
On distinguishability criteria for estimating generative models
Authors:
Ian J. Goodfellow
Abstract:
Two recently introduced criteria for estimation of generative models are both based on a reduction to binary classification. Noise-contrastive estimation (NCE) is an estimation procedure in which a generative model is trained to be able to distinguish data samples from noise samples. Generative adversarial networks (GANs) are pairs of generator and discriminator networks, with the generator networ…
▽ More
Two recently introduced criteria for estimation of generative models are both based on a reduction to binary classification. Noise-contrastive estimation (NCE) is an estimation procedure in which a generative model is trained to be able to distinguish data samples from noise samples. Generative adversarial networks (GANs) are pairs of generator and discriminator networks, with the generator network learning to generate samples by attempting to fool the discriminator network into believing its samples are real data. Both estimation procedures use the same function to drive learning, which naturally raises questions about how they are related to each other, as well as whether this function is related to maximum likelihood estimation (MLE). NCE corresponds to training an internal data model belonging to the {\em discriminator} network but using a fixed generator network. We show that a variant of NCE, with a dynamic generator network, is equivalent to maximum likelihood estimation. Since pairing a learned discriminator with an appropriate dynamically selected generator recovers MLE, one might expect the reverse to hold for pairing a learned generator with a certain discriminator. However, we show that recovering MLE for a learned generator requires departing from the distinguishability game. Specifically:
(i) The expected gradient of the NCE discriminator can be made to match the expected gradient of
MLE, if one is allowed to use a non-stationary noise distribution for NCE,
(ii) No choice of discriminator network can make the expected gradient for the GAN generator match that of MLE, and
(iii) The existing theory does not guarantee that GANs will converge in the non-convex case.
This suggests that the key next step in GAN research is to determine whether GANs converge, and if not, to modify their training algorithm to force convergence.
△ Less
Submitted 21 May, 2015; v1 submitted 19 December, 2014;
originally announced December 2014.
-
Generative Adversarial Networks
Authors:
Ian J. Goodfellow,
Jean Pouget-Abadie,
Mehdi Mirza,
Bing Xu,
David Warde-Farley,
Sherjil Ozair,
Aaron Courville,
Yoshua Bengio
Abstract:
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This fram…
▽ More
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
△ Less
Submitted 10 June, 2014;
originally announced June 2014.
-
An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Authors:
Ian J. Goodfellow,
Mehdi Mirza,
Da Xiao,
Aaron Courville,
Yoshua Bengio
Abstract:
Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, co…
▽ More
Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.
△ Less
Submitted 3 March, 2015; v1 submitted 21 December, 2013;
originally announced December 2013.
-
An empirical analysis of dropout in piecewise linear networks
Authors:
David Warde-Farley,
Ian J. Goodfellow,
Aaron Courville,
Yoshua Bengio
Abstract:
The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, sp…
▽ More
The recently introduced dropout training criterion for neural networks has been the subject of much attention due to its simplicity and remarkable effectiveness as a regularizer, as well as its interpretation as a training procedure for an exponentially large ensemble of networks that share parameters. In this work we empirically investigate several questions related to the efficacy of dropout, specifically as it concerns networks employing the popular rectified linear activation function. We investigate the quality of the test time weight-scaling inference procedure by evaluating the geometric average exactly in small models, as well as compare the performance of the geometric mean to the arithmetic mean more commonly employed by ensemble techniques. We explore the effect of tied weights on the ensemble interpretation by training ensembles of masked networks without tied weights. Finally, we investigate an alternative criterion based on a biased estimator of the maximum likelihood ensemble gradient.
△ Less
Submitted 2 January, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
On the Challenges of Physical Implementations of RBMs
Authors:
Vincent Dumoulin,
Ian J. Goodfellow,
Aaron Courville,
Yoshua Bengio
Abstract:
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing sample…
▽ More
Restricted Boltzmann machines (RBMs) are powerful machine learning models, but learning and some kinds of inference in the model require sampling-based approximations, which, in classical digital computers, are implemented using expensive MCMC. Physical computation offers the opportunity to reduce the cost of sampling by building physical systems whose natural dynamics correspond to drawing samples from the desired RBM distribution. Such a system avoids the burn-in and mixing cost of a Markov chain. However, hardware implementations of this variety usually entail limitations such as low-precision and limited range of the parameters and restrictions on the size and topology of the RBM. We conduct software simulations to determine how harmful each of these restrictions is. Our simulations are designed to reproduce aspects of the D-Wave quantum computer, but the issues we investigate arise in most forms of physical computation.
△ Less
Submitted 24 October, 2014; v1 submitted 18 December, 2013;
originally announced December 2013.
-
Pylearn2: a machine learning research library
Authors:
Ian J. Goodfellow,
David Warde-Farley,
Pascal Lamblin,
Vincent Dumoulin,
Mehdi Mirza,
Razvan Pascanu,
James Bergstra,
Frédéric Bastien,
Yoshua Bengio
Abstract:
Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summa…
▽ More
Pylearn2 is a machine learning research library. This does not just mean that it is a collection of machine learning algorithms that share a common API; it means that it has been designed for flexibility and extensibility in order to facilitate research projects that involve new or unusual use cases. In this paper we give a brief history of the library, an overview of its basic philosophy, a summary of the library's architecture, and a description of how the Pylearn2 community functions socially.
△ Less
Submitted 19 August, 2013;
originally announced August 2013.
-
Challenges in Representation Learning: A report on three machine learning contests
Authors:
Ian J. Goodfellow,
Dumitru Erhan,
Pierre Luc Carrier,
Aaron Courville,
Mehdi Mirza,
Ben Hamner,
Will Cukierski,
Yichuan Tang,
David Thaler,
Dong-Hyun Lee,
Yingbo Zhou,
Chetan Ramaiah,
Fangxiang Feng,
Ruifan Li,
Xiaojie Wang,
Dimitris Athanasakis,
John Shawe-Taylor,
Maxim Milakov,
John Park,
Radu Ionescu,
Marius Popescu,
Cristian Grozea,
James Bergstra,
Jingjing Xie,
Lukasz Romaszko
, et al. (3 additional authors not shown)
Abstract:
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kin…
▽ More
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions for organizers of future challenges and some comments on what kind of knowledge can be gained from machine learning competitions.
△ Less
Submitted 1 July, 2013;
originally announced July 2013.
-
Maxout Networks
Authors:
Ian J. Goodfellow,
David Warde-Farley,
Mehdi Mirza,
Aaron Courville,
Yoshua Bengio
Abstract:
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model av…
▽ More
We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state of the art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
△ Less
Submitted 20 September, 2013; v1 submitted 18 February, 2013;
originally announced February 2013.
-
Piecewise Linear Multilayer Perceptrons and Dropout
Authors:
Ian J. Goodfellow
Abstract:
We propose a new type of hidden layer for a multilayer perceptron, and demonstrate that it obtains the best reported performance for an MLP on the MNIST dataset.
We propose a new type of hidden layer for a multilayer perceptron, and demonstrate that it obtains the best reported performance for an MLP on the MNIST dataset.
△ Less
Submitted 22 January, 2013;
originally announced January 2013.
-
Joint Training Deep Boltzmann Machines for Classification
Authors:
Ian J. Goodfellow,
Aaron Courville,
Yoshua Bengio
Abstract:
We introduce a new method for training deep Boltzmann machines jointly. Prior methods of training DBMs require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either b…
▽ More
We introduce a new method for training deep Boltzmann machines jointly. Prior methods of training DBMs require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either be interpreted as a single generative model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent networks that share parameters and may be approximately averaged together using a novel technique we call the multi-inference trick. We show that our approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.
△ Less
Submitted 1 May, 2013; v1 submitted 15 January, 2013;
originally announced January 2013.
-
Joint Training of Deep Boltzmann Machines
Authors:
Ian Goodfellow,
Aaron Courville,
Yoshua Bengio
Abstract:
We introduce a new method for training deep Boltzmann machines jointly. Prior methods require an initial learning pass that trains the deep Boltzmann machine greedily, one layer at a time, or do not perform well on classifi- cation tasks.
We introduce a new method for training deep Boltzmann machines jointly. Prior methods require an initial learning pass that trains the deep Boltzmann machine greedily, one layer at a time, or do not perform well on classifi- cation tasks.
△ Less
Submitted 11 December, 2012;
originally announced December 2012.
-
Large-Scale Feature Learning With Spike-and-Slab Sparse Coding
Authors:
Ian Goodfellow,
Aaron Courville,
Yoshua Bengio
Abstract:
We consider the problem of object recognition with a large number of classes. In order to overcome the low amount of labeled examples available in this setting, we introduce a new feature learning and extraction procedure based on a factor model we call spike-and-slab sparse coding (S3C). Prior work on S3C has not prioritized the ability to exploit parallel architectures and scale S3C to the enorm…
▽ More
We consider the problem of object recognition with a large number of classes. In order to overcome the low amount of labeled examples available in this setting, we introduce a new feature learning and extraction procedure based on a factor model we call spike-and-slab sparse coding (S3C). Prior work on S3C has not prioritized the ability to exploit parallel architectures and scale S3C to the enormous problem sizes needed for object recognition. We present a novel inference procedure for appropriate for use with GPUs which allows us to dramatically increase both the training set size and the amount of latent factors that S3C may be trained with. We demonstrate that this approach improves upon the supervised learning capabilities of both sparse coding and the spike-and-slab Restricted Boltzmann Machine (ssRBM) on the CIFAR-10 dataset. We use the CIFAR-100 dataset to demonstrate that our method scales to large numbers of classes better than previous methods. Finally, we use our method to win the NIPS 2011 Workshop on Challenges In Learning Hierarchical Models? Transfer Learning Challenge.
△ Less
Submitted 27 June, 2012;
originally announced June 2012.