Search | arXiv e-print repository

Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection

Authors: Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu

Abstract: With the deployment of Large Language Models (LLMs) in interactive applications, online malicious intent detection has become increasingly critical. However, existing approaches fall short of handling diverse and complex user queries in real time. To address these challenges, we introduce ADRAG (Adversarial Distilled Retrieval-Augmented Guard), a two-stage framework for robust and efficient online malicious intent detection. In the training stage, a high-capacity teacher model is trained on adversarially perturbed, retrieval-augmented inputs to learn robust decision boundaries over diverse and complex user queries. In the inference stage, a distillation scheduler transfers the teacher's knowledge into a compact student model, with a continually updated knowledge base collected online. At deployment, the compact student model leverages top-K similar safety exemplars retrieved from the online-updated knowledge base to enable both online and real-time malicious query detection. Evaluations across ten safety benchmarks demonstrate that ADRAG, with a 149M-parameter model, achieves 98.5% of WildGuard-7B's performance, surpasses GPT-4 by 3.3% and Llama-Guard-3-8B by 9.5% on out-of-distribution detection, while simultaneously delivering up to 5.6x lower latency at 300 queries per second (QPS) in real-time applications.

Submitted 18 September, 2025; originally announced September 2025.
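
The "top-K similar safety exemplars" step in the abstract above can be illustrated with a small sketch. This is not the ADRAG implementation: the two-dimensional vectors, the labels, and the names `cosine`, `top_k`, and `KB` are hypothetical stand-ins, and a real system would use learned embeddings and an approximate nearest-neighbour index.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float],
          knowledge_base: List[Tuple[Sequence[float], str]],
          k: int) -> List[Tuple[float, str]]:
    """Return the k (similarity, label) pairs most similar to the query."""
    scored = [(cosine(query, vec), label) for vec, label in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical knowledge base: toy embeddings paired with safety labels.
KB = [
    ([1.0, 0.0], "benign"),
    ([0.0, 1.0], "malicious"),
    ([0.6, 0.8], "malicious"),
]

# The retrieved exemplars would be passed to the guard model with the query.
neighbours = top_k([0.0, 1.0], KB, k=2)
```

Because the knowledge base is a plain list, new exemplars can be appended at any time, which loosely mirrors the "continually updated" property the abstract describes.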

arXiv:2501.09891 [pdf, other]

Evolving Deeper LLM Thinking

Authors: Kuang-Huei Lee, Ian Fischer, Yueh-Hua Wu, Dave Marwood, Shumeet Baluja, Dale Schuurmans, Xinyun Chen

Abstract: We explore an evolutionary search strategy for scaling inference time compute in Large Language Models. The proposed approach, Mind Evolution, uses a language model to generate, recombine and refine candidate responses. The proposed approach avoids the need to formalize the underlying inference problem whenever a solution evaluator is available. Controlling for inference cost, we find that Mind Evolution significantly outperforms other inference strategies such as Best-of-N and Sequential Revision in natural language planning tasks. In the TravelPlanner and Natural Plan benchmarks, Mind Evolution solves more than 98% of the problem instances using Gemini 1.5 Pro without the use of a formal solver.

Submitted 16 January, 2025; originally announced January 2025.
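
The generate, recombine and refine loop described in the abstract above follows the general shape of an evolutionary search driven by a solution evaluator. The sketch below shows that generic shape only and is not the Mind Evolution algorithm: in the paper a language model produces and edits candidates, whereas here the hypothetical functions `propose`, `recombine`, and `refine` operate on toy bit lists, and `evaluate` simply counts ones.

```python
import random


def evolve(evaluate, propose, recombine, refine,
           pop_size=8, generations=20, seed=0):
    """Generic evolutionary search: keep the best half, refill with children."""
    rng = random.Random(seed)
    population = [propose(rng) for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        history.append(evaluate(population[0]))
        parents = population[: pop_size // 2]  # elitism: survivors are kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(refine(recombine(a, b, rng), rng))
        population = parents + children
    return max(population, key=evaluate), history


# Toy stand-ins (assumptions): candidates are bit lists, the score is the sum.
N_BITS = 12


def evaluate(candidate):
    return sum(candidate)


def propose(rng):
    return [rng.randint(0, 1) for _ in range(N_BITS)]


def recombine(a, b, rng):
    # Take each position from one of the two parents at random.
    return [rng.choice(pair) for pair in zip(a, b)]


def refine(candidate, rng):
    # Stand-in for a revision step: set one random position to 1.
    fixed = list(candidate)
    fixed[rng.randrange(N_BITS)] = 1
    return fixed


best, history = evolve(evaluate, propose, recombine, refine)
```

Only `evaluate` needs to understand the task, which is the point the abstract makes about not having to formalize the underlying problem. Keeping the parents in the population means the best score never decreases between generations.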

arXiv:2412.03206 [pdf, other]

doi 10.1364/OL.518946

Experimental reservoir computing with diffractively coupled VCSELs

Authors: Moritz Pflüger, Daniel Brunner, Tobias Heuser, James A. Lott, Stephan Reitzenstein, Ingo Fischer

Abstract: We present experiments on reservoir computing (RC) using a network of vertical-cavity surface-emitting lasers (VCSELs) that we diffractively couple via an external cavity. Our optical reservoir computer consists of 24 physical VCSEL nodes. We evaluate the system's memory and solve the 2-bit XOR task and the 3-bit header recognition (HR) task with bit error ratios (BERs) below 1% and the 2-bit digital-to-analog conversion (DAC) task with a root-mean-square error (RMSE) of 0.067.

Submitted 4 December, 2024; originally announced December 2024.

Journal ref: Optics Letters 49, 2285 (2024)

arXiv:2410.02217 [pdf, other]

Stochastic Sampling from Deterministic Flow Models

Authors: Saurabh Singh, Ian Fischer

Abstract: Deterministic flow models, such as rectified flows, offer a general framework for learning a deterministic transport map between two distributions, realized as the vector field for an ordinary differential equation (ODE). However, they are sensitive to model estimation and discretization errors and do not permit different samples conditioned on an intermediate state, limiting their application. We present a general method to turn the underlying ODE of such flow models into a family of stochastic differential equations (SDEs) that have the same marginal distributions. This method permits us to derive families of stochastic samplers, for fixed (e.g., previously trained) deterministic flow models, that continuously span the spectrum of deterministic and stochastic sampling, given access to the flow field and the score function. Our method provides additional degrees of freedom that help alleviate the issues with the deterministic samplers and empirically outperforms them. We empirically demonstrate advantages of our method on a toy Gaussian setup and on the large scale ImageNet generation task. Further, our family of stochastic samplers provides an additional knob for controlling the diversity of generation, which we qualitatively demonstrate in our experiments.

Submitted 3 October, 2024; originally announced October 2024.

Comments: Submitted to ICLR 2025
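
The claim in the abstract above, that an ODE can be turned into a family of SDEs with the same marginal distributions given the flow field and the score function, rests on a standard Fokker-Planck identity. The block below writes out that textbook identity as a sketch. The symbols $v$, $p_t$, $g$, and $W_t$ are notation chosen here, and the paper's own parameterization may differ.

```latex
% Flow-model ODE with velocity field v and marginal densities p_t:
\begin{align}
  \mathrm{d}x_t &= v(x_t, t)\,\mathrm{d}t,
  &
  \partial_t p_t &= -\nabla \cdot \big( v\, p_t \big).
\end{align}
% For any noise scale g(t) >= 0, this SDE has the same marginals p_t:
\begin{align}
  \mathrm{d}x_t &= \Big[ v(x_t, t) + \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x_t) \Big]\,\mathrm{d}t
                   + g(t)\,\mathrm{d}W_t .
\end{align}
% Check via the Fokker-Planck equation of the SDE:
\begin{align}
  \partial_t p_t
    &= -\nabla \cdot \Big( \big[ v + \tfrac{1}{2} g^2 \nabla \log p_t \big] p_t \Big)
       + \tfrac{1}{2} g^2\, \Delta p_t \\
    &= -\nabla \cdot \big( v\, p_t \big)
       - \tfrac{1}{2} g^2\, \Delta p_t + \tfrac{1}{2} g^2\, \Delta p_t
     = -\nabla \cdot \big( v\, p_t \big).
\end{align}
```

Setting $g \equiv 0$ recovers the deterministic sampler, and larger $g$ injects more noise while the score term compensates in the drift.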

arXiv:2405.07236 [pdf, other]

Adaptive control of recurrent neural networks using conceptors

Authors: Guillaume Pourcel, Mirko Goldmann, Ingo Fischer, Miguel C. Soriano

Abstract: Recurrent Neural Networks excel at predicting and generating complex high-dimensional temporal patterns. Due to their inherent nonlinear dynamics and memory, they can learn unbounded temporal dependencies from data. In a Machine Learning setting, the network's parameters are adapted during a training phase to match the requirements of a given task/problem increasing its computational capabilities. After the training, the network parameters are kept fixed to exploit the learned computations. The static parameters thereby render the network unadaptive to changing conditions, such as external or internal perturbation. In this manuscript, we demonstrate how keeping parts of the network adaptive even after the training enhances its functionality and robustness. Here, we utilize the conceptor framework and conceptualize an adaptive control loop analyzing the network's behavior continuously and adjusting its time-varying internal representation to follow a desired target. We demonstrate how the added adaptivity of the network supports the computational functionality in three distinct tasks: interpolation of temporal patterns, stabilization against partial network degradation, and robustness against input distortion. Our results highlight the potential of adaptive networks in machine learning beyond training, enabling them to not only learn complex patterns but also dynamically adjust to changing environments, ultimately broadening their applicability.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2402.09727 [pdf, other]

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Authors: Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer

Abstract: Current Large Language Models (LLMs) are not only limited to some maximum context length, but also are not able to robustly consume long inputs. To address these limitations, we propose ReadAgent, an LLM agent system that increases effective context length up to 20x in our experiments. Inspired by how humans interactively read long documents, we implement ReadAgent as a simple prompting system that uses the advanced language capabilities of LLMs to (1) decide what content to store together in a memory episode, (2) compress those memory episodes into short episodic memories called gist memories, and (3) take actions to look up passages in the original text if ReadAgent needs to remind itself of relevant details to complete a task. We evaluate ReadAgent against baselines using retrieval methods, using the original long contexts, and using the gist memories. These evaluations are performed on three long-document reading comprehension tasks: QuALITY, NarrativeQA, and QMSum. ReadAgent outperforms the baselines on all three tasks while extending the effective context window by 3.5-20x.

Submitted 22 July, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: Website: https://read-agent.github.io

arXiv:2211.09981 [pdf, other]

Weighted Ensemble Self-Supervised Learning

Authors: Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon

Abstract: Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on ImageNet-1K, particularly in the few-shot setting. We explore several weighting schemes and find that those which increase the diversity of ensemble heads lead to better downstream evaluation results. Thorough experiments yield improved prior art baselines which our method still surpasses; e.g., our overall improvement with MSN ViT-B/16 is 3.9 p.p. for 1-shot learning.

Submitted 9 April, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted by ICLR 2023

arXiv:2210.08217 [pdf, other]

PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale

Authors: Kuang-Huei Lee, Ted Xiao, Adrian Li, Paul Wohlhart, Ian Fischer, Yao Lu

Abstract: The predictive information, the mutual information between the past and future, has been shown to be a useful representation learning auxiliary loss for training reinforcement learning agents, as the ability to model what will happen next is critical to success on many control tasks. While existing studies are largely restricted to training specialist agents on single-task settings in simulation, in this work, we study modeling the predictive information for robotic agents and its importance for general-purpose agents that are trained to master a large repertoire of diverse skills from large amounts of data. Specifically, we introduce Predictive Information QT-Opt (PI-QT-Opt), a QT-Opt agent augmented with an auxiliary loss that learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world with a single set of parameters. We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks. Finally, we evaluate PI-QT-Opt on real robots, achieving substantial and consistent improvement over QT-Opt in multiple experimental settings of varying environments, skills, and multi-task configurations.

Submitted 24 November, 2022; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: CoRL 2022. 21 pages, 9 figures. The supplementary video is available at https://kuanghuei.github.io/piqtopt

arXiv:2207.14133 [pdf, other]

doi 10.1063/5.0116784

Learning unseen coexisting attractors

Authors: Daniel J. Gauthier, Ingo Fischer, André Röhm

Abstract: Reservoir computing is a machine learning approach that can generate a surrogate model of a dynamical system. It can learn the underlying dynamical system using fewer trainable parameters and hence smaller training data sets than competing approaches. Recently, a simpler formulation, known as next-generation reservoir computing, removes many algorithm metaparameters and identifies a well-performing traditional reservoir computer, thus simplifying training even further. Here, we study a particularly challenging problem of learning a dynamical system that has both disparate time scales and multiple co-existing dynamical states (attractors). We compare the next-generation and traditional reservoir computer using metrics quantifying the geometry of the ground-truth and forecasted attractors. For the studied four-dimensional system, the next-generation reservoir computing approach uses $\sim 1.7 \times$ less training data, requires $10^3 \times$ shorter "warm up" time, has fewer metaparameters, and has $\sim 100\times$ higher accuracy in predicting the co-existing attractor characteristics in comparison to a traditional reservoir computer. Furthermore, we demonstrate that it predicts the basin of attraction with high accuracy. This work lends further support to the superior learning ability of this new machine learning algorithm for dynamical systems.

Submitted 28 July, 2022; originally announced July 2022.

Comments: 8 pages, 7 figures

arXiv:2206.04114 [pdf, other]

Deep Hierarchical Planning from Pixels

Authors: Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel

Abstract: Intelligent agents need to select long sequences of actions to solve complex tasks. While humans easily break down tasks into subgoals and reach them through millions of muscle commands, current artificial intelligence is limited to tasks with horizons of a few hundred decisions, despite large compute budgets. Research on hierarchical reinforcement learning aims to overcome this limitation but has proven to be challenging: current methods rely on manually specified goal spaces or subtasks, and no general solution exists. We introduce Director, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model. The high-level policy maximizes task and exploration rewards by selecting latent goals and the low-level policy learns to achieve the goals. Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization. Director outperforms exploration methods on tasks with sparse rewards, including 3D maze traversal with a quadruped robot from an egocentric camera and proprioception, without access to the global position or top-down view that was used by prior work. Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.

Submitted 8 June, 2022; originally announced June 2022.

Comments: Website: https://danijar.com/director

arXiv:2205.15241 [pdf, other]

Multi-Game Decision Transformers

Authors: Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

Abstract: A longstanding goal of the field of AI is a method for learning a highly capable, generalist agent from diverse experience. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we investigate whether the same strategy can be used to produce generalist reinforcement learning agents. Specifically, we show that a single transformer-based model - with a single set of weights - trained purely offline can play a suite of up to 46 Atari games simultaneously at close-to-human performance. When trained and evaluated appropriately, we find that the same trends observed in language and vision hold, including scaling of performance with model size and rapid adaptation to new games via fine-tuning. We compare several approaches in this multi-game setting, such as online and offline RL methods and behavioral cloning, and find that our Multi-Game Decision Transformer models offer the best scalability and performance. We release the pre-trained models and code to encourage further research in this direction.

Submitted 15 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022. 24 pages, 16 figures. Additional information, videos and code can be seen at https://sites.google.com/view/multi-game-transformers

arXiv:2205.07886 [pdf, other]

An Empirical Investigation of Representation Learning for Imitation

Authors: Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

Abstract: Imitation learning often needs a large demonstration set in order to handle the full range of situations that an agent might find itself in during deployment. However, collecting expert demonstrations can be expensive. Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data. Our Empirical Investigation of Representation Learning for Imitation (EIRLI) investigates whether similar benefits apply to imitation learning. We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites. In the settings we evaluate, we find that existing algorithms for image-based representation learning provide limited value relative to a well-tuned baseline with image augmentations. To explain this result, we investigate differences between imitation learning and other settings where representation learning has provided significant benefit, such as image classification. Finally, we release a well-documented codebase which both replicates our findings and provides a modular framework for creating new representation learning algorithms out of reusable components.

Submitted 16 May, 2022; originally announced May 2022.

Comments: Accepted to NeurIPS 2021 Datasets and Benchmarks Track

arXiv:2203.02592 [pdf, other]

Sparsity-Inducing Categorical Prior Improves Robustness of the Information Bottleneck

Authors: Anirban Samaddar, Sandeep Madireddy, Prasanna Balaprakash, Tapabrata Maiti, Gustavo de los Campos, Ian Fischer

Abstract: The information bottleneck framework provides a systematic approach to learning representations that compress nuisance information in the input and extract semantically meaningful information about predictions. However, the choice of a prior distribution that fixes the dimensionality across all the data can restrict the flexibility of this approach for learning robust representations. We present a novel sparsity-inducing spike-slab categorical prior that uses sparsity as a mechanism to provide the flexibility that allows each data point to learn its own dimension distribution. In addition, it provides a mechanism for learning a joint distribution of the latent variable and the sparsity and hence can account for the complete uncertainty in the latent space. Through a series of experiments using in-distribution and out-of-distribution learning scenarios on the MNIST, CIFAR-10, and ImageNet data, we show that the proposed approach improves accuracy and robustness compared to traditional fixed-dimensional priors, as well as other sparsity induction mechanisms for latent variable models proposed in the literature.

Submitted 27 October, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

arXiv:2111.03706 [pdf, other]

Learn one size to infer all: Exploiting translational symmetries in delay-dynamical and spatio-temporal systems using scalable neural networks

Authors: Mirko Goldmann, Claudio R. Mirasso, Ingo Fischer, Miguel C. Soriano

Abstract: We design scalable neural networks adapted to translational symmetries in dynamical systems, capable of inferring untrained high-dimensional dynamics for different system sizes. We train these networks to predict the dynamics of delay-dynamical and spatio-temporal systems for a single size. Then, we drive the networks by their own predictions. We demonstrate that by scaling the size of the trained network, we can predict the complex dynamics for larger or smaller system sizes. Thus, the network learns from a single example and, by exploiting symmetry properties, infers entire bifurcation diagrams.

Submitted 5 July, 2024; v1 submitted 5 November, 2021; originally announced November 2021.

arXiv:2111.03332 [pdf, other]

doi 10.1063/1.5042342

Tutorial: Photonic Neural Networks in Delay Systems

Authors: D. Brunner, B. Penkovsky, B. A. Marquez, M. Jaquot, I. Fischer, L. Larger

Abstract: Photonic delay systems have revolutionized the hardware implementation of Recurrent Neural Networks and Reservoir Computing in particular. The fundamental principles of Reservoir Computing strongly benefit a realization in such complex analog systems. Especially delay systems, potentially providing large numbers of degrees of freedom even in simple architectures, can efficiently be exploited for information processing. The numerous demonstrations of their performance led to a revival of photonic Artificial Neural Networks. Today, an astonishing variety of physical substrates, implementation techniques as well as network architectures based on this approach have been successfully employed. Important fundamental aspects of analog hardware Artificial Neural Networks have been investigated, and multiple high-performance applications have been demonstrated. Here, we introduce and explain the most relevant aspects of Artificial Neural Networks and delay systems, the seminal experimental demonstrations of Reservoir Computing in photonic delay systems, plus the most recent and advanced realizations.

Submitted 5 November, 2021; originally announced November 2021.

Journal ref: Journal of Applied Physics 124, 152004 (2018)

arXiv:2109.12909 [pdf, other]

Compressive Visual Representations

Authors: Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer

Abstract: Learning effective visual representations that generalize well without human supervision is a fundamental problem in order to apply Machine Learning to a wide variety of tasks. Recently, two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explicit information compression to these algorithms yields better and more robust representations. We verify this by developing SimCLR and BYOL formulations compatible with the Conditional Entropy Bottleneck (CEB) objective, allowing us to both measure and control the amount of compression in the learned representation, and observe their impact on downstream tasks. Furthermore, we explore the relationship between Lipschitz continuity and compression, showing a tractable lower bound on the Lipschitz constant of the encoders we learn. As Lipschitz continuity is closely related to robustness, this provides a new explanation for why compressed models are more robust. Our experiments confirm that adding compression to SimCLR and BYOL significantly improves linear evaluation accuracies and model robustness across a wide range of domain shifts. In particular, the compressed version of BYOL achieves 76.0% Top-1 linear evaluation accuracy on ImageNet with ResNet-50, and 78.8% with ResNet-50 2x.

Submitted 4 December, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

Comments: NeurIPS 2021. 27 pages, 4 figures. Code and pretrained models at https://github.com/google-research/compressive-visual-representations

arXiv:2108.04074 [pdf, other]

doi 10.1063/5.0065813

Model-free inference of unseen attractors: Reconstructing phase space features from a single noisy trajectory using reservoir computing

Authors: André Röhm, Daniel J. Gauthier, Ingo Fischer

Abstract: Reservoir computers are powerful tools for chaotic time series prediction. They can be trained to approximate phase space flows and can thus both predict future values to a high accuracy, as well as reconstruct the general properties of a chaotic attractor without requiring a model. In this work, we show that the ability to learn the dynamics of a complex system can be extended to systems with co-existing attractors, here a 4-dimensional extension of the well-known Lorenz chaotic system. We demonstrate that a reservoir computer can infer entirely unexplored parts of the phase space: a properly trained reservoir computer can predict the existence of attractors that were never approached during training and therefore are labelled as unseen. We provide examples where attractor inference is achieved after training solely on a single noisy trajectory.

Submitted 30 September, 2021; v1 submitted 6 August, 2021; originally announced August 2021.

Journal ref: Chaos 31, 103127 (2021)

arXiv:2011.10115 [pdf, other]

doi 10.1038/s41467-021-25427-4

Deep Neural Networks using a Single Neuron: Folded-in-Time Architecture using Feedback-Modulated Delay Loops

Authors: Florian Stelzer, André Röhm, Raul Vicente, Ingo Fischer, Serhiy Yanchuk

Abstract: Deep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron's dynamics. By adjusting the feedback-modulation within the loops, we adapt the network's connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.

Submitted 6 June, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

arXiv:2011.08711 [pdf, other]

VIB is Half Bayes

Authors: Alexander A Alemi, Warren R Morningstar, Ben Poole, Ian Fischer, Joshua V Dillon

Abstract: In discriminative settings such as regression and classification there are two random variables at play, the inputs X and the targets Y. Here, we demonstrate that the Variational Information Bottleneck can be viewed as a compromise between fully empirical and fully Bayesian objectives, attempting to minimize the risks due to finite sampling of Y only. We argue that this approach provides some of the benefits of Bayes while requiring only some of the work.

Submitted 17 November, 2020; originally announced November 2020.

arXiv:2007.12401 [pdf, other]

Predictive Information Accelerates Learning in RL

Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama

Abstract: The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed representation of the predictive information of the RL environment dynamics using a contrastive version of the Conditional Entropy Bottleneck (CEB) objective. We refer to these as Predictive Information SAC (PI-SAC) agents. We show that PI-SAC agents can substantially improve sample efficiency over challenging baselines on tasks from the DM Control suite of continuous control environments. We evaluate PI-SAC agents by comparing against uncompressed PI-SAC agents, other compressed and uncompressed agents, and SAC agents directly trained from pixels. Our implementation is given on GitHub.

Submitted 25 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: To appear at NeurIPS 2020
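
The predictive information I(X_past; X_future) defined in the abstract above can be made concrete on a toy process. The sketch below is not the PI-SAC method; it only evaluates the quantity for a hypothetical two-state symmetric Markov chain (flip probability `eps`, an assumption chosen for illustration). For a first-order Markov chain the information between the whole past and the whole future reduces to I(X_t; X_{t+1}), which for this chain has the closed form ln 2 + eps ln eps + (1 - eps) ln(1 - eps).

```python
import math


def predictive_information(stationary, transition):
    """I(X_t; X_{t+1}) in nats for a stationary first-order Markov chain.

    stationary: list of state probabilities p(x_t)
    transition: row-stochastic matrix, transition[i][j] = p(x_{t+1}=j | x_t=i)
    """
    n = len(stationary)
    # Joint distribution p(x_t, x_{t+1}) = p(x_t) * p(x_{t+1} | x_t).
    joint = [[stationary[i] * transition[i][j] for j in range(n)] for i in range(n)]
    # Marginal of the next state.
    marg_next = [sum(joint[i][j] for i in range(n)) for j in range(n)]
    info = 0.0
    for i in range(n):
        for j in range(n):
            p = joint[i][j]
            if p > 0:
                info += p * math.log(p / (stationary[i] * marg_next[j]))
    return info


def symmetric_chain(eps):
    """Two-state chain that flips state with probability eps (toy assumption)."""
    return [[1 - eps, eps], [eps, 1 - eps]]


eps = 0.1
pi_numeric = predictive_information([0.5, 0.5], symmetric_chain(eps))
pi_closed_form = math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)
print(pi_numeric, pi_closed_form)
```

With `eps = 0.5` the next state is independent of the current one and the predictive information is zero; as `eps` approaches 0 or 1 it approaches ln 2, the most a single binary state can carry.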

arXiv:2007.12335 [pdf, other]

Cycles in Causal Learning

Authors: Katie Everett, Ian Fischer

Abstract: In the causal learning setting, we wish to learn cause-and-effect relationships between variables such that we can correctly infer the effect of an intervention. While the difference between a cyclic structure and an acyclic structure may be just a single edge, cyclic causal structures have qualitatively different behavior under intervention: cycles cause feedback loops when the downstream effect of an intervention propagates back to the source variable. We present three theoretical observations about probability distributions with self-referential factorizations, i.e. distributions that could be graphically represented with a cycle. First, we prove that self-referential distributions in two variables are, in fact, independent. Second, we prove that self-referential distributions in N variables have zero mutual information. Lastly, we prove that self-referential distributions that factorize in a cycle also factorize as though the cycle were reversed. These results suggest that cyclic causal dependence may exist even where observational data suggest independence among variables. Methods based on estimating mutual information, or heuristics based on independent causal mechanisms, are likely to fail to learn cyclic causal structures. We encourage future work in causal learning that carefully considers cycles.

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2006.13933 [pdf, other]

doi 10.1088/2515-7647/aba671

Developing of a photonic hardware platform for brain-inspired computing based on $5\times5$ VCSEL arrays

Authors: T. Heuser, M. Pflüger, I. Fischer, J. A. Lott, D. Brunner, S. Reitzenstein

Abstract: Brain-inspired computing concepts like artificial neural networks have become promising alternatives to classical von Neumann computer architectures. Photonic neural networks target the realizations of neurons, network connections and potentially learning in photonic substrates. Here, we report the development of a nanophotonic hardware platform of fast and energy-efficient photonic neurons via arrays of high-quality vertical cavity surface emitting lasers (VCSELs). The developed $5\times5$ VCSEL arrays provide high optical injection locking efficiency through homogeneous fabrication combined with individual control over the laser wavelengths. Injection locking is crucial for the reliable processing of information in VCSEL-based photonic neurons, and we demonstrate the suitability of the VCSEL arrays by injection locking measurements and current-induced spectral fine-tuning. We find that our investigated array can readily be tuned to the required spectral homogeneity, and as such show that VCSEL arrays based on our technology can act as highly energy efficient and ultra-fast photonic neurons for next generation photonic neural networks. Combined with fully parallel photonic networks our substrates are promising for ultra-fast operation reaching 10s of GHz bandwidths, and we show that a nonlinear transformation based on our lasers will consume only about 100 fJ per VCSEL, which is highly competitive, compared to other platforms.

Submitted 24 June, 2020; originally announced June 2020.

arXiv:2006.06752 [pdf, other]

An Unsupervised Information-Theoretic Perceptual Quality Metric

Authors: Sangnie Bhardwaj, Ian Fischer, Johannes Ballé, Troy Chinen

Abstract: Tractable models of human perception have proved to be challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent modern deep learning approaches can perform better, but they rely on supervised data which can be costly to gather: large sets of class labels such as ImageNet, image quality ratings, or both. We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM). We show that PIM is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset and outperforms them in predicting the ranking of image compression methods in CLIC 2020. We also perform qualitative experiments using the ImageNet-C dataset, and establish that PIM is robust with respect to architectural details.

Submitted 10 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 19 pages, 10 figures. Presented at NeurIPS 2020. Code available at https://github.com/google-research/perceptual-quality

arXiv:2002.05380 [pdf, other]

doi 10.3390/e22101081

CEB Improves Model Robustness

Authors: Ian Fischer, Alexander A. Alemi

Abstract: We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve model robustness. CEB is an easy strategy to implement and works in tandem with data augmentation procedures. We report results of a large scale adversarial robustness study on CIFAR-10, as well as the ImageNet-C Common Corruptions Benchmark, ImageNet-A, and PGD attacks.

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.05379 [pdf, other]

doi 10.3390/e22090999

The Conditional Entropy Bottleneck

Authors: Ian Fischer

Abstract: Much of the field of Machine Learning exhibits a prominent set of failure modes, including vulnerability to adversarial examples, poor out-of-distribution (OoD) detection, miscalibration, and willingness to memorize random labelings of datasets. We characterize these as failures of robust generalization, which extends the traditional measure of generalization as accuracy or related metrics on a held-out set. We hypothesize that these failures to robustly generalize are due to the learning systems retaining too much information about the training data. To test this hypothesis, we propose the Minimum Necessary Information (MNI) criterion for evaluating the quality of a model. In order to train models that perform well with respect to the MNI criterion, we present a new objective function, the Conditional Entropy Bottleneck (CEB), which is closely related to the Information Bottleneck (IB). We experimentally test our hypothesis by comparing the performance of CEB models with deterministic models and Variational Information Bottleneck (VIB) models on a variety of different datasets and robustness challenges. We find strong empirical evidence supporting our hypothesis that MNI models improve on these problems of robust generalization.

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2001.01878 [pdf, other]

Phase Transitions for the Information Bottleneck in Representation Learning

Authors: Tailin Wu, Ian Fischer

Abstract: In the Information Bottleneck (IB), when tuning the relative strength between compression and prediction terms, how do the two terms behave, and what's their relationship with the dataset and the learned representation? In this paper, we set out to answer these questions by studying multiple phase transitions in the IB objective: $\text{IB}_β[p(z|x)] = I(X; Z) - βI(Y; Z)$ defined on the encoding distribution $p(z|x)$ for input $X$, target $Y$ and representation $Z$, where sudden jumps of $dI(Y; Z)/dβ$ and prediction accuracy are observed with increasing $β$. We introduce a definition for IB phase transitions as a qualitative change of the IB loss landscape, and show that the transitions correspond to the onset of learning new classes. Using second-order calculus of variations, we derive a formula that provides a practical condition for IB phase transitions, and draw its connection with the Fisher information matrix for parameterized models. We provide two perspectives to understand the formula, revealing that each IB phase transition is finding a component of maximum (nonlinear) correlation between $X$ and $Y$ orthogonal to the learned representation, in close analogy with canonical-correlation analysis (CCA) in linear settings. Based on the theory, we present an algorithm for discovering phase transition points. Finally, we verify that our theory and algorithm accurately predict phase transitions in categorical datasets, predict the onset of learning new classes and class difficulty in MNIST, and predict prominent phase transitions in CIFAR10.

Submitted 6 January, 2020; originally announced January 2020.

Comments: ICLR 2020; 27 pages, 7 figures

arXiv:1907.09578 [pdf, other]

Information-Bottleneck Approach to Salient Region Discovery

Authors: Andrey Zhmoginov, Ian Fischer, Mark Sandler

Abstract: We propose a new method for learning image attention masks in a semi-supervised setting based on the Information Bottleneck principle. Provided with a set of labeled images, the mask generation model is minimizing mutual information between the input and the masked image while maximizing the mutual information between the same masked image and the image label. In contrast with other approaches, our attention model produces a Boolean rather than a continuous mask, entirely concealing the information in masked-out pixels. Using a set of synthetic datasets based on MNIST and CIFAR10 and the SVHN datasets, we demonstrate that our method can successfully attend to features known to define the image class.

Submitted 14 February, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

arXiv:1907.07331 [pdf, other]

doi 10.3390/e21100924

Learnability for the Information Bottleneck

Authors: Tailin Wu, Ian Fischer, Isaac L. Chuang, Max Tegmark

Abstract: The Information Bottleneck (IB) method (Tishby et al., 2000) provides an insightful and principled approach for balancing compression and prediction for representation learning. The IB objective $I(X;Z)-βI(Y;Z)$ employs a Lagrange multiplier $β$ to tune this trade-off. However, in practice, not only is $β$ chosen empirically without theoretical guidance, there is also a lack of theoretical understanding between $β$, learnability, the intrinsic nature of the dataset and model capacity. In this paper, we show that if $β$ is improperly chosen, learning cannot happen -- the trivial representation $P(Z|X)=P(Z)$ becomes the global minimum of the IB objective. We show how this can be avoided, by identifying a sharp phase transition between the unlearnable and the learnable which arises as $β$ is varied. This phase transition defines the concept of IB-Learnability. We prove several sufficient conditions for IB-Learnability, which provides theoretical guidance for choosing a good $β$. We further show that IB-learnability is determined by the largest confident, typical, and imbalanced subset of the examples (the conspicuous subset), and discuss its relation with model capacity. We give practical algorithms to estimate the minimum $β$ for a given dataset. We also empirically demonstrate our theoretical conditions with analyses of synthetic datasets, MNIST, and CIFAR10.

Submitted 17 July, 2019; originally announced July 2019.

Comments: Accepted at UAI 2019
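
To make the IB objective $I(X;Z)-βI(Y;Z)$ from the abstract above tangible, here is a minimal sketch that evaluates it for small discrete distributions. It is not the paper's algorithm for estimating the minimum $β$; the joint distribution `p_xy`, the two encoders, and the function names are hypothetical choices for illustration. It shows that the trivial encoder $P(Z|X)=P(Z)$ always scores exactly zero, and that whether a non-trivial encoder beats it depends on $β$.

```python
import math


def mutual_information(joint):
    """I(A;B) in nats for a discrete joint distribution given as a list of rows."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (pa[i] * pb[j]))
    return mi


def ib_objective(p_xy, p_z_given_x, beta):
    """Evaluate I(X;Z) - beta * I(Y;Z) for an encoder p(z|x), using Z - X - Y."""
    n_x, n_y, n_z = len(p_xy), len(p_xy[0]), len(p_z_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # p(x, z) = p(x) p(z|x)
    p_xz = [[p_x[i] * p_z_given_x[i][k] for k in range(n_z)] for i in range(n_x)]
    # p(y, z) = sum_x p(x, y) p(z|x)
    p_yz = [
        [sum(p_xy[i][j] * p_z_given_x[i][k] for i in range(n_x)) for k in range(n_z)]
        for j in range(n_y)
    ]
    return mutual_information(p_xz) - beta * mutual_information(p_yz)


# Hypothetical toy data: X and Y are binary and agree 80% of the time.
p_xy = [[0.4, 0.1], [0.1, 0.4]]
trivial_encoder = [[0.5, 0.5], [0.5, 0.5]]   # Z ignores X, i.e. P(Z|X) = P(Z)
identity_encoder = [[1.0, 0.0], [0.0, 1.0]]  # Z copies X

for beta in (2.0, 5.0):
    print(beta,
          ib_objective(p_xy, trivial_encoder, beta),
          ib_objective(p_xy, identity_encoder, beta))
```

For this toy distribution the identity encoder has a positive objective at `beta = 2.0` (worse than the trivial encoder's zero) and a negative one at `beta = 5.0`. Comparing two hand-picked encoders does not establish a global minimum, but it illustrates why the choice of $β$ decides whether a non-trivial representation is worth learning.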

arXiv:1905.07478 [pdf, other]

Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces

Authors: Bryan Seybold, Emily Fertig, Alex Alemi, Ian Fischer

Abstract: Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information. For example, variational autoencoders with autoregressive decoders often collapse into autodecoders, where they learn to ignore the encoder input. In this work, we demonstrate that adding an auxiliary decoder to regularize the latent space can prevent this collapse, but successful auxiliary decoding tasks are domain dependent. Auxiliary decoders can increase the amount of semantic information encoded in the latent space and visible in the reconstructions. The semantic information in the variational autoencoder's representation is only weakly correlated with its rate, distortion, or evidence lower bound. Compared to other popular strategies that modify the training objective, our regularization of the latent space generally increased the semantic information content.

Submitted 17 May, 2019; originally announced May 2019.

Comments: 16 pages, 9 figures, supplemental

arXiv:1811.04551 [pdf, other]

Learning Latent Dynamics for Planning from Pixels

Authors: Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Abstract: Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. To achieve high performance, the dynamics model must accurately predict the rewards ahead for multiple time steps. We approach this using a latent dynamics model with both deterministic and stochastic transition components. Moreover, we propose a multi-step variational inference objective that we name latent overshooting. Using only pixel observations, our agent solves continuous control tasks with contact dynamics, partial observability, and sparse rewards, which exceed the difficulty of tasks that were previously solved by planning with learned models. PlaNet uses substantially fewer episodes and reaches final performance close to and sometimes higher than strong model-free algorithms.

Submitted 4 June, 2019; v1 submitted 11 November, 2018; originally announced November 2018.

Comments: 20 pages, 12 figures, 1 table

arXiv:1807.04162 [pdf, other]

TherML: Thermodynamics of Machine Learning

Authors: Alexander A. Alemi, Ian Fischer

Abstract: In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications.

Submitted 4 October, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models

arXiv:1807.00906 [pdf, other]

Uncertainty in the Variational Information Bottleneck

Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon

Abstract: We present a simple case study, demonstrating that Variational Information Bottleneck (VIB) can improve a network's classification calibration as well as its ability to detect out-of-distribution data. Without explicitly being designed to do so, VIB gives two natural metrics for handling and quantifying uncertainty.

Submitted 2 July, 2018; originally announced July 2018.

Comments: 10 pages, 7 figures. Accepted to UAI 2018 - Uncertainty in Deep Learning Workshop

arXiv:1802.04874 [pdf, other]

GILBO: One Metric to Measure Them All

Authors: Alexander A. Alemi, Ian Fischer

Abstract: We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs each trained on four datasets (MNIST, FashionMNIST, CIFAR-10 and CelebA) and discuss the results.

Submitted 10 January, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

Comments: Accepted at NeurIPS 2018

arXiv:1711.05133 [pdf, other]

doi 10.1364/OPTICA.5.000756

Reinforcement Learning in a large scale photonic Recurrent Neural Network

Authors: Julian Bueno, Sheler Maktoobi, Luc Froehly, Ingo Fischer, Maxime Jacquot, Laurent Larger, Daniel Brunner

Abstract: Photonic Neural Network implementations have been gaining considerable attention as a potentially disruptive future technology. Demonstrating learning in large scale neural networks is essential to establish photonic machine learning substrates as viable information processing systems. Realizing photonic Neural Networks with numerous nonlinear nodes in a fully parallel and efficient learning hardware was lacking so far. We demonstrate a network of up to 2500 diffractively coupled photonic nodes, forming a large scale Recurrent Neural Network. Using a Digital Micro Mirror Device, we realize reinforcement learning. Our scheme is fully parallel, and the passive weights maximize energy efficiency and bandwidth. The computational output efficiently converges and we achieve very good performance.

Submitted 15 November, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

Journal ref: Optica Vol. 5, Issue 6, pp. 756-760 (2018)

arXiv:1711.00464 [pdf, other]

Fixing a Broken ELBO

Authors: Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy

Abstract: Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.

Submitted 13 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: 21 pages, 9 figures

arXiv:1705.10762 [pdf, other]

Generative Models of Visually Grounded Imagination

Authors: Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy

Abstract: It is easy for people to imagine what a man with pink hair looks like, even if they have never seen such a person before. We call the ability to create images of novel semantic concepts visually grounded imagination. In this paper, we show how we can modify variational auto-encoders to perform this task. Our method uses a novel training objective, and a novel product-of-experts inference network, which can handle partially specified (abstract) concepts in a principled and efficient way. We also propose a set of easy-to-compute evaluation metrics that capture our intuitive notions of what it means to have good visual imagination, namely correctness, coverage, and compositionality (the 3 C's). Finally, we perform a detailed comparison of our method with two existing joint image-attribute VAE methods (the JMVAE method of Suzuki et al. and the BiVCCA method of Wang et al.) by applying them to two datasets: the MNIST-with-attributes dataset (which we introduce here), and the CelebA dataset.

Submitted 9 November, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

Comments: International Conference on Learning Representations (ICLR), 2018

arXiv:1703.09387 [pdf, other]

Adversarial Transformation Networks: Learning to Generate Adversarial Examples

Authors: Shumeet Baluja, Ian Fischer

Abstract: Multiple different approaches of generating adversarial examples have been proposed to attack deep neural networks. These approaches involve either directly computing gradients with respect to the image pixels, or directly solving an optimization on the image pixels. In this work, we present a fundamentally new method for generating adversarial examples that is fast to execute and provides exceptional diversity of output. We efficiently train feed-forward neural networks in a self-supervised manner to generate adversarial examples against a target network or set of networks. We call such a network an Adversarial Transformation Network (ATN). ATNs are trained to generate adversarial examples that minimally modify the classifier's outputs given the original input, while constraining the new classification to match an adversarial target class. We present methods to train ATNs and analyze their effectiveness targeting a variety of MNIST classifiers as well as the latest state-of-the-art ImageNet classifier Inception ResNet v2.

Submitted 27 March, 2017; originally announced March 2017.

arXiv:1702.06832 [pdf, other]

Adversarial examples for generative models

Authors: Jernej Kos, Ian Fischer, Dawn Song

Abstract: We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present three classes of attacks on the VAE and VAE-GAN architectures and demonstrate them against networks trained on MNIST, SVHN and CelebA. Our first attack leverages classification-based adversaries by attaching a classifier to the trained encoder of the target generative model, which can then be used to indirectly manipulate the latent representation. Our second attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our third attack moves beyond relying on classification or the standard loss for the gradient and directly optimizes against differences in source and target latent representations. We also motivate why an attacker might be interested in deploying such techniques against a target generative network.

Submitted 22 February, 2017; originally announced February 2017.

arXiv:1612.00410 [pdf, other]

Deep Variational Information Bottleneck

Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy

Abstract: We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

Submitted 23 October, 2019; v1 submitted 1 December, 2016; originally announced December 2016.

Comments: 19 pages, 8 figures, Accepted to ICLR17

Journal ref: Proceedings of the International Conference on Learning Representations (ICLR) 2017

arXiv:1611.10012 [pdf, other]

Speed/accuracy trade-offs for modern convolutional object detectors

Authors: Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

Abstract: The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures" and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum where speed and memory are critical, we present a detector that achieves real time speeds and can be deployed on a mobile device. On the opposite end in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

Submitted 24 April, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

Comments: Accepted to CVPR 2017

arXiv:1609.03363 [pdf, other]

doi 10.1109/ACCESS.2016.2585468

CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT

Authors: Dejan Vukobratovic, Dusan Jakovetic, Vitaly Skachek, Dragana Bajovic, Dino Sejdinovic, Gunes Karabulut Kurt, Camilla Hollanti, Ingo Fischer

Abstract: In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense, which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing "atomic" functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.

Submitted 12 September, 2016; originally announced September 2016.

Comments: 17 pages, 7 figures in IEEE Access, Vol. 4, 2016

arXiv:1501.02592 [pdf, other]

Photonic Delay Systems as Machine Learning Implementations

Authors: Michiel Hermans, Miguel Soriano, Joni Dambre, Peter Bienstman, Ingo Fischer

Abstract: Nonlinear photonic delay systems present interesting implementation platforms for machine learning models. They can be extremely fast, offer great degrees of parallelism and potentially consume far less power than digital processors. So far they have been successfully employed for signal processing using the Reservoir Computing paradigm. In this paper we show that their range of applicability can be greatly extended if we use gradient descent with backpropagation through time on a model of the system to optimize the input encoding of such systems. We perform physical experiments that demonstrate that the obtained input encodings work well in reality, and we show that optimized systems perform significantly better than the common Reservoir Computing approach. The results presented here demonstrate that common gradient descent techniques from machine learning may well be applicable on physical neuro-inspired analog computers.

Submitted 12 January, 2015; originally announced January 2015.

Journal ref: Journal of Machine Learning Research, vol. 16, pp. 2081-2097 (2015)

arXiv:1411.1398 [pdf, ps, other]

doi 10.1103/PhysRevE.91.020801

Reservoir computing with a single time-delay autonomous Boolean node

Authors: Nicholas D. Haynes, Miguel C. Soriano, David P. Rosin, Ingo Fischer, Daniel J. Gauthier

Abstract: We demonstrate reservoir computing with a physical system using a single autonomous Boolean logic element with time-delay feedback. The system generates a chaotic transient with a window of consistency lasting between 30 and 300 ns, which we show is sufficient for reservoir computing. We then characterize the dependence of computational performance on system parameters to find the best operating point of the reservoir. When the best parameters are chosen, the reservoir is able to classify short input patterns with performance that decreases over time. In particular, we show that four distinct input patterns can be classified for 70 ns, even though the inputs are only provided to the reservoir for 7.5 ns.

Submitted 30 January, 2015; v1 submitted 4 November, 2014; originally announced November 2014.

Comments: 5 pages, 5 figures

Journal ref: Physical Review E 91, 020801(R)(1-5) (2015)

Showing 1–43 of 43 results for author: Fischer, I