Showing 1–50 of 76 results for author: Courville, A

  1. arXiv:2310.02679  [pdf, other]

    cs.LG cs.AI stat.CO stat.ME stat.ML

    Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

    Authors: Dinghuai Zhang, Ricky T. Q. Chen, Cheng-Hao Liu, Aaron Courville, Yoshua Bengio

    Abstract: We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajector…

    Submitted 9 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024

  2. arXiv:2305.17010  [pdf, other]

    cs.LG cs.AI cs.DM stat.ML

    Let the Flows Tell: Solving Graph Combinatorial Optimization Problems with GFlowNets

    Authors: Dinghuai Zhang, Hanjun Dai, Nikolay Malkin, Aaron Courville, Yoshua Bengio, Ling Pan

    Abstract: Combinatorial optimization (CO) problems are often NP-hard and thus out of reach for exact algorithms, making them a tempting domain to apply machine learning methods. The highly structured constraints in these problems can hinder either optimization or sampling directly in the solution space. On the other hand, GFlowNets have recently emerged as a powerful machinery to efficiently sample from com…

    Submitted 20 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted by NeurIPS 2023 as spotlight

  3. arXiv:2302.05793  [pdf, other]

    cs.LG cs.AI stat.CO stat.ML

    Distributional GFlowNets with Quantile Flows

    Authors: Dinghuai Zhang, Ling Pan, Ricky T. Q. Chen, Aaron Courville, Yoshua Bengio

    Abstract: Generative Flow Networks (GFlowNets) are a new family of probabilistic samplers where an agent learns a stochastic policy for generating complex combinatorial structure through a series of decision-making steps. Despite being inspired from reinforcement learning, the current GFlowNet framework is relatively limited in its applicability and cannot handle stochasticity in the reward function. In thi…

    Submitted 17 February, 2024; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted by TMLR

  4. arXiv:2302.00695  [pdf, other]

    cs.LG hep-ex hep-ph stat.ML

    Versatile Energy-Based Probabilistic Models for High Energy Physics

    Authors: Taoli Cheng, Aaron Courville

    Abstract: As a classical generative modeling approach, energy-based models have the natural advantage of flexibility in the form of the energy function. Recently, energy-based models have achieved great success in modeling high-dimensional data in computer vision and natural language processing. In line with these advancements, we build a multi-purpose energy-based probabilistic model for High Energy Physic…

    Submitted 18 January, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 17 pages, 9 figures. NeurIPS 2023 camera ready

  5. arXiv:2210.00999  [pdf, other]

    cs.LG cs.AI stat.ML

    Latent State Marginalization as a Low-cost Approach for Improving Exploration

    Authors: Dinghuai Zhang, Aaron Courville, Yoshua Bengio, Qinqing Zheng, Amy Zhang, Ricky T. Q. Chen

    Abstract: While the maximum entropy (MaxEnt) reinforcement learning (RL) framework -- often touted for its exploration and robustness capabilities -- is usually motivated from a probabilistic perspective, the use of deep probabilistic models has not gained much traction in practice due to their inherent complexity. In this work, we propose the adoption of latent variable policies within the MaxEnt framework…

    Submitted 10 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

    Comments: Accepted by ICLR 2023

  6. arXiv:2206.03362  [pdf, other]

    cs.LG cs.AI cs.CR stat.ME stat.ML

    Building Robust Ensembles via Margin Boosting

    Authors: Dinghuai Zhang, Hongyang Zhang, Aaron Courville, Yoshua Bengio, Pradeep Ravikumar, Arun Sai Suggala

    Abstract: In the context of adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks, and as a result, has sub-optimal robustness. Consequently, an emerging line of work has focused on learning an ensemble of neural networks to defend against adversarial attacks. In this work, we take a principled approach towards building robust ensembles.…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: Accepted by ICML 2022

  7. arXiv:2206.01626  [pdf, other]

    cs.LG cs.AI stat.ML

    Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from s…

    Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

  8. arXiv:2205.07802  [pdf, other]

    cs.LG cs.AI stat.ML

    The Primacy Bias in Deep Reinforcement Learning

    Authors: Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

    Abstract: This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and ignore useful evidence encountered later. Because of training on progressively growing datasets, deep RL agents incur a risk of overfitting to earlier experiences, negatively affecting the rest of the learning process. Inspired by cognitive science, we refer to this effec…

    Submitted 16 May, 2022; originally announced May 2022.

    Comments: ICML 2022; code at https://github.com/evgenii-nikishin/rl_with_resets

  9. arXiv:2202.01361  [pdf, other]

    cs.LG stat.ML

    Generative Flow Networks for Discrete Probabilistic Modeling

    Authors: Dinghuai Zhang, Nikolay Malkin, Zhen Liu, Alexandra Volokhova, Aaron Courville, Yoshua Bengio

    Abstract: We present energy-based generative flow networks (EB-GFN), a novel probabilistic modeling algorithm for high-dimensional discrete data. Building upon the theory of generative flow networks (GFlowNets), we model the generation process by a stochastic data construction policy and thus amortize expensive MCMC exploration into a fixed number of actions sampled from a GFlowNet. We show how GFlowNets ca…

    Submitted 8 June, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Accepted by ICML 2022

  10. arXiv:2110.03372  [pdf, other]

    cs.LG cs.AI q-bio.BM stat.ME stat.ML

    Unifying Likelihood-free Inference with Black-box Optimization and Beyond

    Authors: Dinghuai Zhang, Jie Fu, Yoshua Bengio, Aaron Courville

    Abstract: Black-box optimization formulations for biological sequence design have drawn recent attention due to their promising potential impact on the pharmaceutical industry. In this work, we propose to unify two seemingly distinct worlds: likelihood-free inference and black-box optimization, under one probabilistic framework. In tandem, we provide a recipe for constructing various sequence design methods…

    Submitted 8 February, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: ICLR 2022 spotlight

  11. arXiv:2108.13264  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Deep Reinforcement Learning at the Edge of the Statistical Precipice

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Lea…

    Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures

  12. arXiv:2106.02890  [pdf, other]

    cs.LG stat.ML

    Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

    Authors: Dinghuai Zhang, Kartik Ahuja, Yilun Xu, Yisen Wang, Aaron Courville

    Abstract: Can models with particular structure avoid being biased towards spurious correlation in out-of-distribution (OOD) generalization? Peters et al. (2016) provides a positive answer for linear cases. In this paper, we use a functional modular probing method to analyze deep model structures under OOD setting. We demonstrate that even in biased models (which focus on spurious correlation) there still ex…

    Submitted 5 June, 2021; originally announced June 2021.

    Comments: Accepted to ICML2021 as long talk

  13. arXiv:2011.09468  [pdf, other]

    cs.LG math.DS stat.ML

    Gradient Starvation: A Learning Proclivity in Neural Networks

    Authors: Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

    Abstract: We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the e…

    Submitted 24 November, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: Proceeding of NeurIPS 2021

  14. arXiv:2010.10079  [pdf, other]

    stat.ML cs.AI cs.LG stat.AP

    Neural Approximate Sufficient Statistics for Implicit Models

    Authors: Yanzhi Chen, Dinghuai Zhang, Michael Gutmann, Aaron Courville, Zhanxing Zhu

    Abstract: We consider the fundamental problem of how to automatically construct summary statistics for implicit generative models where the evaluation of the likelihood function is intractable, but sampling data from the model is possible. The idea is to frame the task of constructing sufficient statistics as learning mutual information maximizing representations of the data with the help of deep neural net…

    Submitted 30 March, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: ICLR2021 spotlight

  15. arXiv:2010.01262  [pdf, other]

    cs.LG stat.ML

    Integrating Categorical Semantics into Unsupervised Domain Translation

    Authors: Samuel Lavoie, Faruk Ahmed, Aaron Courville

    Abstract: While unsupervised domain translation (UDT) has seen a lot of success recently, we argue that mediating its translation via categorical semantic features could broaden its applicability. In particular, we demonstrate that categorical semantics improves the translation between perceptually different domains sharing multiple object categories. We propose a method to learn, in an unsupervised manner,…

    Submitted 16 March, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: 22 pages. In submission to the International Conference on Learning Representations (ICLR) 2021

  16. arXiv:2007.05929  [pdf, other]

    cs.LG stat.ML

    Data-Efficient Reinforcement Learning with Self-Predictive Representations

    Authors: Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman

    Abstract: While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential intera…

    Submitted 20 May, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: The first two authors contributed equally to this work. v4 includes new ablations and reformatting for ICLR camera ready

  17. arXiv:2007.05756  [pdf, other]

    cs.CV cs.LG stat.ML

    Generative Compositional Augmentations for Scene Graph Prediction

    Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

    Abstract: Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the…

    Submitted 1 October, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

    Comments: ICCV 2021 camera ready. Added more baselines, combining GANs with Neural Motifs and t-sne visualizations. Code is available at https://github.com/bknyaz/sgg

  18. arXiv:2006.05164  [pdf, other]

    cs.LG stat.ML

    AR-DAE: Towards Unbiased Neural Entropy Gradient Estimation

    Authors: Jae Hyun Lim, Aaron Courville, Christopher Pal, Chin-Wei Huang

    Abstract: Entropy is ubiquitous in machine learning, but it is in general intractable to compute the entropy of the distribution of an arbitrary continuous random variable. In this paper, we propose the amortized residual denoising autoencoder (AR-DAE) to approximate the gradient of the log density function, which can be used to estimate the gradient of entropy. Amortization allows us to significantly reduc…

    Submitted 9 June, 2020; originally announced June 2020.

    Comments: accepted in ICML 2020

  19. arXiv:2003.14166  [pdf, other]

    cs.CV cs.LG stat.ML

    Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation

    Authors: Sai Rajeswar, Fahim Mannan, Florian Golemo, Jérôme Parent-Lévesque, David Vazquez, Derek Nowrouzezahrai, Aaron Courville

    Abstract: We infer and generate three-dimensional (3D) scene information from a single input image and without supervision. This problem is under-explored, with most prior work relying on supervision from, e.g., 3D ground-truth, multiple images of a scene, image silhouettes or key-points. We propose Pix2Shape, an approach to solve this problem with four components: (i) an encoder that infers the latent 3D r…

    Submitted 17 April, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

    Comments: This is a pre-print of an article published in International Journal of Computer Vision. The final authenticated version is available online at: https://doi.org/10.1007/s11263-020-01322-1

    Journal ref: International Journal of Computer Vision, (2020), 1-16

  20. arXiv:2003.00688  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Out-of-Distribution Generalization via Risk Extrapolation (REx)

    Authors: David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville

    Abstract: Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world. To tackle this problem, we assume that variation across training domains is representative of the variation we might encounter at test time, but also that shifts at test time may be more extreme in magnitude. In particular, we show that reducing differences in ri…

    Submitted 25 February, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

  21. arXiv:2002.07101  [pdf, other]

    cs.LG stat.ML

    Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models

    Authors: Chin-Wei Huang, Laurent Dinh, Aaron Courville

    Abstract: In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performanc…

    Submitted 17 February, 2020; originally announced February 2020.

    Comments: 27 pages, 12 figures

  22. arXiv:1911.05248  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC stat.ML

    What Do Compressed Deep Neural Networks Forget?

    Authors: Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome

    Abstract: Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weight…

    Submitted 5 September, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

  23. arXiv:1910.09570  [pdf, other]

    q-bio.QM cs.CV eess.SP stat.AP stat.ML

    Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

    Authors: Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

    Abstract: We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-super…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Under Review

  24. arXiv:1908.02388  [pdf, other]

    cs.LG stat.ML

    Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

    Authors: Adrien Ali Taïga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare

    Abstract: This paper provides an empirical evaluation of recently developed exploration algorithms within the Arcade Learning Environment (ALE). We study the use of different reward bonuses that incentivize exploration in reinforcement learning. We do so by fixing the learning algorithm used and focusing only on the impact of the different exploration bonuses in the agent's performance. We use Rainbow, the s…

    Submitted 24 September, 2021; v1 submitted 6 August, 2019; originally announced August 2019.

    Comments: Accepted at the second Exploration in Reinforcement Learning Workshop at the 36th International Conference on Machine Learning, Long Beach, California. The full version arxiv.org/abs/2109.11052 was published as a conference paper at ICLR 2020

  25. arXiv:1906.09691  [pdf, other]

    cs.LG stat.ML

    Adversarial Computation of Optimal Transport Maps

    Authors: Jacob Leygonie, Jennifer She, Amjad Almahairi, Sai Rajeswar, Aaron Courville

    Abstract: Computing optimal transport maps between high-dimensional and continuous distributions is a challenging problem in optimal transport (OT). Generative adversarial networks (GANs) are powerful generative models which have been successfully applied to learn maps across high-dimensional domains. However, little is known about the nature of the map learned with a GAN objective. To address this problem,…

    Submitted 23 June, 2019; originally announced June 2019.

  26. arXiv:1906.04282  [pdf, other]

    cs.LG stat.ML

    Stochastic Neural Network with Kronecker Flow

    Authors: Chin-Wei Huang, Ahmed Touati, Pascal Vincent, Gintare Karolina Dziugaite, Alexandre Lacoste, Aaron Courville

    Abstract: Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this w…

    Submitted 13 February, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  27. arXiv:1906.03708  [pdf, other]

    cs.LG stat.ML

    Note on the bias and variance of variational inference

    Authors: Chin-Wei Huang, Aaron Courville

    Abstract: In this note, we study the relationship between the variational gap and the variance of the (log) likelihood ratio. We show that the gap can be upper bounded by some form of dispersion measure of the likelihood ratio, which suggests the bias of variational inference can be reduced by making the distribution of the likelihood ratio more concentrated, such as via averaging and variance reduction.

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: 5 pages

  28. arXiv:1905.12760  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Batch weight for domain adaptation with mass shift

    Authors: Mikołaj Bińkowski, R Devon Hjelm, Aaron Courville

    Abstract: Unsupervised domain transfer is the task of transferring or translating samples from a source distribution to a different target distribution. Current solutions to unsupervised domain transfer often operate on data on which the modes of the distribution are well-matched, for instance have the same frequencies of classes between source and target distributions. However, these models do not perform wel…

    Submitted 29 May, 2019; originally announced May 2019.

  29. arXiv:1905.04866  [pdf, other]

    cs.LG stat.ML

    Hierarchical Importance Weighted Autoencoders

    Authors: Chin-Wei Huang, Kris Sankaran, Eeshan Dhekane, Alexandre Lacoste, Aaron Courville

    Abstract: Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the varia…

    Submitted 13 May, 2019; originally announced May 2019.

    Comments: Accepted by ICML 2019. 17 pages

  30. arXiv:1903.07227  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Counterpoint by Convolution

    Authors: Cheng-Zhi Anna Huang, Tim Cooijmans, Adam Roberts, Aaron Courville, Douglas Eck

    Abstract: Machine learning models of music typically break up the task of composition into a chronological process, composing a piece of music in a single pass from beginning to end. On the contrary, human composers write music in a nonlinear fashion, scribbling motifs here and there, often revisiting choices previously made. In order to better approximate this process, we train a convolutional neural netwo…

    Submitted 17 March, 2019; originally announced March 2019.

    Comments: Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017

    ACM Class: H.5.5; I.2

  31. arXiv:1901.08508  [pdf, other]

    cs.LG cs.AI stat.ML

    Maximum Entropy Generators for Energy-Based Models

    Authors: Rithesh Kumar, Sherjil Ozair, Anirudh Goyal, Aaron Courville, Yoshua Bengio

    Abstract: Maximum likelihood estimation of energy-based models is a challenging problem due to the intractability of the log-likelihood gradient. In this work, we propose learning both the energy function and an amortized approximate sampling mechanism using a neural generator network, which provides an efficient approximation of the log-likelihood gradient. The resulting objective requires maximizing entro…

    Submitted 27 May, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

  32. arXiv:1811.10097  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Planning in Dynamic Environments with Conditional Autoregressive Models

    Authors: Johanna Hansen, Kyle Kastner, Aaron Courville, Gregory Dudek

    Abstract: We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oord et al., 2017b) for forward planning with MCTS. In order to test this method, we introduce a new environment featuring varying difficulty levels, along with moving goals and obstacles. The combination of high-quality frame generation and classical planning a…

    Submitted 25 November, 2018; originally announced November 2018.

    Comments: 6 pages, 1 figure, in Proceedings of the Prediction and Generative Modeling in Reinforcement Learning Workshop at the International Conference on Machine Learning (ICML) in 2018

  33. arXiv:1811.07426  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Harmonic Recomposition using Conditional Autoregressive Modeling

    Authors: Kyle Kastner, Rithesh Kumar, Tim Cooijmans, Aaron Courville

    Abstract: We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017). Recomposition (Casal & Casey, 2010) focuses on reworking existing musical pieces, adhering to structure at a high level while also re-imagining other aspects of the work. This can involve reuse of pre-existing themes or parts of the original piece, while…

    Submitted 18 November, 2018; originally announced November 2018.

    Comments: 3 pages, 2 figures. In Proceedings of The Joint Workshop on Machine Learning for Music, ICML 2018

  34. arXiv:1811.07240  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    Representation Mixing for TTS Synthesis

    Authors: Kyle Kastner, João Felipe Santos, Yoshua Bengio, Aaron Courville

    Abstract: Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. However, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information…

    Submitted 24 November, 2018; v1 submitted 17 November, 2018; originally announced November 2018.

    Comments: 5 pages, 3 figures

  35. arXiv:1809.06848  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Learning Dynamics of Deep Neural Networks

    Authors: Remi Tachet, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio

    Abstract: While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm emp…

    Submitted 11 December, 2020; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: 19 pages, 7 figures

  36. arXiv:1809.01818  [pdf, other]

    cs.LG stat.ML

    Improving Explorability in Variational Inference with Annealed Variational Objectives

    Authors: Chin-Wei Huang, Shawn Tan, Alexandre Lacoste, Aaron Courville

    Abstract: Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Import…

    Submitted 25 October, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: To appear in NIPS 2018

  37. arXiv:1808.09819  [pdf, other]

    cs.LG cs.AI stat.ML

    Approximate Exploration through State Abstraction

    Authors: Adrien Ali Taïga, Aaron Courville, Marc G. Bellemare

    Abstract: Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exp…

    Submitted 24 January, 2019; v1 submitted 29 August, 2018; originally announced August 2018.

  38. arXiv:1808.04446  [pdf, other]

    cs.CV cs.CL cs.LG stat.ML

    Visual Reasoning with Multi-hop Feature Modulation

    Authors: Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

    Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to…

    Submitted 12 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: In Proc of ECCV 2018

  39. arXiv:1806.08734  [pdf, other]

    stat.ML cs.LG

    On the Spectral Bias of Neural Networks

    Authors: Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

    Abstract: Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuatio…

    Submitted 31 May, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: 23 pages

    Journal ref: ICML 2019

  40. Learning Distributed Representations from Reviews for Collaborative Filtering

    Authors: Amjad Almahairi, Kyle Kastner, Kyunghyun Cho, Aaron Courville

    Abstract: Recent work has shown that collaborative filter-based recommender systems can be improved by incorporating side information, such as natural language reviews, as a way of regularizing the derived product representations. Motivated by the success of this approach, we introduce two different models of reviews and study their effect on collaborative filtering performance. While the previous state-of-…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Published in RecSys 2015 conference

  41. arXiv:1806.05236  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Manifold Mixup: Better Representations by Interpolating Hidden States

    Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

    Abstract: Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden repr…

    Submitted 11 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: To appear in ICML 2019
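
    The interpolation of hidden states named in the title above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper name `mix`, the example vectors, and the fixed coefficient `lam` are assumptions made for illustration, and mixing the one-hot targets with the same coefficient follows the usual mixup recipe rather than anything stated in the truncated abstract.

```python
def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b, element by element."""
    if len(a) != len(b):
        raise ValueError("both vectors must have the same length")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


# Hypothetical hidden representations of two training examples
# taken at the same intermediate layer of a network.
h1 = [1.0, 0.0, 2.0]
h2 = [3.0, 4.0, 0.0]

# One-hot targets of the same two examples.
y1 = [1.0, 0.0]
y2 = [0.0, 1.0]

lam = 0.25  # fixed here for illustration only

h_mix = mix(h1, h2, lam)  # interpolated hidden state fed to the remaining layers
y_mix = mix(y1, y2, lam)  # soft target the network is trained to predict
```

    Training against the soft target `y_mix` is what would discourage confident predictions at points lying between the two hidden states.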

  42. arXiv:1804.00779  [pdf, other]

    cs.LG stat.ML

    Neural Autoregressive Flows

    Authors: Chin-Wei Huang, David Krueger, Alexandre Lacoste, Aaron Courville

    Abstract: Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate…

    Submitted 2 April, 2018; originally announced April 2018.

    Comments: 16 pages, 10 figures, 3 tables

  43. arXiv:1802.01071  [pdf, other]

    stat.ML cs.LG

    Hierarchical Adversarially Learned Inference

    Authors: Mohamed Ishmael Belghazi, Sai Rajeswar, Olivier Mastropietro, Negar Rostamzadeh, Jovana Mitrovic, Aaron Courville

    Abstract: We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference model are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with diffe…

    Submitted 3 February, 2018; originally announced February 2018.

    Comments: 18 pages, 7 figures

  44. arXiv:1801.04062  [pdf, other]

    cs.LG stat.ML

    MINE: Mutual Information Neural Estimation

    Authors: Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeswar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, R Devon Hjelm

    Abstract: We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be…

    Submitted 14 August, 2021; v1 submitted 12 January, 2018; originally announced January 2018.

    Comments: 19 pages, 6 figures

    Journal ref: ICML 2018

  45. arXiv:1712.04120  [pdf, other]

    stat.ML cs.LG

    GibbsNet: Iterative Adversarial Inference for Deep Graphical Models

    Authors: Alex Lamb, Devon Hjelm, Yaroslav Ganin, Joseph Paul Cohen, Aaron Courville, Yoshua Bengio

    Abstract: Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet samp…

    Submitted 11 December, 2017; originally announced December 2017.

    Comments: NIPS 2017

  46. arXiv:1710.04759  [pdf, other]

    stat.ML cs.AI cs.LG

    Bayesian Hypernetworks

    Authors: David Krueger, Chin-Wei Huang, Riashat Islam, Ryan Turner, Alexandre Lacoste, Aaron Courville

    Abstract: We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork $h$ is a neural network which learns to transform a simple noise distribution, $p(\boldsymbol{\epsilon}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, to a distribution $q(\theta) := q(h(\boldsymbol{\epsilon}))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, us…

    Submitted 24 April, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: David Krueger and Chin-Wei Huang contributed equally

  47. arXiv:1710.02248  [pdf, other]

    cs.LG cs.AI stat.ML

    Learnable Explicit Density for Continuous Latent Space and Variational Inference

    Authors: Chin-Wei Huang, Ahmed Touati, Laurent Dinh, Michal Drozdzal, Mohammad Havaei, Laurent Charlin, Aaron Courville

    Abstract: In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF)…

    Submitted 5 October, 2017; originally announced October 2017.

    Comments: 2 figures, 5 pages, submitted to ICML Principled Approaches to Deep Learning workshop

  48. arXiv:1709.07871  [pdf, other]

    cs.CV cs.AI cs.CL stat.ML

    FiLM: Visual Reasoning with a General Conditioning Layer

    Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

    Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process -…

    Submitted 18 December, 2017; v1 submitted 22 September, 2017; originally announced September 2017.

    Comments: AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.03017
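
    The feature-wise affine transformation described in the abstract above can be sketched in a few lines. This is an illustrative sketch, not the code from the linked repository: the function name `film`, the nested-list layout (channels, rows, columns), and the hard-coded `gamma` and `beta` values are assumptions. In a real model the per-channel scale and shift would be predicted from the conditioning information rather than written by hand.

```python
def film(features, gamma, beta):
    """Scale and shift every value of channel c: gamma[c] * x + beta[c]."""
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per channel")
    return [
        [[g * x + b for x in row] for row in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]


# Two channels of a 2x2 feature map (illustrative values only).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, -0.5], [1.5, -1.5]],
]

# Hypothetical per-channel modulation parameters; assumed to come from
# a conditioning network (for example, one that reads the question).
gamma = [2.0, 0.0]
beta = [1.0, 3.0]

modulated = film(features, gamma, beta)
```

    With these values the first channel is doubled and shifted by one, while the second channel is replaced by the constant 3.0, which shows how the conditioning signal can amplify or suppress individual feature maps.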

  49. arXiv:1707.03017  [pdf, other]

    cs.CV cs.AI cs.CL stat.ML

    Learning Visual Reasoning Without Strong Priors

    Authors: Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

    Abstract: Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than lear…

    Submitted 18 December, 2017; v1 submitted 10 July, 2017; originally announced July 2017.

    Comments: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/film

  50. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work