Showing 1–17 of 17 results for author: Lajoie, G

Searching in archive stat.
  1. arXiv:2502.11617

    cs.LG cs.AI stat.ML

    In-Context Parametric Inference: Point or Distribution Estimators?

    Authors: Sarthak Mittal, Yoshua Bengio, Nikolay Malkin, Guillaume Lajoie

    Abstract: Bayesian and frequentist inference are two fundamental paradigms in statistical estimation. Bayesian methods treat hypotheses as random variables, incorporating priors and updating beliefs via Bayes' theorem, whereas frequentist methods assume fixed but unknown hypotheses, relying on estimators like maximum likelihood. While extensive research has compared these approaches, the frequentist paradig…

    Submitted 17 February, 2025; originally announced February 2025.

  2. arXiv:2502.06601

    cs.LG cs.AI stat.ML

    Amortized In-Context Bayesian Posterior Estimation

    Authors: Sarthak Mittal, Niels Leif Bracher, Guillaume Lajoie, Priyank Jaini, Marcus Brubaker

    Abstract: Bayesian inference provides a natural way of incorporating prior beliefs and assigning a probability measure to the space of hypotheses. Current solutions rely on iterative routines like Markov Chain Monte Carlo (MCMC) sampling and Variational Inference (VI), which need to be re-run whenever new observations are available. Amortization, through conditional estimation, is a viable strategy to allev…

    Submitted 10 February, 2025; originally announced February 2025.

  3. arXiv:2409.04434

    cs.LG cs.AI stat.ML

    Accelerating Training with Neuron Interaction and Nowcasting Networks

    Authors: Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie, Eugene Belilovsky, Simon Lacoste-Julien

    Abstract: Neural network training can be accelerated when a learnable update rule is used in lieu of classic adaptive optimizers (e.g. Adam). However, learnable update rules can be costly and unstable to train and use. Recently, Jang et al. (2023) proposed a simpler approach to accelerate training based on weight nowcaster networks (WNNs). In their approach, Adam is used for most of the optimization steps a…

    Submitted 27 February, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: ICLR 2025; code: https://github.com/SamsungSAILMontreal/nino

  4. arXiv:2407.00957

    cs.NE q-bio.NC stat.ML

    Expressivity of Neural Networks with Random Weights and Learned Biases

    Authors: Ezekiel Williams, Alexandre Payeur, Avery Hee-Woon Ryoo, Thomas Jiralerspong, Matthew G. Perich, Luca Mazzucato, Guillaume Lajoie

    Abstract: Landmark universal function approximation results for neural networks with trained weights and biases provided the impetus for the ubiquitous use of neural networks as learning models in neuroscience and Artificial Intelligence (AI). Recent work has extended these results to networks in which a smaller subset of weights (e.g., output weights) are tuned, leaving other parameters random. However, it…

    Submitted 21 March, 2025; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: upload of camera-ready manuscript accepted as poster at ICLR 2025; change of author order

  5. arXiv:2310.02423

    cs.LG stat.ML

    Delta-AI: Local objectives for amortized inference in sparse graphical models

    Authors: Jean-Pierre Falet, Hae Beom Lee, Nikolay Malkin, Chen Sun, Dragos Secrieru, Thomas Jiralerspong, Dinghuai Zhang, Guillaume Lajoie, Yoshua Bengio

    Abstract: We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call Δ-amortized inference (Δ-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local…

    Submitted 13 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; 19 pages, code: https://github.com/GFNOrg/Delta-AI/

  6. arXiv:2209.09658

    cs.LG stat.ML

    Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty

    Authors: Thomas George, Guillaume Lajoie, Aristide Baratin

    Abstract: Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called lazy training regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficu…

    Submitted 21 November, 2022; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: 25 pages, 14 figures

    Journal ref: TMLR 2022 - Transactions on Machine Learning Research, 12/2022

  7. arXiv:2112.03215

    cs.LG cs.AI stat.ML

    Multi-scale Feature Learning Dynamics: Insights for Double Descent

    Authors: Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie

    Abstract: A key challenge in building theoretical foundations for deep learning is the complex optimization dynamics of neural networks, resulting from the high-dimensional interactions between the large number of network parameters. Such non-trivial dynamics lead to intriguing behaviors such as the phenomenon of "double descent" of the generalization error. The more commonly studied aspect of this phenomen…

    Submitted 6 December, 2021; originally announced December 2021.

  8. arXiv:2107.00848

    stat.ML cs.LG

    Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

    Authors: Nan Rosemary Ke, Aniket Didolkar, Sarthak Mittal, Anirudh Goyal, Guillaume Lajoie, Stefan Bauer, Danilo Rezende, Yoshua Bengio, Michael Mozer, Christopher Pal

    Abstract: Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables,…

    Submitted 2 July, 2021; originally announced July 2021.

  9. arXiv:2102.00485

    cs.LG stat.ML

    Exploring the Geometry and Topology of Neural Network Loss Landscapes

    Authors: Stefan Horoi, Jessie Huang, Bastian Rieck, Guillaume Lajoie, Guy Wolf, Smita Krishnaswamy

    Abstract: Recent work has established clear links between the generalization performance of trained neural networks and the geometry of their loss landscape near the local minima to which they converge. This suggests that qualitative and quantitative examination of the loss landscape geometry could yield insights about neural network generalization performance during training. To this end, researchers have…

    Submitted 26 January, 2022; v1 submitted 31 January, 2021; originally announced February 2021.

    Comments: Accepted at the 20th Symposium on Intelligent Data Analysis (IDA) 2022

  10. arXiv:2011.09468

    cs.LG math.DS stat.ML

    Gradient Starvation: A Learning Proclivity in Neural Networks

    Authors: Mohammad Pezeshki, Sékou-Oumar Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie

    Abstract: We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task, despite the presence of other predictive features that fail to be discovered. This work provides a theoretical explanation for the e…

    Submitted 24 November, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: Proceedings of NeurIPS 2021

  11. arXiv:2008.00938

    cs.LG stat.ML

    Implicit Regularization via Neural Feature Alignment

    Authors: Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien

    Abstract: We approach the problem of implicit regularization in deep learning from a geometrical viewpoint. We highlight a regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions. This can be interpreted as a combined mechanism of feature selection and compression. By extrapolating a new analysis of Rad…

    Submitted 16 March, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

    Comments: AISTATS 2021

  12. arXiv:2006.16981

    cs.LG cs.NE stat.ML

    Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules

    Authors: Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

    Abstract: Robust perception relies on both bottom-up and top-down signals. Bottom-up signals consist of what's directly observed through sensation. Top-down signals consist of beliefs and expectations based on past experience and short-term memory, such as how the phrase 'peanut butter and ...' will be completed. The optimal combination of bottom-up and top-down information remains an open question, but the…

    Submitted 15 November, 2020; v1 submitted 30 June, 2020; originally announced June 2020.

    Comments: ICML 2020

  13. arXiv:2006.14123

    cs.LG math.DS nlin.CD stat.ML

    On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools

    Authors: Ryan Vogt, Maximilian Puelma Touzel, Eli Shlizerman, Guillaume Lajoie

    Abstract: Recurrent neural networks (RNNs) have been successfully applied to a variety of problems involving sequential data, but their optimization is sensitive to parameter initialization, architecture, and optimizer hyperparameters. Considering RNNs as dynamical systems, a natural way to capture stability, i.e., the growth and decay over long iterates, is given by the Lyapunov Exponents (LEs), which form the Lya…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Associated github repository: https://github.com/shlizee/lyapunov-hyperopt

  14. arXiv:2006.12253

    cs.LG cs.NE q-bio.NC stat.ML

    Advantages of biologically-inspired adaptive neural activation in RNNs during learning

    Authors: Victor Geadah, Giancarlo Kerg, Stefan Horoi, Guy Wolf, Guillaume Lajoie

    Abstract: Dynamic adaptation in single-neuron response plays a fundamental role in neural coding in biological neural networks. Yet, most neural activation functions used in artificial networks are fixed and mostly considered as an inconsequential architecture choice. In this paper, we investigate nonlinear activation function adaptation over the large time scale of learning, and outline its impact on seque…

    Submitted 22 June, 2020; originally announced June 2020.

  15. arXiv:2006.09471

    cs.LG stat.ML

    Untangling tradeoffs between recurrence and self-attention in neural networks

    Authors: Giancarlo Kerg, Bhargav Kanuparthi, Anirudh Goyal, Kyle Goyette, Yoshua Bengio, Guillaume Lajoie

    Abstract: Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and relies on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attenti…

    Submitted 10 December, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

  16. arXiv:1906.00443

    cs.LG stat.ML

    Dimensionality compression and expansion in Deep Neural Networks

    Authors: Stefano Recanatesi, Matthew Farrell, Madhu Advani, Timothy Moore, Guillaume Lajoie, Eric Shea-Brown

    Abstract: Datasets such as images, text, or movies are embedded in high-dimensional spaces. However, in important cases such as images of objects, the statistical structure in the data constrains samples to a manifold of dramatically lower dimensionality. Learning to identify and extract task-relevant variables from this embedded manifold is crucial when dealing with high-dimensional problems. We find that…

    Submitted 27 October, 2019; v1 submitted 2 June, 2019; originally announced June 2019.

    Comments: Submitted to NeurIPS 2019. First two authors contributed equally

  17. arXiv:1905.12080

    cs.LG cs.AI stat.ML

    Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

    Authors: Giancarlo Kerg, Kyle Goyette, Maximilian Puelma Touzel, Gauthier Gidel, Eugene Vorontsov, Yoshua Bengio, Guillaume Lajoie

    Abstract: A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This ensures eigenvalues with unit norm and thus stable dynamics and training. However, this comes at the cost of reduced expressivity due to the limited variety of ort…

    Submitted 28 October, 2019; v1 submitted 28 May, 2019; originally announced May 2019.