Showing 1–6 of 6 results for author: Valle-Pérez, G

  1. arXiv:2304.06670

    cs.LG cs.AI stat.ML

    Do deep neural networks have an inbuilt Occam's razor?

    Authors: Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A. Louis

    Abstract: The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting…

    Submitted 13 April, 2023; originally announced April 2023.

  2. arXiv:2102.07238

    stat.ML cs.LG

    Double-descent curves in neural networks: a new perspective using Gaussian processes

    Authors: Ouns El Harzli, Bernardo Cuenca Grau, Guillermo Valle-Pérez, Ard A. Louis

    Abstract: Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which is less than the number of data points, but then descends again in the overparameterized regime. In this paper, we use techniques from random matrix theory to characterize the spectral distribut…

    Submitted 25 May, 2023; v1 submitted 14 February, 2021; originally announced February 2021.

  3. arXiv:2012.04115

    stat.ML cs.AI cs.LG cs.NE

    Generalization bounds for deep learning

    Authors: Guillermo Valle-Pérez, Ard A. Louis

    Abstract: Generalization in deep learning has been the topic of much recent theoretical and empirical research. Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning. Such predictions should 1) scale correctly with data complexity; 2) scale correctly with training set size; 3) capture differences between architectures; 4) capture diffe…

    Submitted 9 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

  4. arXiv:2006.15191

    cs.LG stat.ML

    Is SGD a Bayesian sampler? Well, almost

    Authors: Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis

    Abstract: Overparameterised deep neural networks (DNNs) are highly expressive and so can, in principle, generate almost any function that fits a training dataset with zero error. The vast majority of these functions will perform poorly on unseen data, and yet in practice DNNs often generalise remarkably well. This success suggests that a trained DNN must have a strong inductive bias towards functions with l…

    Submitted 24 October, 2020; v1 submitted 26 June, 2020; originally announced June 2020.

    Journal ref: Journal of Machine Learning Research, 22 79 (2021), 1-64

  5. arXiv:1909.11522

    cs.LG stat.ML

    Neural networks are a priori biased towards Boolean functions with low entropy

    Authors: Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis

    Abstract: Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with n input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies t points…

    Submitted 2 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

  6. arXiv:1805.08522

    stat.ML cs.AI cs.LG cs.NE

    Deep learning generalizes because the parameter-function map is biased towards simple functions

    Authors: Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis

    Abstract: Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly…

    Submitted 21 April, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: Published as a conference paper at ICLR 2019