
Showing 1–6 of 6 results for author: Milsom, E

Searching in archive stat.
  1. arXiv:2502.17405 [pdf, other]

    stat.ML cs.LG

    Function-Space Learning Rates

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: We consider layerwise function-space learning rates, which measure the magnitude of the change in a neural network's output function in response to an update to a parameter tensor. This contrasts with traditional learning rates, which describe the magnitude of changes in parameter space. We develop efficient methods to measure and set function-space learning rates in arbitrary neural networks, req…

    Submitted 22 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: ICML 2025 Camera Ready Version, 27 pages, 26 figures

  2. arXiv:2410.06171 [pdf, other]

    stat.ML cs.LG

    Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10 using a ResNet-inspired architecture, which is SOTA for kernel methods. However, this still lags behind neural networks, which easily achieve over 94% test accuracy with similar architectures. In this work we introduce several modifications to improve the convolutional deep kernel machine's generali…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Camera Ready Version (without checklist)

  3. arXiv:2402.06525 [pdf, other]

    stat.ML cs.LG

    Flexible infinite-width graph convolutional networks and the importance of representation learning

    Authors: Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed, and tunable only through a small number of hyperparameters, eliminating any possibility of representation learning. This contrasts with fi…

    Submitted 9 February, 2024; originally announced February 2024.

  4. arXiv:2309.09814 [pdf, ps, other]

    stat.ML cs.LG

    Convolutional Deep Kernel Machines

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found…

    Submitted 26 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Camera Ready Version

  5. arXiv:2305.14454 [pdf, other]

    stat.ML cs.LG

    An Improved Variational Approximate Posterior for the Deep Wishart Process

    Authors: Sebastian Ober, Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: Deep kernel processes are a recently introduced class of deep Bayesian models that have the flexibility of neural networks, but work entirely with Gram matrices. They operate by alternately sampling a Gram matrix from a distribution over positive semi-definite matrices, and applying a deterministic transformation. When the distribution is chosen to be Wishart, the model is called a deep Wishart pr…

    Submitted 23 May, 2023; originally announced May 2023.

  6. arXiv:2108.13097 [pdf, other]

    stat.ML cs.LG

    A theory of representation learning gives a deep generalisation of kernel methods

    Authors: Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots, Laurence Aitchison

    Abstract: The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new…

    Submitted 25 May, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Published in ICML 2023