
Showing 1–50 of 65 results for author: Aitchison, L

  1. arXiv:2506.02211  [pdf, ps, other]

    cs.AI

    Improving LLM-Generated Code Quality with GRPO

    Authors: Maxime Robeyns, Laurence Aitchison

    Abstract: Large Language Models (LLMs) are gaining widespread use for code generation. Recent training procedures use execution feedback as a reward signal, typically focusing on the functional correctness of the code, using unit test pass rate as a reward signal. However, this reward signal fails to capture notions of maintainability, quality and safety of the code produced. We address this under-explored…

    Submitted 2 June, 2025; originally announced June 2025.

  2. arXiv:2505.17083  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Scale-invariant Attention

    Authors: Ben Anson, Xi Wang, Laurence Aitchison

    Abstract: One persistent challenge in LLM research is the development of attention mechanisms that are able to generalise from training on shorter contexts to inference on longer contexts. We propose two conditions that we expect all effective long context attention mechanisms to have: scale-invariant total attention, and scale-invariant attention sparsity. Under a Gaussian assumption, we show that a simple…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Preprint

  3. arXiv:2504.15228  [pdf, other]

    cs.AI

    A Self-Improving Coding Agent

    Authors: Maxime Robeyns, Martin Szummer, Laurence Aitchison

    Abstract: Recent advancements in Large Language Models (LLMs) have spurred interest in deploying LLM agents to undertake tasks in the world. LLMs are often deployed in agent systems: code that orchestrates LLM calls and provides them with tools. We demonstrate that an agent system, equipped with basic coding tools, can autonomously edit itself, and thereby improve its performance on benchmark tasks. We find…

    Submitted 16 May, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: Submitted as a preprint to NeurIPS 2025

  4. arXiv:2503.08264  [pdf, other]

    stat.ML cs.LG

    Massively Parallel Expectation Maximization For Approximate Posteriors

    Authors: Thomas Heap, Sam Bowyer, Laurence Aitchison

    Abstract: Bayesian inference for hierarchical models can be very challenging. MCMC methods have difficulty scaling to large models with many observations and latent variables. While variational inference (VI) and reweighted wake-sleep (RWS) can be more scalable, they are gradient-based methods and so often require many iterations to converge. Our key insight was that modern massively parallel importance wei…

    Submitted 11 March, 2025; originally announced March 2025.

  5. arXiv:2503.01747  [pdf, other]

    cs.AI cs.LG stat.ML

    Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints

    Authors: Sam Bowyer, Laurence Aitchison, Desi R. Ivanova

    Abstract: Rigorous statistical evaluations of large language models (LLMs), including valid error bars and significance testing, are essential for meaningful and reliable performance assessment. Currently, when such statistical measures are reported, they typically rely on the Central Limit Theorem (CLT). In this position paper, we argue that while CLT-based methods for uncertainty quantification are approp…

    Submitted 28 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 42 pages, 39 figures. ICML 2025 Spotlight Position Paper

  6. arXiv:2502.18147  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

    Authors: Lucy Farnik, Tim Lawson, Conor Houghton, Laurence Aitchison

    Abstract: Sparse autoencoders (SAEs) have been successfully used to discover sparse and human-interpretable representations of the latent activations of LLMs. However, we would ultimately like to understand the computations performed by LLMs and not just their representations. The extent to which SAEs can help us understand computations is unclear because they are not designed to "sparsify" computations in…

    Submitted 6 June, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  7. arXiv:2502.17405  [pdf, other]

    stat.ML cs.LG

    Function-Space Learning Rates

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: We consider layerwise function-space learning rates, which measure the magnitude of the change in a neural network's output function in response to an update to a parameter tensor. This contrasts with traditional learning rates, which describe the magnitude of changes in parameter space. We develop efficient methods to measure and set function-space learning rates in arbitrary neural networks, req…

    Submitted 22 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: ICML 2025 Camera Ready Version, 27 pages, 26 figures

  8. arXiv:2501.17727  [pdf, other]

    cs.LG

    Sparse Autoencoders Can Interpret Randomly Initialized Transformers

    Authors: Thomas Heap, Tim Lawson, Lucy Farnik, Laurence Aitchison

    Abstract: Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the internal representations of transformers. In this paper, we apply SAEs to 'interpret' random transformers, i.e., transformers where the parameters are sampled IID from a Gaussian rather than trained on text data. We find that random and trained transformers produce similarly interpretable SAE latents, and we conf…

    Submitted 29 January, 2025; originally announced January 2025.

  9. arXiv:2411.14478  [pdf, ps, other]

    cs.LG

    Why you don't overfit, and don't need Bayes if you only train for one epoch

    Authors: Laurence Aitchison

    Abstract: Here, we show that in the data-rich setting where you only train on each datapoint once (or equivalently, you only train for one epoch), standard "maximum likelihood" training optimizes the true data generating process (DGP) loss, which is equivalent to the test loss. Further, we show that the Bayesian model average optimizes the same objective, albeit while taking the expectation over uncertainty…

    Submitted 19 November, 2024; originally announced November 2024.

  10. arXiv:2411.00489  [pdf, other]

    cs.AI

    Human-inspired Perspectives: A Survey on AI Long-term Memory

    Authors: Zihong He, Weizhe Lin, Hao Zheng, Fan Zhang, Matt W. Jones, Laurence Aitchison, Xuhai Xu, Miao Liu, Per Ola Kristensson, Junxiao Shen

    Abstract: With the rapid advancement of AI systems, their abilities to store, retrieve, and utilize information over the long term - referred to as long-term memory - have become increasingly significant. These capabilities are crucial for enhancing the performance of AI systems across a wide range of tasks. However, there is currently no comprehensive survey that systematically investigates AI's long-term…

    Submitted 12 January, 2025; v1 submitted 1 November, 2024; originally announced November 2024.

  11. arXiv:2410.06171  [pdf, other]

    stat.ML cs.LG

    Stochastic Kernel Regularisation Improves Generalisation in Deep Kernel Machines

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: Recent work developed convolutional deep kernel machines, achieving 92.7% test accuracy on CIFAR-10 using a ResNet-inspired architecture, which is SOTA for kernel methods. However, this still lags behind neural networks, which easily achieve over 94% test accuracy with similar architectures. In this work we introduce several modifications to improve the convolutional deep kernel machine's generali…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Camera Ready Version (without checklist)

  12. arXiv:2409.04185  [pdf, other]

    cs.LG cs.CL

    Residual Stream Analysis with Multi-Layer SAEs

    Authors: Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison

    Abstract: Sparse autoencoders (SAEs) are a promising approach to interpreting the internal representations of transformer language models. However, SAEs are usually trained separately on each transformer layer, making it difficult to use them to study how information flows across layers. To solve this problem, we introduce the multi-layer SAE (MLSAE): a single SAE trained on the residual stream activation v…

    Submitted 24 February, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: ICLR 2025 Camera Ready. 45 pages, 41 figures

  13. arXiv:2407.14158  [pdf, other]

    physics.ao-ph cs.LG

    Machine learning emulation of precipitation from km-scale regional climate simulations using a diffusion model

    Authors: Henry Addison, Elizabeth Kendon, Suman Ravuri, Laurence Aitchison, Peter AG Watson

    Abstract: High-resolution climate simulations are valuable for understanding climate change impacts. This has motivated use of regional convection-permitting climate models (CPMs), but these are very computationally expensive. We present a convection-permitting model generative emulator (CPMGEM), to skilfully emulate precipitation simulations by a 2.2km-resolution regional CPM at much lower cost. This utili…

    Submitted 7 April, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: 47 pages, 11 figures, 5 tables; re-ordered sections; further evaluation of future change in heavy precipitation

    ACM Class: J.2

  14. arXiv:2407.12220  [pdf, other]

    cs.LG cs.CL cs.CY

    Questionable practices in machine learning

    Authors: Gavin Leech, Juan J. Vazquez, Niclas Kupper, Misha Yagudin, Laurence Aitchison

    Abstract: Evaluating modern ML models is hard. The strong incentive for researchers and companies to report a state-of-the-art result on some metric often leads to questionable research practices (QRPs): bad practices which fall short of outright research fraud. We describe 44 such practices which can undermine reported results, giving examples where possible. Our list emphasises the evaluation of large lan…

    Submitted 30 October, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2406.15027  [pdf, other]

    cs.LG

    Using Neural Networks for Data Cleaning in Weather Datasets

    Authors: Jack R. P. Hanslope, Laurence Aitchison

    Abstract: In climate science, we often want to compare across different datasets. Difficulties can arise in doing this due to inevitable mismatches that arise between observational and reanalysis data, or even between different reanalyses. This misalignment can raise problems for any work that seeks to make inferences about one dataset from another. We considered tropical cyclone location as an example task…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 6 pages, 2 figures, ICML 2024 Workshop on Machine Learning for Earth System Modeling

  16. arXiv:2405.14394  [pdf, other]

    cs.CL cs.AI

    Instruction Tuning With Loss Over Instructions

    Authors: Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

    Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can…

    Submitted 2 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024. Code is available at https://github.com/ZhengxiangShi/InstructionModelling

  17. arXiv:2405.13698  [pdf, ps, other]

    cs.LG cs.AI

    How to set AdamW's weight decay as you scale model and dataset size

    Authors: Xi Wang, Laurence Aitchison

    Abstract: The scaling of the optimal AdamW weight decay hyperparameter with model and dataset size is critical as we seek to build larger models, but is poorly understood. We show that weights learned by AdamW can be understood as an exponential moving average (EMA) of recent updates. This gives critical insights for how to set the weight decay in AdamW, and how the weight decay should scale with model and…

    Submitted 1 June, 2025; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Published in ICML 2025

  18. arXiv:2403.20275  [pdf, other]

    cs.CV cs.RO

    Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

    Authors: Mauro Comi, Alessio Tonioni, Max Yang, Jonathan Tremblay, Valts Blukis, Yijiong Lin, Nathan F. Lepora, Laurence Aitchison

    Abstract: Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 17 pages

  19. arXiv:2402.18824  [pdf, other]

    cs.LG

    Batch size invariant Adam

    Authors: Xi Wang, Laurence Aitchison

    Abstract: We propose a batch size invariant version of Adam, for use in large-scale, distributed settings, in which the mini-batch is divided into micro-batches which are distributed among worker nodes. For the v term, standard Adam first computes the average over micro-batch gradients, then squares, while in the batch size invariant Adam proposed here, we first square the micro-batch gradients, then averag…

    Submitted 28 February, 2024; originally announced February 2024.

  20. arXiv:2402.13210  [pdf, other]

    cs.LG

    Bayesian Reward Models for LLM Alignment

    Authors: Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison

    Abstract: To ensure that large language model (LLM) responses are helpful and non-toxic, a reward model trained on human preference data is usually used. LLM responses with high rewards are then selected through best-of-$n$ (BoN) sampling or the LLM is further optimized to produce responses with high rewards through reinforcement learning from human feedback (RLHF). However, these processes are susceptible…

    Submitted 2 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  21. arXiv:2402.06525  [pdf, other]

    stat.ML cs.LG

    Flexible infinite-width graph convolutional networks and the importance of representation learning

    Authors: Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the NNGP kernel is fixed, and tunable only through a small number of hyperparameters, eliminating any possibility of representation learning. This contrasts with fi…

    Submitted 9 February, 2024; originally announced February 2024.

  22. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 6 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  23. arXiv:2311.12602  [pdf, other]

    cs.CV cs.LG

    TouchSDF: A DeepSDF Approach for 3D Shape Reconstruction using Vision-Based Tactile Sensing

    Authors: Mauro Comi, Yijiong Lin, Alex Church, Alessio Tonioni, Laurence Aitchison, Nathan F. Lepora

    Abstract: Humans rely on their visual and tactile senses to develop a comprehensive 3D understanding of their physical environment. Recently, there has been a growing interest in exploring and manipulating objects using data-driven approaches that utilise high-resolution vision-based tactile sensors. However, 3D shape reconstruction using tactile sensing has lagged behind visual shape reconstruction because…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 10 pages, 8 figures

  24. How to derive skill from the Fractions Skill Score

    Authors: Bobby Antonio, Laurence Aitchison

    Abstract: The Fractions Skill Score (FSS) is a widely used metric for assessing forecast skill, with applications ranging from precipitation to volcanic ash forecasts. By evaluating the fraction of grid squares exceeding a threshold in a neighbourhood, the intuition is that it can avoid the pitfalls of pixel-wise comparisons and identify length scales at which a forecast has skill. The FSS is typically inte…

    Submitted 8 January, 2025; v1 submitted 20 November, 2023; originally announced November 2023.

  25. arXiv:2310.17374  [pdf, other]

    stat.CO math.ST

    Using Autodiff to Estimate Posterior Moments, Marginals and Samples

    Authors: Sam Bowyer, Thomas Heap, Laurence Aitchison

    Abstract: Importance sampling is a popular technique in Bayesian inference: by reweighting samples drawn from a proposal distribution we are able to obtain samples and moment estimates from a Bayesian posterior over latent variables. Recent work, however, indicates that importance sampling scales poorly -- in order to accurately approximate the true posterior, the required number of importance samples grows…

    Submitted 18 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  26. arXiv:2310.00035  [pdf, other]

    cs.LG cs.AI

    LoRA ensembles for large language model fine-tuning

    Authors: Xi Wang, Laurence Aitchison, Maja Rudolph

    Abstract: Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there…

    Submitted 4 October, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Update the title in the PDF file

  27. arXiv:2309.09814  [pdf, ps, other]

    stat.ML cs.LG

    Convolutional Deep Kernel Machines

    Authors: Edward Milsom, Ben Anson, Laurence Aitchison

    Abstract: Standard infinite-width limits of neural networks sacrifice the ability for intermediate layers to learn representations from data. Recent work (A theory of representation learning gives a deep generalisation of kernel methods, Yang et al. 2023) modified the Neural Network Gaussian Process (NNGP) limit of Bayesian neural networks so that representation learning is retained. Furthermore, they found…

    Submitted 26 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Camera Ready Version

  28. arXiv:2309.03194  [pdf, other]

    q-bio.NC

    Signatures of Bayesian inference emerge from energy efficient synapses

    Authors: James Malkin, Cian O'Donnell, Conor Houghton, Laurence Aitchison

    Abstract: Biological synaptic transmission is unreliable, and this unreliability likely degrades neural circuit performance. While there are biophysical mechanisms that can increase reliability, for instance by increasing vesicle release probability, these mechanisms cost energy. We examined four such mechanisms along with the associated scaling of the energetic costs. We then embedded these energetic costs…

    Submitted 1 July, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: 29 pages, 11 figures

  29. arXiv:2308.13111  [pdf, other]

    cs.LG

    Bayesian Low-rank Adaptation for Large Language Models

    Authors: Adam X. Yang, Maxime Robeyns, Xi Wang, Laurence Aitchison

    Abstract: Low-rank adaptation (LoRA) has emerged as a new paradigm for cost-efficient fine-tuning of large language models (LLMs). However, fine-tuned LLMs often become overconfident especially when fine-tuned on small datasets. Bayesian methods, with their inherent ability to estimate uncertainty, serve as potent tools to mitigate overconfidence and enhance calibration. In this work, we introduce Laplace-L…

    Submitted 5 February, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  30. arXiv:2305.14454  [pdf, other]

    stat.ML cs.LG

    An Improved Variational Approximate Posterior for the Deep Wishart Process

    Authors: Sebastian Ober, Ben Anson, Edward Milsom, Laurence Aitchison

    Abstract: Deep kernel processes are a recently introduced class of deep Bayesian models that have the flexibility of neural networks, but work entirely with Gram matrices. They operate by alternately sampling a Gram matrix from a distribution over positive semi-definite matrices, and applying a deterministic transformation. When the distribution is chosen to be Wishart, the model is called a deep Wishart pr…

    Submitted 23 May, 2023; originally announced May 2023.

  31. arXiv:2305.11022  [pdf, other]

    cs.LG cs.NE stat.ML

    Massively Parallel Reweighted Wake-Sleep

    Authors: Thomas Heap, Gavin Leech, Laurence Aitchison

    Abstract: Reweighted wake-sleep (RWS) is a machine learning method for performing Bayesian inference in a very general class of models. RWS draws $K$ samples from an underlying approximate posterior, then uses importance weighting to provide a better estimate of the true posterior. RWS then updates its approximate posterior towards the importance-weighted estimate of the true posterior. However, recent work…

    Submitted 18 May, 2023; originally announced May 2023.

  32. arXiv:2302.14182  [pdf, other]

    cs.LG cs.AI

    Taylor TD-learning

    Authors: Michele Garibbo, Maxime Robeyns, Laurence Aitchison

    Abstract: Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn a critic. However, TD-learning updates can be high variance. Here, we introduce a model-based RL framework, Taylor TD, which reduces this variance in continuous state-action settings. Taylor TD uses a first-order Taylor series expansion of TD updates. This expansion allows Taylor TD to analytically integrate…

    Submitted 18 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Published as a conference paper at NeurIPS 2023

  33. arXiv:2302.11533  [pdf, other]

    cs.LG

    MONGOOSE: Path-wise Smooth Bayesian Optimisation via Meta-learning

    Authors: Adam X. Yang, Laurence Aitchison, Henry B. Moss

    Abstract: In Bayesian optimisation, we often seek to minimise the black-box objective functions that arise in real-world physical systems. A primary contributor to the cost of evaluating such black-box objective functions is often the effort required to prepare the system for measurement. We consider a common scenario where preparation costs grow as the distance between successive evaluations increases. In…

    Submitted 2 July, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

  34. arXiv:2302.04081  [pdf, other]

    stat.ML cs.LG

    Decision trees compensate for model misspecification

    Authors: Hugh Panton, Gavin Leech, Laurence Aitchison

    Abstract: The best-performing models in ML are not interpretable. If we can explain why they outperform, we may be able to replicate these mechanisms and obtain both interpretability and performance. One example are decision trees and their descendent gradient boosting machines (GBMs). These perform well in the presence of complex interactions, with tree depth governing the order of interactions. However, i…

    Submitted 8 February, 2023; originally announced February 2023.

  35. arXiv:2302.01193  [pdf, other]

    cs.LG cs.RO

    Imitating careful experts to avoid catastrophic events

    Authors: Jack R. P. Hanslope, Laurence Aitchison

    Abstract: RL is increasingly being used to control robotic systems that interact closely with humans. This interaction raises the problem of safe RL: how to ensure that a RL-controlled robotic system never, for instance, injures a human. This problem is especially challenging in rich, realistic settings where it is not even possible to clearly write down a reward function which incorporates these outcomes.…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 9 pages, 8 figures, accepted to NeurIPS 2022 Workshop on Robot Learning: Trustworthy Robotics

  36. arXiv:2211.16116  [pdf, other]

    physics.ao-ph cs.LG

    Machine learning emulation of a local-scale UK climate model

    Authors: Henry Addison, Elizabeth Kendon, Suman Ravuri, Laurence Aitchison, Peter AG Watson

    Abstract: Climate change is causing the intensification of rainfall extremes. Precipitation projections with high spatial resolution are important for society to prepare for these changes, e.g. to model flooding impacts. Physics-based simulations for creating such projections are very computationally expensive. This work demonstrates the effectiveness of diffusion models, a form of deep generative models, f…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: 8 pages, 5 figures, Tackling Climate Change with Machine Learning workshop at NeurIPS 2022

  37. arXiv:2209.07509  [pdf, other]

    cs.LG

    Random initialisations performing above chance and how to find them

    Authors: Frederik Benzing, Simon Schug, Robert Meier, Johannes von Oswald, Yassir Akram, Nicolas Zucchet, Laurence Aitchison, Angelika Steger

    Abstract: Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after t…

    Submitted 7 November, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022, 14th Annual Workshop on Optimization for Machine Learning (OPT2022)

  38. arXiv:2208.10892  [pdf, other]

    q-bio.NC cs.LG cs.NE

    What deep reinforcement learning tells us about human motor learning and vice-versa

    Authors: Michele Garibbo, Casimir Ludwig, Nathan Lepora, Laurence Aitchison

    Abstract: Machine learning and specifically reinforcement learning (RL) has been extremely successful in helping us to understand neural decision making processes. However, RL's role in understanding other neural processes especially motor learning is much less well explored. To explore this connection, we investigated how recent deep RL methods correspond to the dominant motor learning framework in neurosc…

    Submitted 26 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

    Comments: 23 pages, 5 figures

  39. arXiv:2206.12361  [pdf, other]

    cs.LG cs.AI

    Robustness to corruption in pre-trained Bayesian neural networks

    Authors: Xi Wang, Laurence Aitchison

    Abstract: We develop ShiftMatch, a new training-data-dependent likelihood for robustness to corruption in Bayesian neural networks (BNNs). ShiftMatch is inspired by the training-data-dependent "EmpCov" priors from Izmailov et al. (2021a), and efficiently matches test-time spatial correlations to those at training time. Critically, ShiftMatch is designed to leave the neural network's training time likelihood…

    Submitted 23 February, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

    Comments: Published in the International Conference on Learning Representations (ICLR) 2023

  40. arXiv:2109.03615  [pdf, ps, other]

    cs.RO cs.AI cs.CV cs.LG

    Tactile Image-to-Image Disentanglement of Contact Geometry from Motion-Induced Shear

    Authors: Anupam K. Gupta, Laurence Aitchison, Nathan F. Lepora

    Abstract: Robotic touch, particularly when using soft optical tactile sensors, suffers from distortion caused by motion-dependent shear. The manner in which the sensor contacts a stimulus is entangled with the tactile information about the geometry of the stimulus. In this work, we propose a supervised convolutional deep neural network model that learns to disentangle, in the latent space, the components of…

    Submitted 8 September, 2021; originally announced September 2021.

    Comments: 15 pages, 6 figures, under review at CoRL 2021

  41. arXiv:2108.13097  [pdf, other]

    stat.ML cs.LG

    A theory of representation learning gives a deep generalisation of kernel methods

    Authors: Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots, Laurence Aitchison

    Abstract: The successes of modern deep machine learning methods are founded on their ability to transform inputs across multiple layers to build good high-level representations. It is therefore critical to understand this process of representation learning. However, standard theoretical approaches (formally NNGPs) involving infinite width limits eliminate representation learning. We therefore develop a new…

    Submitted 25 May, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Published in ICML 2023

  42. arXiv:2107.10125  [pdf, other]

    stat.ML cs.LG

    A variational approximate posterior for the deep Wishart process

    Authors: Sebastian W. Ober, Laurence Aitchison

    Abstract: Recent work introduced deep kernel processes as an entirely kernel-based alternative to NNs (Aitchison et al. 2020). Deep kernel processes flexibly learn good top-layer representations by alternately sampling the kernel from a distribution over positive semi-definite matrices and performing nonlinear transformations. A particular deep kernel process, the deep Wishart process (DWP), is of particula…

    Submitted 3 December, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). 23 pages

  43. arXiv:2107.02495  [pdf, other]

    stat.ML cs.LG

    InfoNCE is variational inference in a recognition parameterised model

    Authors: Laurence Aitchison, Stoil Ganev

    Abstract: Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do…

    Submitted 10 August, 2023; v1 submitted 6 July, 2021; originally announced July 2021.

  44. arXiv:2106.05586  [pdf, other]

    stat.ML cs.LG

    Data augmentation in Bayesian neural networks and the cold posterior effect

    Authors: Seth Nabarro, Stoil Ganev, Adrià Garriga-Alonso, Vincent Fortuin, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks that incorporate data augmentation implicitly use a "randomly perturbed log-likelihood [which] does not have a clean interpretation as a valid likelihood function" (Izmailov et al. 2021). Here, we provide several approaches to developing principled Bayesian neural networks incorporating data augmentation. We introduce a "finite orbit" setting which allows likelihoods t…

    Submitted 9 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  45. BNNpriors: A library for Bayesian neural network inference with different prior distributions

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs w…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at Software Impacts

  46. arXiv:2103.00222   

    stat.ML cs.LG

    Variational Laplace for Bayesian neural networks

    Authors: Ali Unlu, Laurence Aitchison

    Abstract: We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Variational L…

    Submitted 20 July, 2021; v1 submitted 27 February, 2021; originally announced March 2021.

    Comments: Accidental resubmission of new version of arXiv:2011.10443

  47. arXiv:2102.12959  [pdf, other]

    stat.ML cs.LG

    Bayesian OOD detection with aleatoric uncertainty and outlier exposure

    Authors: Xi Wang, Laurence Aitchison

    Abstract: Typical Bayesian approaches to OOD detection use epistemic uncertainty. Surprisingly from the Bayesian perspective, there are a number of methods that successfully use aleatoric uncertainty to detect OOD points (e.g. Hendrycks et al. 2018). In addition, it is difficult to use outlier exposure to improve a Bayesian OOD detection model, as it is not clear whether it is possible or desirable to increa…

    Submitted 28 October, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

  48. arXiv:2102.06571  [pdf, other]

    stat.ML cs.LG

    Bayesian Neural Network Priors Revisited

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison

    Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, it is unclear whether these priors accurately reflect our true beliefs about the weight distributions or give optimal performance. To find better priors, we study summary statistics of neural network weights in networks trained using stochastic gradient descent (SGD). We find that convolution…

    Submitted 16 March, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2022

  49. arXiv:2011.10443  [pdf, other]

    stat.ML cs.LG

    Variational Laplace for Bayesian neural networks

    Authors: Ali Unlu, Laurence Aitchison

    Abstract: We develop variational Laplace for Bayesian neural networks (BNNs) which exploits a local approximation of the curvature of the likelihood to estimate the ELBO without the need for stochastic sampling of the neural-network weights. The Variational Laplace objective is simple to evaluate, as it is (in essence) the log-likelihood, plus weight-decay, plus a squared-gradient regularizer. Variational L…

    Submitted 10 August, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

  50. arXiv:2010.01590  [pdf, other]

    stat.ML cs.LG

    Deep kernel processes

    Authors: Laurence Aitchison, Adam X. Yang, Sebastian W. Ober

    Abstract: We define deep kernel processes in which positive definite Gram matrices are progressively transformed by nonlinear kernel functions and by sampling from (inverse) Wishart distributions. Remarkably, we find that deep Gaussian processes (DGPs), Bayesian neural networks (BNNs), infinite BNNs, and infinite BNNs with bottlenecks can all be written as deep kernel processes. For DGPs the equivalence ari…

    Submitted 30 May, 2021; v1 submitted 4 October, 2020; originally announced October 2020.

    Comments: 21 pages