
Showing 1–3 of 3 results for author: Bowyer, S

  1. arXiv:2503.08264  [pdf, other]

    stat.ML cs.LG

    Massively Parallel Expectation Maximization For Approximate Posteriors

    Authors: Thomas Heap, Sam Bowyer, Laurence Aitchison

    Abstract: Bayesian inference for hierarchical models can be very challenging. MCMC methods have difficulty scaling to large models with many observations and latent variables. While variational inference (VI) and reweighted wake-sleep (RWS) can be more scalable, they are gradient-based methods and so often require many iterations to converge. Our key insight was that modern massively parallel importance wei…

    Submitted 11 March, 2025; originally announced March 2025.

  2. arXiv:2503.01747  [pdf, other]

    cs.AI cs.LG stat.ML

    Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints

    Authors: Sam Bowyer, Laurence Aitchison, Desi R. Ivanova

    Abstract: Rigorous statistical evaluations of large language models (LLMs), including valid error bars and significance testing, are essential for meaningful and reliable performance assessment. Currently, when such statistical measures are reported, they typically rely on the Central Limit Theorem (CLT). In this position paper, we argue that while CLT-based methods for uncertainty quantification are approp…

    Submitted 28 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: 42 pages, 39 figures. ICML 2025 Spotlight Position Paper

  3. arXiv:2310.17374  [pdf, other]

    stat.CO math.ST

    Using Autodiff to Estimate Posterior Moments, Marginals and Samples

    Authors: Sam Bowyer, Thomas Heap, Laurence Aitchison

    Abstract: Importance sampling is a popular technique in Bayesian inference: by reweighting samples drawn from a proposal distribution we are able to obtain samples and moment estimates from a Bayesian posterior over latent variables. Recent work, however, indicates that importance sampling scales poorly -- in order to accurately approximate the true posterior, the required number of importance samples grows…

    Submitted 18 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.