-
Advances in Approximate Bayesian Inference for Models in Epidemiology
Authors:
Xiahui Li,
Fergus Chadwick,
Ben Swallow
Abstract:
Bayesian inference methods are useful in infectious disease modeling due to their capability to propagate uncertainty, manage sparse data, incorporate latent structures, and address high-dimensional parameter spaces. However, parameter inference through assimilation of observational data in these models remains challenging. While asymptotically exact Bayesian methods offer theoretical guarantees for accurate inference, they can be computationally demanding and impractical for real-time outbreak analysis. This review synthesizes recent advances in approximate Bayesian inference methods that aim to balance inferential accuracy with scalability. We focus on four prominent families: Approximate Bayesian Computation, Bayesian Synthetic Likelihood, Integrated Nested Laplace Approximation, and Variational Inference. For each method, we evaluate its relevance to epidemiological applications, emphasizing innovations that improve both computational efficiency and inference accuracy. We also offer practical guidance on method selection across a range of modeling scenarios. Finally, we identify hybrid exact-approximate inference as a promising frontier that combines methodological rigor with the scalability needed for outbreak response. This review provides epidemiologists with a conceptual framework to navigate the trade-off between statistical accuracy and computational feasibility in contemporary disease modeling.
Submitted 28 April, 2025;
originally announced April 2025.
-
Gaussian process modelling of infectious diseases using the Greta software package and GPUs
Authors:
Eva Gunn,
Nikhil Sengupta,
Ben Swallow
Abstract:
Gaussian processes are a widely used statistical tool for conducting non-parametric inference in applied sciences, with many computational packages available to fit to data and predict future observations. We study the use of the Greta software for Bayesian inference to apply Gaussian process regression to spatio-temporal data of infectious disease outbreaks and predict future disease spread. Greta builds on TensorFlow, making it comparatively easy to take advantage of the significant gain in speed offered by GPUs. In these complex spatio-temporal models, we show a reduction of up to 70% in computational time relative to fitting the same models on CPUs. We show how the choice of covariance kernel impacts the ability to infer spread and extrapolate to unobserved spatial and temporal units. The inference pipeline is applied to weekly incidence data on tuberculosis in the East and West Midlands regions of England over a period of two years.
Submitted 28 March, 2025; v1 submitted 8 November, 2024;
originally announced November 2024.
-
A Novel Approximate Bayesian Inference Method for Compartmental Models in Epidemiology using Stan
Authors:
Xiahui Li,
Ben Swallow,
Fergus J. Chadwick
Abstract:
Mechanistic compartmental models are widely used in epidemiology to study the dynamics of infectious disease transmission. These models have significantly contributed to designing and evaluating effective control strategies during pandemics. However, the increasing complexity and the number of parameters needed to describe rapidly evolving transmission scenarios present significant challenges for parameter estimation due to intractable likelihoods. To overcome this issue, likelihood-free methods have proven effective for accurately and efficiently fitting these models to data. In this study, we focus on approximate Bayesian computation (ABC) and synthetic likelihood methods for parameter inference. We develop a method that employs ABC to select the most informative subset of summary statistics, which are then used to construct a synthetic likelihood for posterior sampling. Posterior sampling is performed using Hamiltonian Monte Carlo as implemented in the Stan software. The proposed algorithm is demonstrated through simulation studies, showing promising results for inference in a simulated epidemic scenario.
Submitted 6 August, 2024;
originally announced August 2024.
-
A Bayesian multivariate extreme value mixture model
Authors:
Chenglei Hu,
Ben Swallow,
Daniela Castro-Camilo
Abstract:
Impact assessment of natural hazards requires the consideration of both extreme and non-extreme events. Extensive research has been conducted on the joint modeling of bulk and tail in univariate settings; however, the corresponding body of research in the context of multivariate analysis is comparatively scant. This study extends the univariate joint modeling of bulk and tail to the multivariate framework. Specifically, it pertains to cases where multivariate observations exceed a high threshold in at least one component. We propose a multivariate mixture model that assumes a parametric model to capture the bulk of the distribution, which is in the max-domain of attraction (MDA) of a multivariate extreme value distribution (mGEVD). The tail is described by the multivariate generalized Pareto distribution, which is asymptotically justified to model multivariate threshold exceedances. We show that if all components exceed the threshold, our mixture model is in the MDA of an mGEVD. Bayesian inference based on multivariate random-walk Metropolis-Hastings and the automated factor slice sampler allows us to incorporate uncertainty from the threshold selection easily. Due to computational limitations, simulations and data applications are provided for dimension $d=2$, but a discussion is provided with views toward scalability based on pairwise likelihood.
Submitted 28 January, 2024;
originally announced January 2024.
-
Data Fusion in a Two-stage Spatio-Temporal Model using the INLA-SPDE Approach
Authors:
Stephen Jun Villejo,
Janine B Illian,
Ben Swallow
Abstract:
This paper proposes a two-stage estimation approach for a spatial misalignment scenario that is motivated by the epidemiological problem of linking pollutant exposures and health outcomes. We use the integrated nested Laplace approximation method to estimate the parameters of a two-stage spatio-temporal model; the first stage models the exposures while the second stage links the health outcomes to exposures. The first stage is based on the Bayesian melding model, which assumes a common latent field for the geostatistical monitor data and high-resolution data such as satellite data. The second stage fits a GLMM using the spatial averages of the estimated latent field, and additional spatial and temporal random effects. Uncertainty from the first stage is accounted for by simulating repeatedly from the posterior predictive distribution of the latent field. A simulation study was carried out to assess the impact of the sparsity of the data on the monitors, number of time points, and the specification of the priors in terms of the biases, RMSEs, and coverage probabilities of the parameters and the block-level exposure estimates. The results show that the parameters are generally estimated correctly but there is difficulty in estimating the latent field parameters. The method works very well in estimating block-level exposures and the effect of exposures on the health outcomes, which is the primary parameter of interest for spatial epidemiologists and health policy makers, even with the use of non-informative priors.
Submitted 20 July, 2022;
originally announced July 2022.
-
Bayesian inference for stochastic oscillatory systems using the phase-corrected Linear Noise Approximation
Authors:
Ben Swallow,
David A. Rand,
Giorgos Minas
Abstract:
Likelihood-based inference in stochastic non-linear dynamical systems, such as those found in chemical reaction networks and biological clock systems, is inherently complex and has largely been limited to small and unrealistically simple systems. Recent advances in analytically tractable approximations to the underlying conditional probability distributions enable long-term dynamics to be accurately modelled, and make the large number of model evaluations required for exact Bayesian inference much more feasible. We propose a new methodology for inference in stochastic non-linear dynamical systems exhibiting oscillatory behaviour and show the parameters in these models can be realistically estimated from simulated data. Preliminary analyses based on the Fisher Information Matrix of the model can guide the implementation of Bayesian inference. We show that this parameter sensitivity analysis can predict which parameters are practically identifiable. Several Markov chain Monte Carlo algorithms are compared, with our results suggesting a parallel tempering algorithm consistently gives the best approach for these systems, which are shown to frequently exhibit multi-modal posterior distributions.
Submitted 4 July, 2024; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Visualization for Epidemiological Modelling: Challenges, Solutions, Reflections & Recommendations
Authors:
Jason Dykes,
Alfie Abdul-Rahman,
Daniel Archambault,
Benjamin Bach,
Rita Borgo,
Min Chen,
Jessica Enright,
Hui Fang,
Elif E. Firat,
Euan Freeman,
Tuna Gonen,
Claire Harris,
Radu Jianu,
Nigel W. John,
Saiful Khan,
Andrew Lahiff,
Robert S. Laramee,
Louise Matthews,
Sibylle Mohr,
Phong H. Nguyen,
Alma A. M. Rahat,
Richard Reeve,
Panagiotis D. Ritsos,
Jonathan C. Roberts,
Aidan Slingsby
, et al. (8 additional authors not shown)
Abstract:
We report on an ongoing collaboration between epidemiological modellers and visualization researchers by documenting and reflecting upon knowledge constructs -- a series of ideas, approaches and methods taken from existing visualization research and practice -- deployed and developed to support modelling of the COVID-19 pandemic. Structured independent commentary on these efforts is synthesized through iterative reflection to develop: evidence of the effectiveness and value of visualization in this context; open problems upon which the research communities may focus; guidance for future activity of this type; and recommendations to safeguard the achievements and promote, advance, secure and prepare for future collaborations of this kind. In describing and comparing a series of related projects that were undertaken in unprecedented conditions, our hope is that this unique report, and its rich interactive supplementary materials, will guide the scientific community in embracing visualization in its observation, analysis and modelling of data as well as in disseminating findings. Equally we hope to encourage the visualization community to engage with impactful science in addressing its emerging data challenges. If we are successful, this showcase of activity may stimulate mutually beneficial engagement between communities with complementary expertise to address problems of significance in epidemiology and beyond. https://ramp-vis.github.io/RAMPVIS-PhilTransA-Supplement/
Submitted 20 June, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Tracking the national and regional COVID-19 epidemic status in the UK using directed Principal Component Analysis
Authors:
Ben Swallow,
Wen Xiang,
Jasmina Panovska-Griffiths
Abstract:
One of the difficulties in monitoring an ongoing pandemic is deciding on the metric that best describes its status when multiple intercorrelated measurements are available. Having a single measure, such as the effective reproduction number R, has been a simple and useful metric for tracking the epidemic and for imposing policy interventions to curb the increase when R > 1. While R is easy to interpret in a fully susceptible population, it is more difficult to interpret for a population with heterogeneous prior immunity, e.g., from vaccination and prior infection. We propose an additional metric for tracking the UK epidemic which can capture the different spatial scales. These are the principal scores (PCs) from a weighted Principal Component Analysis. In this paper, we have used the methodology across the four UK nations and across the first two epidemic waves (January 2020-March 2021) to show that the first principal score across nations and epidemic waves is a representative indicator of the state of the pandemic and is correlated with the trend in R. Hospitalisations are shown to be consistently representative; however, the precise dominant indicator, i.e. the principal loading(s) of the analysis, can vary geographically and across epidemic waves.
Submitted 10 March, 2022; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Parallel tempering as a mechanism for facilitating inference in hierarchical hidden Markov models
Authors:
Giada Sacchi,
Ben Swallow
Abstract:
The study of animal behavioural states inferred through hidden Markov models and similar state switching models has seen a significant increase in popularity in recent years. The ability to account for varying levels of behavioural scale has become possible through hierarchical hidden Markov models, but additional levels lead to higher complexity and increased correlation between model components. Maximum likelihood approaches to inference using the EM algorithm and direct optimisation of likelihoods are more frequently used, with Bayesian approaches being less favoured due to computational demands. Given these demands, it is vital that efficient estimation algorithms are developed when Bayesian methods are preferred. We study the use of various approaches to improve convergence times and mixing in Markov chain Monte Carlo methods applied to hierarchical hidden Markov models, including parallel tempering as an inference facilitation mechanism. The method shows promise for analysing complex stochastic models with high levels of correlation between components, but our results show that it requires careful tuning in order to maximise that potential.
Submitted 23 November, 2020; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Parametric uncertainty in complex environmental models: a cheap emulation approach for models with high-dimensional output
Authors:
B. Swallow,
M. Rigby,
J. C. Rougier,
A. J. Manning,
M. Lunt,
S. O'Doherty
Abstract:
In order to understand underlying processes governing environmental and physical processes, and predict future outcomes, a complex computer model is frequently required to simulate these dynamics. However, there is inevitably uncertainty related to the exact parametric form or the values of such parameters to be used when developing these simulators, with ranges of plausible values prevalent in the literature. Systematic errors introduced by failing to account for these uncertainties have the potential to have a large effect on resulting estimates of unknown quantities of interest. Due to the complexity of these types of models, it is often unfeasible to run large numbers of training runs that are usually required for full statistical emulators of the environmental processes. We therefore present a method for accounting for uncertainties in complex environmental simulators without the need for very large numbers of training runs and illustrate the method through an application to the Met Office's atmospheric transport model NAME. We conclude that there are two principal parameters that are linked with variability in NAME outputs, namely the free tropospheric turbulence parameter and particle release height. Our results suggest the former should be significantly larger than is currently implemented as a default in NAME, whilst changes in the latter most likely stem from inconsistencies between the model specified ground height at the observation locations and the true height at this location. Estimated discrepancies from independent data are consistent with the discrepancy between modelled and true ground height.
Submitted 13 February, 2017;
originally announced February 2017.