-
Bayesian inference on the order of stationary vector autoregressions
Authors:
Rachel L. Binks,
Sarah E. Heaps,
Mariella Panagiotopoulou,
Yujiang Wang,
Darren J. Wilkinson
Abstract:
Vector autoregressions (VARs) are a widely used tool for modelling multivariate time-series. It is common to assume a VAR is stationary; this can be enforced by imposing the stationarity condition which restricts the parameter space of the autoregressive coefficients to the stationary region. However, implementing this constraint is difficult due to the complex geometry of the stationary region. Fortunately, recent work has provided a solution for autoregressions of fixed order $p$ based on a reparameterization in terms of a set of interpretable and unconstrained transformed partial autocorrelation matrices. In this work, focus is placed on the difficult problem of allowing $p$ to be unknown, developing a prior and computational inference that takes full account of order uncertainty. Specifically, the multiplicative gamma process is used to build a prior which encourages increasing shrinkage of the partial autocorrelations with increasing lag. Identifying the lag beyond which the partial autocorrelations become equal to zero then determines $p$. Based on classic time-series theory, a principled choice of truncation criterion identifies whether a partial autocorrelation matrix is effectively zero. Posterior inference utilizes Hamiltonian Monte Carlo via Stan. The work is illustrated in a substantive application to neural activity data to investigate ultradian brain rhythms.
Submitted 3 December, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
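The paper works with matrix-valued partial autocorrelations for VARs. For intuition, here is a minimal sketch of the scalar (univariate AR) analogue. It relies on the classical result that an AR(p) process is stationary exactly when every partial autocorrelation lies in (-1, 1). Unconstrained reals are squashed into (-1, 1) with tanh, which is a hypothetical choice and not the paper's transformation. The Durbin-Levinson recursion then maps them to AR coefficients. Setting the partial autocorrelations beyond lag p to zero leaves an AR(p), which is why the lag of the last non-negligible partial autocorrelation determines the order. The function names are illustrative only.

```python
import math

def pacf_to_ar(r):
    """Map partial autocorrelations r_1..r_p (each in (-1, 1)) to the
    coefficients phi_1..phi_p of a stationary AR(p) via Durbin-Levinson."""
    phi = []
    for k, r_k in enumerate(r, start=1):
        # phi^(k)_j = phi^(k-1)_j - r_k * phi^(k-1)_(k-j), j = 1..k-1; phi^(k)_k = r_k
        phi = [phi[j] - r_k * phi[k - 2 - j] for j in range(k - 1)] + [r_k]
    return phi

def ar_to_pacf(phi):
    """Inverse (step-down) recursion: recover r_1..r_p from phi_1..phi_p."""
    phi = list(phi)
    r = []
    while phi:
        k = len(phi)
        r_k = phi[-1]
        r.append(r_k)
        denom = 1.0 - r_k * r_k  # zero when |r_k| = 1 (boundary of stationarity)
        phi = [(phi[j] + r_k * phi[k - 2 - j]) / denom for j in range(k - 1)]
    return r[::-1]

def unconstrained_to_ar(u):
    """Hypothetical unconstrained parameterization: r_k = tanh(u_k)."""
    return pacf_to_ar([math.tanh(x) for x in u])

def is_stationary(phi):
    """Stationary iff every recovered partial autocorrelation is inside (-1, 1)."""
    try:
        return all(abs(r_k) < 1.0 for r_k in ar_to_pacf(phi))
    except ZeroDivisionError:
        return False

phi = unconstrained_to_ar([0.3, -1.2, 2.0, 0.0, 0.0])  # zero pacf beyond lag 3
print(phi)  # the last two coefficients are exactly 0.0, so the order is 3
```

Any real vector u gives a stationary AR model, so a sampler can move freely in u while stationarity is enforced automatically.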
-
A sparse Bayesian hierarchical vector autoregressive model for microbial dynamics in a wastewater treatment plant
Authors:
Naomi E. Hannaford,
Sarah E. Heaps,
Tom M. W. Nye,
Thomas P. Curtis,
Ben Allen,
Andrew Golightly,
Darren J. Wilkinson
Abstract:
Proper function of a wastewater treatment plant (WWTP) relies on maintaining a delicate balance between a multitude of competing microorganisms. Gaining a detailed understanding of the complex network of interactions therein is essential not only for maximising current operational efficiencies but also for the effective design of new treatment technologies. Metagenomics offers insight into these dynamic systems through the analysis of the microbial DNA sequences present. Unique taxa are inferred through sequence clustering to form operational taxonomic units (OTUs), with per-taxon abundance estimates obtained from the corresponding sequence counts. The data in this study comprise weekly OTU counts from an activated sludge (AS) tank of a WWTP. To model the OTU dynamics, we develop a Bayesian hierarchical vector autoregressive model, which is a linear approximation to the commonly used generalised Lotka-Volterra (gLV) model. To tackle the high dimensionality and sparsity of the data, the OTUs are first clustered into 12 "bins" using a seasonal phase-based approach. The autoregressive coefficient matrix is assumed to be sparse, so we explore different shrinkage priors by analysing simulated data sets before selecting the regularised horseshoe prior for the biological application. We find that ammonia and chemical oxygen demand have a positive relationship with several bins and that pH has a positive relationship with one bin. These results are supported by findings in the biological literature. We identify several negative interactions, which suggests that OTUs in different bins may be competing for resources and that these relationships are complex. We also identify two positive interactions. Although simpler than a gLV model, our vector autoregression offers valuable insight into the microbial dynamics of the WWTP.
Submitted 1 July, 2021;
originally announced July 2021.
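The regularised horseshoe prior selected above can be sketched as follows. The sketch assumes the commonly used form: each coefficient is N(0, tau^2 lambda_tilde^2), with lambda_tilde^2 = c^2 lambda^2 / (c^2 + tau^2 lambda^2), half-Cauchy local scales lambda and a half-Cauchy global scale tau. A small tau pulls most entries towards zero, giving the sparse autoregressive matrix the abstract assumes. The slab scale c caps the prior standard deviation of every coefficient. The paper's full hierarchy may differ: c is held fixed here, and the function names and the values tau0 = 0.05 and c = 1.0 are hypothetical.

```python
import math
import random

def regularised_scale(tau, lam, c):
    """Prior standard deviation of one coefficient under a regularised horseshoe:
    tau * lambda_tilde, with lambda_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2)."""
    lam_tilde_sq = (c * lam) ** 2 / (c ** 2 + (tau * lam) ** 2)
    return tau * math.sqrt(lam_tilde_sq)

def half_cauchy(rng, scale=1.0):
    # |Cauchy(0, scale)| drawn by inverting the Cauchy CDF
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))

def sample_ar_matrix(n, tau0, c, rng):
    """Draw an n x n VAR(1) coefficient matrix from the prior, with one global
    scale tau and one local scale per entry (c held fixed for simplicity)."""
    tau = half_cauchy(rng, tau0)
    return [[rng.gauss(0.0, regularised_scale(tau, half_cauchy(rng), c))
             for _ in range(n)]
            for _ in range(n)]

A = sample_ar_matrix(12, tau0=0.05, c=1.0, rng=random.Random(1))  # 12 bins
```

For a very large local scale the prior standard deviation approaches c instead of growing without bound. This is the "regularisation" relative to the plain horseshoe.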
-
Bayesian spatio-temporal model for high-resolution short-term forecasting of precipitation fields
Authors:
Stephen Richard Johnson,
Sarah Elizabeth Heaps,
Kevin James Wilson,
Darren James Wilkinson
Abstract:
With extreme weather events becoming more common, the risk posed by surface water flooding is ever increasing. In this work we propose a model, and associated Bayesian inference scheme, for generating probabilistic (high-resolution short-term) forecasts of localised precipitation. The parametrisation of our underlying hierarchical dynamic spatio-temporal model is motivated by a forward-time, centred-space finite difference solution to a collection of stochastic partial differential equations, where the main driving forces are advection and diffusion. Observations from both weather radar and ground-based rain gauges provide information from which we can learn about the likely values of the (latent) precipitation field in addition to other unknown model parameters. Working in the Bayesian paradigm provides a coherent framework for capturing uncertainty both in the underlying model parameters and in our forecasts. Further, appealing to simulation-based (MCMC) sampling yields a straightforward solution to handling zeros, treated as censored observations, via data augmentation. Both the underlying state and the observations are of moderately large dimension ($\mathcal{O}(10^4)$ and $\mathcal{O}(10^3)$ respectively), which renders standard inference approaches computationally infeasible. Our solution is to embed the ensemble Kalman smoother within a Gibbs sampling scheme to facilitate approximate Bayesian inference in reasonable time. Both the methodology and the effectiveness of our posterior sampling scheme are demonstrated via simulation studies and by a case study of real data from the Urban Observatory project based in Newcastle upon Tyne, UK.
Submitted 7 May, 2021;
originally announced May 2021.
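The forward-time, centred-space (FTCS) discretisation that motivates the model's parametrisation can be illustrated with a minimal deterministic sketch. It assumes a single 1-D field on a periodic grid, whereas the paper's model is 2-D and stochastic; noise terms are omitted. The velocity v, diffusivity D, step sizes and function name are hypothetical. With these values the scheme satisfies the usual von Neumann condition for FTCS advection-diffusion, (v dt/dx)^2 <= 2 D dt/dx^2 <= 1.

```python
def ftcs_step(u, v, D, dt, dx):
    """One FTCS step of u_t + v u_x = D u_xx on a periodic 1-D grid."""
    n = len(u)
    a = v * dt / (2.0 * dx)   # centred advection coefficient
    d = D * dt / dx ** 2      # diffusion number
    return [u[i]
            - a * (u[(i + 1) % n] - u[i - 1])             # u[-1] wraps (periodic)
            + d * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1])
            for i in range(n)]

u = [0.0] * 50
u[10] = 1.0  # a single cell of "rain"
for _ in range(40):
    u = ftcs_step(u, v=1.0, D=0.5, dt=0.5, dx=1.0)
```

Each new value is a fixed linear combination of a cell and its two neighbours. A linear update of this kind is what allows the scheme to serve as the transition structure of a dynamic spatio-temporal model.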
-
A Review of Stochastic Block Models and Extensions for Graph Clustering
Authors:
Clement Lee,
Darren J Wilkinson
Abstract:
There have been rapid developments in model-based clustering of graphs, also known as block modelling, over the last ten years or so. We review different approaches and extensions proposed for different aspects in this area, such as the type of the graph, the clustering approach, the inference approach, and whether the number of groups is selected or estimated. We also review models that combine block modelling with topic modelling and/or longitudinal modelling, focusing on how these models deal with multiple types of data. How different approaches cope with various issues will be summarised and compared, to meet practitioners' demand for a concise overview of the current state of the literature in these areas.
Submitted 30 October, 2019; v1 submitted 28 February, 2019;
originally announced March 2019.
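The extensions reviewed above build on the basic stochastic block model. A minimal generative sketch of that model follows: each node has a group label, and two nodes are linked independently with a probability that depends only on their groups. Clustering amounts to inverting this process, that is, inferring the labels from the edges. Only the generative direction is shown here, and the function name, group sizes and probabilities are hypothetical.

```python
import random

def sample_sbm(z, P, rng, directed=False):
    """Sample edges of a stochastic block model: nodes i and j are linked
    independently with probability P[z[i]][z[j]], where z[i] is i's group."""
    n = len(z)
    edges = []
    for i in range(n):
        for j in range(n):
            if i == j or (not directed and j < i):
                continue  # no self-loops; each undirected pair considered once
            if rng.random() < P[z[i]][z[j]]:
                edges.append((i, j))
    return edges

z = [0] * 20 + [1] * 20                 # two planted groups
P = [[0.3, 0.02], [0.02, 0.3]]          # assortative block structure
edges = sample_sbm(z, P, random.Random(7))
within = sum(z[i] == z[j] for i, j in edges)
```

Making P asymmetric (with directed=True) or giving nodes mixed memberships are two of the extensions the review covers.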
-
A Social Network Analysis of Articles on Social Network Analysis
Authors:
Clement Lee,
Darren J Wilkinson
Abstract:
A collection of articles on the statistical modelling and inference of social networks is analysed in a network fashion. The references of these articles are used to construct a citation network data set, which is almost a directed acyclic graph because only existing articles can be cited. A mixed membership stochastic block model is then applied to this data set to soft cluster the articles. The results obtained from a Gibbs sampler give us insights into the influence and the categorisation of these articles.
Submitted 29 October, 2018; v1 submitted 23 October, 2018;
originally announced October 2018.
-
A hierarchical model of non-homogeneous Poisson processes for Twitter retweets
Authors:
Clement Lee,
Darren J Wilkinson
Abstract:
We present a hierarchical model of non-homogeneous Poisson processes (NHPP) for information diffusion on online social media, in particular Twitter retweets. The retweets of each original tweet are modelled by an NHPP, whose intensity function is a product of time-decaying components and another component that depends on the follower count of the original tweet author. The latter allows us to explain or predict the ultimate retweet count by a network centrality-related covariate. The inference algorithm enables the Bayes factor to be computed, in order to facilitate model selection. Finally, the model is applied to the retweet data sets of two hashtags.
Submitted 17 February, 2019; v1 submitted 6 February, 2018;
originally announced February 2018.
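The structure of an intensity of this kind can be sketched as follows. The sketch assumes a single exponentially decaying time component multiplied by a power of the follower count. This specific form and the parameters kappa, gamma and beta are hypothetical and not the paper's exact specification. Retweet times are simulated by thinning (Lewis-Shedler). Because the intensity integrates to a finite value, the expected ultimate retweet count is finite and grows with the follower count.

```python
import math
import random

def intensity(t, followers, kappa, gamma, beta):
    """Hypothetical retweet intensity at time t after the original tweet:
    a follower-count factor times an exponentially decaying factor."""
    return kappa * followers ** gamma * math.exp(-beta * t)

def expected_count(T, followers, kappa, gamma, beta):
    """Integral of the intensity over [0, T]; as T grows this tends to
    kappa * followers**gamma / beta, the expected ultimate retweet count."""
    return kappa * followers ** gamma * (1.0 - math.exp(-beta * T)) / beta

def simulate_retweets(T, followers, kappa, gamma, beta, rng):
    """Simulate retweet times on [0, T] by thinning."""
    lam_max = intensity(0.0, followers, kappa, gamma, beta)  # intensity decreases in t
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)   # candidate from a homogeneous process
        if t > T:
            return times
        if rng.random() < intensity(t, followers, kappa, gamma, beta) / lam_max:
            times.append(t)             # accept with probability lambda(t) / lam_max

rng = random.Random(3)
counts = [len(simulate_retweets(10.0, 100, 2.0, 0.5, 1.0, rng)) for _ in range(1000)]
print(sum(counts) / len(counts), expected_count(10.0, 100, 2.0, 0.5, 1.0))
```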
-
A Network Epidemic Model for Online Community Commissioning Data
Authors:
Clement Lee,
Andrew Garbett,
Darren J. Wilkinson
Abstract:
A statistical model assuming a preferential attachment network, which is generated by adding nodes sequentially according to a few simple rules, usually describes real-life networks better than one assuming, for example, a Bernoulli random graph, in which any two nodes have the same probability of being connected. Therefore, to study the propagation of "infection" across a social network, we propose a network epidemic model that combines a stochastic epidemic model with a preferential attachment model. A simulation study based on the associated Markov chain Monte Carlo algorithm reveals an identifiability issue with the model parameters. Finally, the network epidemic model is applied to a set of online commissioning data.
Submitted 20 July, 2017; v1 submitted 24 February, 2017;
originally announced February 2017.
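The combination of a preferential attachment network with an epidemic process can be sketched as follows. The sketch makes two simplifying assumptions: each new node attaches to a single existing node chosen with probability proportional to its degree, and infection spreads by a discrete-time SI process. Both mechanisms and all names and values are hypothetical stand-ins, not the paper's model.

```python
import random

def preferential_attachment(n, rng):
    """Grow a tree of n >= 2 nodes: each new node links to one existing node
    chosen with probability proportional to its current degree."""
    adj = [[1], [0]]            # start from a single edge 0 - 1
    targets = [0, 1]            # node k appears deg(k) times in this list
    for k in range(2, n):
        j = rng.choice(targets)
        adj.append([j])
        adj[j].append(k)
        targets.extend([k, j])
    return adj

def si_epidemic(adj, seed, p, steps, rng):
    """Discrete-time SI spread: each step, every infected node infects each
    susceptible neighbour independently with probability p."""
    infected = {seed}
    history = [1]
    for _ in range(steps):
        new = {j for i in infected for j in adj[i]
               if j not in infected and rng.random() < p}
        infected |= new
        history.append(len(infected))
    return infected, history

rng = random.Random(11)
adj = preferential_attachment(200, rng)
infected, history = si_epidemic(adj, seed=0, p=0.3, steps=15, rng=rng)
```

The repeated-entries list makes degree-proportional sampling a single uniform draw. High-degree hubs created this way tend to accelerate spread compared with a Bernoulli random graph of the same size.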
-
Online state and parameter estimation in Dynamic Generalised Linear Models
Authors:
Rui Vieira,
Darren J. Wilkinson
Abstract:
Inference for streaming time-series is tightly coupled with the problem of Bayesian online state and parameter inference. In this paper we introduce Dynamic Generalised Linear Models, a class of models often chosen to model continuous and discrete time-series data. We look at three different approaches which allow online estimation and analyse the results when applied to different real-world datasets related to inference for streaming data. Sufficient-statistics-based methods delay known problems, such as particle impoverishment, especially when applied to long-running time-series, while providing reasonable parameter estimates when compared to exact methods such as Particle Marginal Metropolis-Hastings. State and observation forecasts are also analysed as a performance metric. By benchmarking against a "gold standard" (offline) method, we can better understand the performance of online methods in challenging real-world scenarios.
Submitted 30 August, 2016;
originally announced August 2016.
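Online state estimation in a DGLM can be illustrated with a plain bootstrap particle filter. This is the kind of filter whose repeated resampling leads to the particle impoverishment mentioned above. The sketch assumes a hypothetical Poisson DGLM with a random-walk state and known parameters, so it covers state filtering only. It is not the sufficient-statistics method studied in the paper.

```python
import math
import random

def bootstrap_filter(ys, n_particles, m0, s0, w_sd, rng):
    """Bootstrap particle filter for a hypothetical Poisson DGLM with known
    parameters: theta_t = theta_{t-1} + N(0, w_sd^2), y_t ~ Poisson(exp(theta_t)).
    Returns filtered means of theta_t and an estimate of log p(y_1:T)."""
    particles = [rng.gauss(m0, s0) for _ in range(n_particles)]
    means, loglik = [], 0.0
    for y in ys:
        # propagate each particle through the random-walk state equation
        particles = [th + rng.gauss(0.0, w_sd) for th in particles]
        # Poisson log-likelihood of y under each particle
        logw = [y * th - math.exp(th) - math.lgamma(y + 1) for th in particles]
        mx = max(logw)                        # subtract the max for numerical stability
        w = [math.exp(lw - mx) for lw in logw]
        total = sum(w)
        loglik += mx + math.log(total / n_particles)
        means.append(sum(wi * th for wi, th in zip(w, particles)) / total)
        # multinomial resampling: duplicates high-weight particles, drops others
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, loglik

means, loglik = bootstrap_filter([20] * 15, 2000, m0=0.0, s0=2.0, w_sd=0.1,
                                 rng=random.Random(4))
```

Each observation is processed once and then discarded, which is what makes the method usable on a stream.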
-
Bayesian hierarchical modelling for inferring genetic interactions in yeast
Authors:
Jonathan Heydari,
Conor Lawless,
David A. Lydall,
Darren J. Wilkinson
Abstract:
Quantitative Fitness Analysis (QFA) is a high-throughput experimental and computational methodology for measuring the growth of microbial populations. QFA screens can be used to compare the health of cell populations with and without a mutation in a query gene in order to infer genetic interaction strengths genome-wide, examining thousands of separate genotypes. We introduce Bayesian, hierarchical models of population growth rates and genetic interactions that better reflect QFA experimental design than current approaches. Our new approach models population dynamics and genetic interaction simultaneously, thereby avoiding passing information between models via a univariate fitness summary. Matching experimental structure more closely, Bayesian hierarchical approaches use data more efficiently and find new evidence for genes which interact with yeast telomeres within a published dataset.
Submitted 14 August, 2015;
originally announced August 2015.
-
Likelihood free inference for Markov processes: a comparison
Authors:
Jamie Owen,
Darren J. Wilkinson,
Colin S. Gillespie
Abstract:
Approaches to Bayesian inference for problems with intractable likelihoods have become increasingly important in recent years. Approximate Bayesian computation (ABC) and "likelihood free" Markov chain Monte Carlo techniques are popular methods for tackling inference in these scenarios, but such techniques are computationally expensive. In this paper we compare the two approaches to inference, with a particular focus on parameter inference for stochastic kinetic models, widely used in systems biology. Discrete-time transition kernels for models of this type are intractable for all but the most trivial systems, yet forward simulation is usually straightforward. We discuss the relative merits and drawbacks of each approach whilst considering the computational cost implications and efficiency of these techniques. In order to explore the properties of each approach we examine a range of observation regimes using two example models. We use a Lotka-Volterra predator-prey model to explore the impact of full or partial species observations using various time course observations under the assumption of known and unknown measurement error. Further investigation into the impact of observation error is then made using a Schlögl system, a test case which exhibits bimodal state stability in some regions of parameter space.
Submitted 2 October, 2014;
originally announced October 2014.
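The key property exploited above is that the transition kernel is intractable but forward simulation is easy. A minimal ABC rejection sketch on the simplest stochastic kinetic model, a pure-death process X -> 0, makes this concrete. The sketch assumes a uniform prior, a squared-distance summary and a "keep the closest draws" threshold, all of which are hypothetical choices. The paper's examples are the Lotka-Volterra and Schlögl systems.

```python
import random

def simulate_death(theta, x0, obs_times, rng):
    """Gillespie simulation of the pure-death process X -> 0 (hazard theta * x),
    returning the population at each (increasing) observation time."""
    t, x, out = 0.0, x0, []
    for t_obs in obs_times:
        while x > 0:
            tau = rng.expovariate(theta * x)
            if t + tau > t_obs:
                t = t_obs        # memorylessness lets us restart the clock here
                break
            t += tau
            x -= 1
        out.append(x)
    return out

def abc_rejection(data, x0, obs_times, n_draws, keep, rng):
    """ABC rejection: draw theta from the prior, simulate, and keep the `keep`
    draws whose simulated data are closest (squared distance) to the observed data."""
    scored = []
    for _ in range(n_draws):
        theta = rng.uniform(0.01, 2.0)     # hypothetical uniform prior
        sim = simulate_death(theta, x0, obs_times, rng)
        dist = sum((s - d) ** 2 for s, d in zip(sim, data))
        scored.append((dist, theta))
    scored.sort()
    return [theta for _, theta in scored[:keep]]

rng = random.Random(42)
obs_times = [1.0, 2.0, 3.0]
data = simulate_death(0.5, 50, obs_times, rng)   # synthetic data, true theta = 0.5
posterior = abc_rejection(data, 50, obs_times, n_draws=5000, keep=100, rng=rng)
```

Every prior draw costs a full forward simulation, and most are discarded. This is the computational expense the comparison above weighs against likelihood-free MCMC.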
-
Bayesian inference for Markov jump processes with informative observations
Authors:
Andrew Golightly,
Darren J. Wilkinson
Abstract:
In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.
Submitted 15 September, 2014;
originally announced September 2014.
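The conditioned constructs above improve on the plain forward simulator. For reference, here is a minimal sketch of that forward simulator (Gillespie's direct method) for a Lotka-Volterra MJP with three reactions: prey birth (hazard c1 x), predation (c2 x y) and predator death (c3 y). The rate constants, initial state and function name are hypothetical. When observations are highly informative, few trajectories from this simulator pass close to the data, which is the inefficiency the paper addresses.

```python
import random

def gillespie_lv(x, y, c, t_max, rng):
    """Forward-simulate a Lotka-Volterra Markov jump process on [0, t_max].
    Returns the jump path as a list of (time, prey, predators)."""
    c1, c2, c3 = c
    t = 0.0
    path = [(t, x, y)]
    while True:
        h = (c1 * x, c2 * x * y, c3 * y)   # reaction hazards
        h0 = sum(h)
        if h0 == 0.0:
            return path                    # both species extinct: no more events
        t += rng.expovariate(h0)           # time to the next reaction
        if t > t_max:
            return path
        u = rng.random() * h0              # choose a reaction in proportion to its hazard
        if u < h[0]:
            x += 1                         # prey birth
        elif u < h[0] + h[1]:
            x -= 1                         # predation: prey -> predator
            y += 1
        else:
            y -= 1                         # predator death
        path.append((t, x, y))

path = gillespie_lv(50, 100, (1.0, 0.005, 0.6), 5.0, random.Random(2))
```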
-
Bayesian identification of protein differential expression in multi-group isobaric labelled mass spectrometry data
Authors:
Howsun Jow,
Richard J. Boys,
Darren J. Wilkinson
Abstract:
In this paper we develop a Bayesian statistical inference approach to the unified analysis of isobaric labelled MS/MS proteomic data across multiple experiments. An explicit probabilistic model of the log-intensity of the isobaric labels' reporter ions across multiple pre-defined groups and experiments is developed. This is then used to develop a full Bayesian statistical methodology for the identification of differentially expressed proteins, with respect to a control group, across multiple groups and experiments. This methodology is implemented and then evaluated on simulated data and on two model experimental datasets (for which the differentially expressed proteins are known) that use a TMT labelling protocol.
Submitted 24 July, 2014;
originally announced July 2014.
-
Scalable Inference for Markov Processes with Intractable Likelihoods
Authors:
Jamie Owen,
Darren J. Wilkinson,
Colin S. Gillespie
Abstract:
Bayesian inference for Markov processes has become increasingly relevant in recent years. Problems of this type often have intractable likelihoods, and prior knowledge about model rate parameters is often poor. Markov chain Monte Carlo (MCMC) techniques can lead to exact inference in such models but in practice can suffer performance issues including long burn-in periods and poor mixing. On the other hand, approximate Bayesian computation techniques can allow rapid exploration of a large parameter space but yield only approximate posterior distributions. Here we consider the combined use of approximate Bayesian computation (ABC) and MCMC techniques for improved computational efficiency while retaining exact inference on parallel hardware.
Submitted 22 October, 2014; v1 submitted 26 March, 2014;
originally announced March 2014.
-
Fast Bayesian parameter estimation for stochastic logistic growth models
Authors:
Jonathan Heydari,
Conor Lawless,
David A. Lydall,
Darren J. Wilkinson
Abstract:
The transition density of a stochastic, logistic population growth model with multiplicative intrinsic noise is analytically intractable. Inferring model parameter values by fitting such stochastic differential equation (SDE) models to data therefore requires relatively slow numerical simulation. Where such simulation is prohibitively slow, an alternative is to use model approximations which do have an analytically tractable transition density, enabling fast inference. We introduce two such approximations, with either multiplicative or additive intrinsic noise, each derived from the linear noise approximation of the logistic growth SDE. After Bayesian inference we find that our fast LNA models, using Kalman filter recursion for computation of marginal likelihoods, give similar posterior distributions to slow arbitrarily exact models. We also demonstrate that simulations from our LNA models better describe the characteristics of the stochastic logistic growth models than a related approach. Finally, we demonstrate that our LNA model with additive intrinsic noise and measurement error best describes an example set of longitudinal observations of microbial population size taken from a typical, genome-wide screening experiment.
Submitted 26 October, 2013; v1 submitted 21 October, 2013;
originally announced October 2013.
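Once a linear noise approximation yields a linear Gaussian state-space model, the Kalman filter gives the marginal likelihood exactly. It does so through the prediction error decomposition, with no simulation required. A minimal sketch for a generic scalar model follows, assuming x_t = a x_{t-1} + N(0, q) and y_t = x_t + N(0, r). The coefficients a, q, r and the function name are hypothetical placeholders, not the paper's LNA equations for logistic growth.

```python
import math

def kalman_loglik(ys, a, q, r, m0, p0):
    """Marginal log-likelihood of ys under x_t = a x_{t-1} + N(0, q),
    y_t = x_t + N(0, r), with prior x_0 ~ N(m0, p0).
    Returns (log-likelihood, final filtered mean, final filtered variance)."""
    m, p, ll = m0, p0, 0.0
    for y in ys:
        # predict the next state
        m = a * m
        p = a * a * p + q
        # one-step predictive density of y is N(m, p + r)
        s = p + r
        ll += -0.5 * (math.log(2.0 * math.pi * s) + (y - m) ** 2 / s)
        # update with the Kalman gain
        k = p / s
        m = m + k * (y - m)
        p = (1.0 - k) * p
    return ll, m, p

ll, m_T, p_T = kalman_loglik([1.3, 0.4, 0.9], a=0.9, q=0.2, r=0.1, m0=1.0, p0=0.5)
```

Each evaluation costs O(T), so it can sit inside an MCMC sampler over the model parameters. This is the source of the speed-up over simulation-based inference for the exact SDE.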