-
Testing Hypotheses of Covariate Effects on Topics of Discourse
Authors:
Gabriel Phelan,
David A. Campbell
Abstract:
We introduce an approach to topic modelling with document-level covariates that remains tractable in the face of large text corpora. This is achieved by de-emphasizing the role of parameter estimation in an underlying probabilistic model, assuming instead that the data come from a fixed but unknown distribution whose statistical functionals are of interest. We propose combining a convex formulation of non-negative matrix factorization with standard regression techniques as a fast-to-compute and useful estimate of such a functional. Uncertainty quantification can then be achieved by layering non-parametric resampling methods on top of this scheme. This is in contrast to popular topic modelling paradigms, which posit a complex and often hard-to-fit generative model of the data. We argue that the simple, non-parametric approach advocated here is faster, more interpretable, and enjoys better inferential justification than said generative models. Finally, our methods are demonstrated with an application analysing covariate effects on discourse about flavours attributed to Canadian beers.
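The regression-plus-resampling step described in this abstract can be illustrated with a minimal sketch. It assumes each document's weight on one topic has already been estimated (the factorization step is not shown), regresses that weight on a single covariate by ordinary least squares, and forms a percentile bootstrap interval by resampling documents with replacement. The data, function names, and bootstrap settings are hypothetical; a fuller version would presumably recompute the factorization inside each resample rather than holding the topic weights fixed.

```python
import random
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x in a single-covariate model with intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope, resampling documents with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        if len(set(xb)) < 2:  # slope is undefined if the covariate is constant
            continue
        slopes.append(ols_slope(xb, [y[i] for i in idx]))
    slopes.sort()
    lo = slopes[int(alpha / 2 * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical data: one covariate value and one estimated topic weight per document.
rng = random.Random(1)
x = [rng.uniform(4, 8) for _ in range(200)]
y = [0.2 + 0.05 * xi + rng.gauss(0, 0.02) for xi in x]

est = ols_slope(x, y)
lo, hi = bootstrap_slope_ci(x, y)
```

Resampling whole documents, rather than residuals, keeps the sketch non-parametric in the spirit of the abstract: no error model is assumed for the topic weights.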
Submitted 5 June, 2025;
originally announced June 2025.
-
Parallel Tempering via Simulated Tempering Without Normalizing Constants
Authors:
Biljana Jonoska Stojkova,
David A. Campbell
Abstract:
In this paper we develop a new general Bayesian methodology that simultaneously estimates the parameters of interest and the marginal likelihood of the model. The proposed methodology builds on Simulated Tempering, a powerful algorithm for sampling from multimodal distributions. However, Simulated Tempering has the practical limitation of needing to specify a prior for the temperature along a chosen discretization schedule that allows the normalizing constant at each temperature to be calculated. Our proposed model defines the prior for the temperature so as to remove the need to calculate normalizing constants at each temperature, thereby enabling a continuous temperature schedule while preserving the sampling efficiency of the Simulated Tempering algorithm. The resulting algorithm estimates parameters and, through thermodynamic integration, marginal likelihoods simultaneously. We illustrate the new algorithm on examples involving Gaussian mixture models and ordinary differential equation models.
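The thermodynamic integration identity mentioned above, log p(y) = ∫₀¹ E_β[log p(y | θ)] dβ, where E_β is taken under the power posterior proportional to p(y | θ)^β p(θ), can be checked on a conjugate Gaussian model whose power posteriors are available in closed form. This sketch is not the authors' tempering sampler: in practice the expectations would be estimated from MCMC draws at each temperature. The model, the data, and the temperature grid β_k = (k/K)^5 are illustrative assumptions.

```python
import math

def power_posterior(y, sigma, mu0, tau0, beta):
    """Mean and variance of the Gaussian power posterior for y_i ~ N(theta, sigma^2)."""
    n = len(y)
    prec = 1 / tau0**2 + beta * n / sigma**2
    mean = (mu0 / tau0**2 + beta * sum(y) / sigma**2) / prec
    return mean, 1 / prec

def expected_loglik(y, sigma, mu0, tau0, beta):
    """E_beta[log p(y | theta)], using E[(y_i - theta)^2] = (y_i - m)^2 + v."""
    m, v = power_posterior(y, sigma, mu0, tau0, beta)
    n = len(y)
    sq = sum((yi - m) ** 2 for yi in y) + n * v
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - sq / (2 * sigma**2)

def log_evidence_ti(y, sigma, mu0, tau0, K=200, power=5):
    """Trapezoid rule over a grid that is dense near beta = 0, where the integrand changes fastest."""
    betas = [(k / K) ** power for k in range(K + 1)]
    vals = [expected_loglik(y, sigma, mu0, tau0, b) for b in betas]
    return sum(0.5 * (vals[k] + vals[k + 1]) * (betas[k + 1] - betas[k]) for k in range(K))

def log_evidence_exact(y, sigma, mu0, tau0):
    """Closed form: y ~ N(mu0 * 1, sigma^2 I + tau0^2 * 1 1^T)."""
    n = len(y)
    r = [yi - mu0 for yi in y]
    s1, s2 = sum(r), sum(ri * ri for ri in r)
    a = sigma**2 + n * tau0**2
    quad = (s2 - tau0**2 * s1**2 / a) / sigma**2  # Sherman-Morrison inverse
    logdet = n * math.log(sigma**2) + math.log(a / sigma**2)
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)

# Hypothetical data; model y_i ~ N(theta, 1) with prior theta ~ N(0, 3^2).
y_obs = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 0.6]
ti = log_evidence_ti(y_obs, sigma=1.0, mu0=0.0, tau0=3.0)
exact = log_evidence_exact(y_obs, sigma=1.0, mu0=0.0, tau0=3.0)
```

The two values should agree closely; the identity holds because the prior is normalized (β = 0) and the power posterior at β = 1 is the ordinary posterior.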
Submitted 30 May, 2019;
originally announced May 2019.
-
Incremental Mixture Importance Sampling with Shotgun optimization
Authors:
Biljana Jonoska Stojkova,
David A. Campbell
Abstract:
This paper proposes a general optimization strategy that combines results from different optimization or parameter estimation methods to overcome the shortcomings of any single method. Shotgun optimization is developed as a framework that employs different optimization strategies, criteria, or conditional targets to enable wider exploration of the likelihood. The Shotgun optimization approach is embedded into an incremental mixture importance sampling algorithm to produce improved posterior samples for multimodal densities and to provide robustness in cases where the likelihood and prior disagree. Although different optimization approaches are used, the resulting samples are combined into samples from a single target posterior. The versatility of the framework is demonstrated on parameter estimation for differential equation models, employing diverse strategies including numerical solutions and approximations thereof. Additionally, the approach is demonstrated on mixtures of discrete and continuous parameters and is shown to ease estimation for synthetic likelihood models. R code of the implemented examples is stored in a zipped archive (codeSubmit.zip).
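A simplified illustration of mixture importance sampling seeded by several optimization runs follows. It assumes a hypothetical one-dimensional bimodal target, uses a single hill climber from several start points as a stand-in for the different optimizers of the Shotgun framework, and omits the incremental step of IMIS that adds new mixture components at high-weight points. The proposal scale, start points, and function names are arbitrary choices for illustration.

```python
import math
import random

def log_target(x):
    """Hypothetical unnormalized log density with modes near -3 and +3."""
    a, b = -0.5 * (x + 3) ** 2, -0.5 * (x - 3) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def hill_climb(logp, x0, step=1.0, tol=1e-6):
    """Derivative-free local maximizer: try +/- step, halve the step when neither improves."""
    x, fx = x0, logp(x0)
    while step > tol:
        for cand in (x - step, x + step):
            fc = logp(cand)
            if fc > fx:
                x, fx = cand, fc
                break
        else:
            step /= 2
    return x

def log_normal_pdf(x, mu, sd):
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def mixture_importance_sample(logp, centres, sd=1.5, n=20000, seed=0):
    """Draw from an equal-weight Gaussian mixture and return self-normalized weights."""
    rng = random.Random(seed)
    xs, logw = [], []
    for _ in range(n):
        x = rng.gauss(rng.choice(centres), sd)
        comps = [log_normal_pdf(x, mu, sd) for mu in centres]
        m = max(comps)
        logq = m + math.log(sum(math.exp(c - m) for c in comps) / len(centres))
        xs.append(x)
        logw.append(logp(x) - logq)
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    return xs, [wi / total for wi in w]

# Several "optimizers" (here: one hill climber from different starts) locate the modes.
starts = [-6.0, -1.0, 0.5, 6.0]
modes = sorted({round(hill_climb(log_target, s), 3) for s in starts})
xs, weights = mixture_importance_sample(log_target, modes)
mean = sum(w * x for w, x in zip(weights, xs))
second_moment = sum(w * x * x for w, x in zip(weights, xs))
```

Because each mode found by an optimizer contributes a proposal component, the weighted sample covers both modes; for this target the first two moments are 0 and 10.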
Submitted 13 November, 2017;
originally announced November 2017.
-
Sequentially Constrained Monte Carlo
Authors:
Shirin Golchi,
David A. Campbell
Abstract:
Constraints can be interpreted in a broad sense as any kind of explicit restriction over the parameters. While some constraints are defined directly on the parameter space, when they are instead defined by known behaviour of the model, transforming them into features on the parameter space may not be possible. Difficulties in sampling from the posterior distribution as a result of incorporating constraints into the model are a common challenge, leading to truncations in the parameter space and inefficient sampling algorithms. We propose a variant of the sequential Monte Carlo algorithm for posterior sampling in the presence of constraints by defining a sequence of densities through the imposition of the constraint. Particles generated from an unconstrained or mildly constrained distribution are filtered and moved through sampling and resampling steps to obtain a sample from the fully constrained target distribution. General and model-specific constraint-enforcing strategies are defined. The Sequentially Constrained Monte Carlo algorithm is demonstrated on constraints defined by monotonicity of a function, densities constrained to low-dimensional manifolds, adherence to a theoretically derived model, and model feature matching.
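A minimal sketch of the constraint-tightening idea, assuming a toy problem: a N(0, 1) prior with the constraint θ > 0, softened by the probit function Φ(τθ) so that τ = 0 leaves the prior unconstrained and large τ approaches hard truncation. The fixed τ schedule, particle count, and random-walk step size are illustrative assumptions rather than the paper's settings. Each stage reweights particles by the incremental change in the constraint, resamples, and applies Metropolis moves.

```python
import math
import random

def log_phi(z):
    """Log of the standard normal CDF, returning -inf when it underflows."""
    v = 0.5 * math.erfc(-z / math.sqrt(2))
    return math.log(v) if v > 0 else -math.inf

def log_target(theta, tau):
    # N(0, 1) prior times the probit-softened constraint Phi(tau * theta);
    # as tau grows this approaches the prior truncated to theta > 0.
    return -0.5 * theta * theta + log_phi(tau * theta)

def scmc(n=2000, taus=(0, 0.5, 1, 2, 4, 8, 16, 50, 200, 1000),
         n_moves=5, step=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0, 1) for _ in range(n)]  # unconstrained prior draws
    for prev, tau in zip(taus, taus[1:]):
        # 1. Reweight by the incremental tightening of the constraint.
        logw = [log_phi(tau * th) - log_phi(prev * th) for th in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        # 2. Multinomial resampling (zero-weight particles are never chosen).
        particles = rng.choices(particles, weights=w, k=n)
        # 3. Random-walk Metropolis moves targeting the current stage density.
        for i, th in enumerate(particles):
            lt = log_target(th, tau)
            for _ in range(n_moves):
                prop = th + rng.gauss(0, step)
                lp = log_target(prop, tau)
                if rng.random() < math.exp(min(0.0, lp - lt)):
                    th, lt = prop, lp
            particles[i] = th
    return particles

ps = scmc()
```

With a hard truncation at zero, the target mean is √(2/π) ≈ 0.798, which the particle mean should approximate.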
Submitted 25 February, 2015; v1 submitted 29 October, 2014;
originally announced October 2014.
-
Transdimensional Approximate Bayesian Computation for Inference on Invasive Species Models with Latent Variables of Unknown Dimension
Authors:
Oksana A. Chkrebtii,
Erin K. Cameron,
David A. Campbell,
Erin M. Bayne
Abstract:
Accurate information on patterns of introduction and spread of non-native species is essential for making predictions and management decisions. In many cases, estimating unknown rates of introduction and spread from observed data requires evaluating intractable variable-dimensional integrals. In general, inference on the large class of models containing latent variables of large or variable dimension precludes exact sampling techniques. Approximate Bayesian computation (ABC) methods provide an alternative to exact sampling but rely on inefficient conditional simulation of the latent variables. To perform this inference efficiently, a new transdimensional Monte Carlo sampler is developed for approximate Bayesian model inference and used to estimate rates of introduction and spread for the non-native earthworm species Dendrobaena octaedra (Savigny) along roads in the boreal forest of northern Alberta. Low and high estimates of introduction and spread rates were then used to simulate the extent of earthworm invasions in northeastern Alberta and to project the proportion of suitable habitat invaded in the year following data collection.
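For readers unfamiliar with ABC, the basic rejection version is sketched below under hypothetical assumptions: i.i.d. Poisson counts with an unknown rate, a Uniform(0, 10) prior, and the total count as the summary statistic. It contains no latent variables and is not the transdimensional sampler developed in the paper; it only shows the simulate-and-compare step that such samplers aim to make more efficient.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def abc_rejection(y_obs, n_draws=20000, tol=1, prior_max=10.0, seed=0):
    rng = random.Random(seed)
    s_obs = sum(y_obs)  # summary statistic: total count (sufficient for a Poisson rate)
    accepted = []
    for _ in range(n_draws):
        lam = rng.uniform(0.0, prior_max)  # draw from the Uniform(0, prior_max) prior
        s_sim = sum(poisson_draw(rng, lam) for _ in y_obs)  # simulate a data set
        if abs(s_sim - s_obs) <= tol:  # keep lam if the simulated summary is close
            accepted.append(lam)
    return accepted

y_obs = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4]  # hypothetical counts
post = abc_rejection(y_obs)
```

The accepted rates approximate the posterior; shrinking `tol` improves the approximation at the cost of fewer acceptances, which is the inefficiency the abstract refers to.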
Submitted 30 December, 2014; v1 submitted 10 October, 2013;
originally announced October 2013.
-
Monotone Function Estimation for Computer Experiments
Authors:
Shirin Golchi,
Derek R. Bingham,
Hugh Chipman,
David A. Campbell
Abstract:
In statistical modeling of computer experiments, prior information about the underlying function is sometimes available. For example, the physical system simulated by the computer code may be known to be monotone with respect to some or all inputs. We develop a Bayesian approach to Gaussian process modelling that can incorporate monotonicity information for computer model emulation. Markov chain Monte Carlo methods are used to sample from the posterior distribution of the process given the simulator output and monotonicity information. The performance of the proposed approach in terms of predictive accuracy and uncertainty quantification is demonstrated in a number of simulated examples as well as a real queueing system application.
Submitted 14 June, 2014; v1 submitted 15 September, 2013;
originally announced September 2013.
-
Bayesian Solution Uncertainty Quantification for Differential Equations
Authors:
Oksana A. Chkrebtii,
David A. Campbell,
Ben Calderhead,
Mark A. Girolami
Abstract:
We explore probability modelling of discretization uncertainty for system states defined implicitly by ordinary or partial differential equations. Accounting for this uncertainty can avoid posterior under-coverage when likelihoods are constructed from a coarsely discretized approximation to system equations. A formalism is proposed for inferring a fixed but a priori unknown model trajectory through Bayesian updating of a prior process conditional on model information. A one-step-ahead sampling scheme for interrogating the model is described, its consistency and first-order convergence properties are proved, and its computational complexity is shown to be proportional to that of explicit one-step numerical solvers. Examples illustrate the flexibility of this framework to deal with a wide variety of complex and large-scale systems. Within the calibration problem, discretization uncertainty defines a layer in the Bayesian hierarchy, and a Markov chain Monte Carlo algorithm that targets this posterior distribution is presented. This formalism is used for inference on the JAK-STAT delay differential equation model of protein dynamics from indirectly observed measurements. The discussion outlines implications for the new field of probabilistic numerics.
Submitted 23 October, 2016; v1 submitted 10 June, 2013;
originally announced June 2013.