Search | arXiv e-print repository

The Effects of Air Pollution on Health: A Longitudinal Study of Los Angeles County Accounting for Measurement Error

Abstract: This study develops a Bayesian hierarchical model to explore the effects of air pollution on respiratory and cardiovascular mortality in Los Angeles County. The model takes into account various pollutants such as PM2.5, PM10, CO, SO2, NO2 and O3, as well as a related meteorological factor: temperature. The objective is to identify the significant factors affecting selected health outcomes without including all variables in each model specification. This flexibility enables the model to capture key drivers of health risk without redundancy. To account for potential measurement error in pollution data due to imperfect monitoring or averaging, certain observed pollutant levels are treated as noisy proxies for true exposure. By specifying priors for regression coefficients and measurement error parameters and estimating posterior distributions via Markov Chain Monte Carlo (MCMC) sampling, the approach yields more precise and reliable estimates of the health risks associated with air pollution exposure in Los Angeles County, incorporating both the count nature of the health data and the uncertainties in pollution measurements.

Submitted 27 January, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

arXiv:2402.08877 [pdf, other]

Computational Considerations for the Linear Model of Coregionalization

Authors: Renaud Alie, David A. Stephens, Alexandra M. Schmidt

Abstract: In the last two decades, the linear model of coregionalization (LMC) has been widely used to model multivariate spatial processes. However, it can be a challenging task to conduct likelihood-based inference for such models because of the cubic cost associated with Gaussian likelihood evaluations. Starting from an analogy with matrix normal models, we propose a reformulation of the LMC likelihood that highlights the linear, rather than cubic, computational complexity as a function of the dimension of the response vector. We describe how those simplifications can be exploited in Gaussian hierarchical models. In addition, we propose a new sparsity-inducing approach to the LMC that introduces structural zeros in the coregionalization matrix in an attempt to reduce the number of parameters in a principled and data-driven way. Our reformulation of the LMC likelihood ensures that our sparse approach comes at virtually no additional cost when included in a Markov chain Monte Carlo (MCMC) algorithm. It is shown, on synthetic data, to significantly improve predictive performance. We also apply our methodology to a dataset comprised of air pollutant measurements from the state of California. We investigate the strength of the correlation among the measurements by providing new insights from our sparse method.

Submitted 2 December, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2306.11908 [pdf, ps, other]

Generalized Random Forests using Fixed-Point Trees

Authors: David Fleischer, David A. Stephens, Archer Y. Yang

Abstract: We propose a computationally efficient alternative to generalized random forests (GRFs) for estimating heterogeneous effects in large dimensions. While GRFs rely on a gradient-based splitting criterion, which in large dimensions is computationally expensive and unstable, our method introduces a fixed-point approximation that eliminates the need for Jacobian estimation. This gradient-free approach preserves GRF's theoretical guarantees of consistency and asymptotic normality while significantly improving computational efficiency. We demonstrate that our method achieves a speedup of multiple times over standard GRFs without compromising statistical accuracy. Experiments on both simulated and real-world data validate our approach. Our findings suggest that the proposed method is a scalable alternative for localized effect estimation in machine learning and causal inference applications.

Submitted 16 June, 2025; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: 44 pages, 17 figures

arXiv:2304.12548 [pdf, other]

The impact of directly observed therapy on the efficacy of Tuberculosis treatment: A Bayesian multilevel approach

Authors: Widemberg S. Nobre, Alexandra M. Schmidt, Erica E. M. Moodie, David A. Stephens

Abstract: We propose and discuss a Bayesian procedure to estimate the average treatment effect (ATE) for multilevel observations in the presence of confounding. We focus on situations where the confounders may be latent (e.g., spatial latent effects). This work is motivated by an interest in determining the causal impact of directly observed therapy (DOT) on the successful treatment of Tuberculosis (TB); the available data correspond to individual-level information observed across different cities in a state in Brazil. We focus on propensity score regression and covariate adjustment to balance the treatment (DOT) allocation. We discuss the need to include latent local-level random effects in the propensity score model to reduce bias in the estimation of the ATE. A simulation study suggests that accounting for the multilevel nature of the data with latent structures in both the outcome and propensity score models has the potential to reduce bias in the estimation of causal effects.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.15281 [pdf, other]

Bayesian inference for optimal dynamic treatment regimes in practice

Authors: Daniel Rodriguez Duque, Erica E. M. Moodie, David A. Stephens

Abstract: In this work, we examine recently developed methods for Bayesian inference of optimal dynamic treatment regimes (DTRs). DTRs are a set of treatment decision rules aimed at tailoring patient care to patient-specific characteristics, thereby falling within the realm of precision medicine. In this field, researchers seek to tailor therapy with the intention of improving health outcomes; therefore, they are most interested in identifying optimal DTRs. Recent work has developed Bayesian methods for identifying optimal DTRs in a family indexed by $ψ$ via Bayesian dynamic marginal structural models (MSMs) (Rodriguez Duque et al., 2022a); we review the proposed estimation procedure and illustrate its use via the new BayesDTR R package. Although methods in (Rodriguez Duque et al., 2022a) can estimate optimal DTRs well, they may lead to biased estimators when the model for the expected outcome if everyone in a population were to follow a given treatment strategy, known as a value function, is misspecified or when a grid search for the optimum is employed. We describe recent work that uses a Gaussian process ($GP$) prior on the value function as a means to robustly identify optimal DTRs (Rodriguez Duque et al., 2022b). We demonstrate how a $GP$ approach may be implemented with the BayesDTR package and contrast it with other value-search approaches to identifying optimal DTRs.
We use data from an HIV therapeutic trial in order to illustrate a standard analysis with these methods, using both the original observed trial data and an additional simulated component to showcase a longitudinal (two-stage DTR) analysis.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.08735 [pdf, other]

A Bayesian Non-Stationary Heteroskedastic Time Series Model for Multivariate Critical Care Data

Authors: Zayd Omar, David A. Stephens, Alexandra M. Schmidt, David L. Buckeridge

Abstract: We propose a multivariate GARCH model for non-stationary health time series by modifying the variance of the observations of the standard state space model. The proposed model provides an intuitive way of dealing with heteroskedastic data using the conditional nature of state space models. We follow the Bayesian paradigm to perform the inference procedure. In particular, we use Markov chain Monte Carlo methods to obtain samples from the resultant posterior distribution. Due to the natural temporal correlation structure induced on model parameters, we use the forward filtering backward sampling algorithm to efficiently obtain samples from the posterior distribution. The proposed model also handles missing data in a fully Bayesian fashion. We validate our model on synthetic data, and then use it to analyze a data set obtained from an intensive care unit in a Montreal hospital. We further show that our proposed models offer better performance, in terms of WAIC, than standard state space models. The proposed model provides a new way to model multivariate heteroskedastic non-stationary time series data and the simplicity in applying the WAIC allows us to compare competing models.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2301.03710 [pdf, other]

A time-dependent Poisson-Gamma model for recruitment forecasting in multicenter studies

Authors: Armando Turchetta, Nicolas Savy, David A. Stephens, Erica E. M. Moodie, Marina B. Klein

Abstract: Forecasting recruitments is a key component of the monitoring phase of multicenter studies. One of the most popular techniques in this field is the Poisson-Gamma recruitment model, a Bayesian technique built on a doubly stochastic Poisson process. This approach is based on the modeling of enrollments as a Poisson process where the recruitment rates are assumed to be constant over time and to follow a common Gamma prior distribution. However, the constant-rate assumption is a restrictive limitation that is rarely appropriate for applications in real studies. In this paper, we illustrate a flexible generalization of this methodology which allows the enrollment rates to vary over time by modeling them through B-splines. We show the suitability of this approach for a wide range of recruitment behaviors in a simulation study and by estimating the recruitment progression of the Canadian Co-infection Cohort (CCC).

Submitted 9 January, 2023; originally announced January 2023.
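
As a rough illustration of the constant-rate Poisson-Gamma model that the abstract above takes as its starting point (not the time-dependent B-spline extension the paper proposes), the sketch below applies the standard Gamma-Poisson conjugate update per center. The function names, prior values, and enrollment counts are hypothetical and chosen only for illustration.

```python
def posterior_rate_params(alpha, beta, count, exposure):
    """Conjugate update for one center.

    Prior: rate ~ Gamma(shape=alpha, rate=beta).
    Data: `count` enrollments observed over `exposure` time units,
    modelled as Poisson(rate * exposure).
    Returns the posterior (shape, rate) pair.
    """
    return alpha + count, beta + exposure


def expected_future_enrollment(alpha, beta, counts, exposure, horizon):
    """Posterior-mean number of new enrollments over `horizon`, summed over centers."""
    total = 0.0
    for count in counts:
        shape, rate = posterior_rate_params(alpha, beta, count, exposure)
        total += (shape / rate) * horizon  # posterior mean rate times forecast time
    return total


# Hypothetical example: two centers observed for 3 months, 2-month forecast.
print(expected_future_enrollment(2.0, 1.0, [5, 1], 3.0, 2.0))  # prints 5.0
```

Because every center shares the same Gamma prior, a center with few observed enrollments is pulled toward the common prior mean, which is the usual motivation for the hierarchical form.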

arXiv:2204.09862 [pdf, ps, other]

Targeting functional parameters with semiparametric Bayesian inference

Authors: Vivian Y. Meng, David A. Stephens

Abstract: Typical Bayesian inference requires parameter identification via likelihood parameterization, which has invited criticism for being less flexible than the frequentist framework and subject to misspecification. Though misspecification may be avoided by functional parameter inference under a nonparametric model space, there does not exist a flexible Bayesian semiparametric model that would allow full control over the marginal prior over any general functional parameter. We present the technique of $θ$-augmentation which helps us manipulate nonparametric models into semiparametric ones that directly target any functional parameter. The method allows Bayesian probabilistic statements to be drawn for any estimator that is defined as a functional of the empirical distribution without requiring a likelihood function, thus providing a path to Bayesian analysis in problems like causal inference and censoring where there do not exist well-accepted likelihood functions.

Submitted 25 November, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.02231 [pdf, ps, other]

Causal inference: critical developments, past and future

Authors: Erica EM Moodie, David A Stephens

Abstract: Causality is a subject of philosophical debate and a central scientific issue with a long history. In the statistical domain, the study of cause and effect based on the notion of `fairness' in comparisons dates back several hundred years, and yet statistical concepts and developments that form the area of causal inference are only decades old. In this paper, we review core tenets and methods of causal inference and key developments in the history of the field. We highlight connections with traditional `associational' statistical methods, including estimating equations and semiparametric theory, and point to current topics of active research in this crucial area of our field.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2203.06743 [pdf, other]

Bayesian Analysis of Sigmoidal Gaussian Cox Processes via Data Augmentation

Authors: Renaud Alie, David A. Stephens, Alexandra M. Schmidt

Abstract: Many models for point process data are defined through a thinning procedure where locations of a base process (often Poisson) are either kept (observed) or discarded (thinned). In this paper, we go back to the fundamentals of the distribution theory for point processes to establish a link between the base thinning mechanism and the joint density of thinned and observed locations in any such model. In practice, the marginal model of observed points is often intractable, but thinned locations can be instantiated from their conditional distribution and typical data augmentation schemes can be employed to circumvent this problem. Such approaches have been employed in the recent literature, but some inconsistencies have been introduced across the different publications. We concentrate on an example: the so-called sigmoidal Gaussian Cox process. We apply our approach to resolve contradicting viewpoints in the data augmentation step of the inference procedures therein. We also provide a multitype extension to this process and conduct Bayesian inference on data consisting of positions of two different species of trees in Lansing Woods, Michigan. The emphasis is put on intertype dependence modeling with Bayesian uncertainty quantification.

Submitted 10 December, 2024; v1 submitted 13 March, 2022; originally announced March 2022.

arXiv:2201.12831 [pdf, ps, other]

Causal inference under mis-specification: adjustment based on the propensity score

Authors: David A. Stephens, Widemberg S. Nobre, Erica E. M. Moodie, Alexandra M. Schmidt

Abstract: We study Bayesian approaches to causal inference via propensity score regression. Much of the Bayesian literature on propensity score methods has relied on approaches that cannot be viewed as fully Bayesian in the context of conventional `likelihood times prior' posterior inference; in addition, most methods rely on parametric and distributional assumptions, and presumed correct specification. We emphasize that causal inference is typically carried out in settings of mis-specification, and develop strategies for fully Bayesian inference that reflect this. We focus on methods based on decision-theoretic arguments, and show how inference based on loss-minimization can give valid and fully Bayesian inference. We propose a computational approach to inference based on the Bayesian bootstrap which has good Bayesian and frequentist properties.

Submitted 30 January, 2022; originally announced January 2022.
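
To make the idea of loss-minimization inference via the Bayesian bootstrap (mentioned in the abstract above) concrete, here is a minimal generic sketch. It is not the authors' propensity-score procedure: it uses squared-error loss, whose weighted minimizer is simply the weighted mean, and the function name and data are hypothetical.

```python
import random


def bayesian_bootstrap_means(data, n_draws=1000, seed=0):
    """Draw from a Bayesian-bootstrap posterior for the mean of `data`.

    Each draw assigns Dirichlet(1, ..., 1) weights to the observations
    (obtained by normalizing independent Exponential(1) variables) and
    returns the minimizer of the weighted squared-error loss, which is
    the weighted mean.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gammas = [rng.expovariate(1.0) for _ in data]
        total = sum(gammas)
        draws.append(sum(g / total * x for g, x in zip(gammas, data)))
    return draws


# Hypothetical data; summarize the draws with a crude 95% interval.
draws = sorted(bayesian_bootstrap_means([1.0, 2.0, 4.0, 7.0], n_draws=2000))
print(draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
```

Replacing the weighted mean with the minimizer of another weighted loss gives the same scheme for other target parameters, at the cost of one optimization per draw.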

arXiv:2108.01041 [pdf, ps, other]

Bayesian Sample Size Calculations for SMART Studies

Authors: Armando Turchetta, Erica E. M. Moodie, David A. Stephens, Sylvie D. Lambert

Abstract: In the management of most chronic conditions characterized by the lack of universally effective treatments, adaptive treatment strategies (ATSs) have been growing in popularity as they offer a more individualized approach, and sequential multiple assignment randomized trials (SMARTs) have gained attention as the most suitable clinical trial design to formalize the study of these strategies. While the number of SMARTs has increased in recent years, their design has remained limited to the frequentist setting, which may not fully or appropriately account for uncertainty in design parameters and hence not yield appropriate sample size recommendations. Specifically, standard frequentist formulae rely on several assumptions that can be easily misspecified. The Bayesian framework offers a straightforward path to alleviate some of these concerns. In this paper, we provide calculations in a Bayesian setting to allow more realistic and robust estimates that account for uncertainty in inputs through the `two priors' approach. Additionally, compared to the standard formulae, this methodology allows us to rely on fewer assumptions, integrate pre-trial knowledge, and switch the focus from the standardized effect size to the minimal detectable difference. The proposed methodology is evaluated in a thorough simulation study and is implemented to estimate the sample size for a full-scale SMART of an Internet-Based Adaptive Stress Management intervention based on a pilot SMART conducted on cardiovascular disease patients from two Canadian provinces.

Submitted 2 August, 2021; originally announced August 2021.

Comments: Main article 16 pages, 3 figures, 2 tables. Appendix 11 pages, 10 tables. Submitted to Biometrics

arXiv:2106.10660 [pdf, other]

doi 10.1007/s11222-021-10032-8

Bayesian inference for continuous-time hidden Markov models with an unknown number of states

Authors: Yu Luo, David A. Stephens

Abstract: We consider the modeling of data generated by a latent continuous-time Markov jump process with a state space of finite but unknown dimensions. Typically in such models, the number of states has to be pre-specified, and Bayesian inference for a fixed number of states has not been studied until recently. In addition, although approaches to address the problem for discrete-time models have been developed, no method has been successfully implemented for the continuous-time case. We focus on reversible jump Markov chain Monte Carlo which allows the trans-dimensional move among different numbers of states in order to perform Bayesian inference for the unknown number of states. Specifically, we propose an efficient split-combine move which can facilitate the exploration of the parameter space, and demonstrate that it can be implemented effectively at scale. Subsequently, we extend this algorithm to the context of model-based clustering, allowing the numbers of states and clusters both to be determined during the analysis. The model formulation, inference methodology, and associated algorithm are illustrated by simulation studies. Finally, we apply this method to real data from a Canadian healthcare system in Quebec.

Submitted 20 June, 2021; originally announced June 2021.

MSC Class: 62F15; 62M05; 60J27

Journal ref: Statistics and Computing (2021), 31

arXiv:2105.12259 [pdf, other]

Estimation of Optimal Dynamic Treatment Regimes using Gaussian Process Emulation

Authors: Daniel Rodriguez Duque, David A. Stephens, Erica E. M. Moodie

Abstract: In precision medicine, identifying optimal sequences of decision rules, termed dynamic treatment regimes (DTRs), is an important undertaking. One approach investigators may take to infer about optimal DTRs is via Bayesian dynamic Marginal Structural Models (MSMs). These models represent the expected outcome under adherence to a DTR for DTRs in a family indexed by a parameter $ψ$; the function mapping regimes in the family to the expected outcome under adherence to a DTR is known as the value function. Models that allow for the straightforward identification of an optimal DTR may lead to biased estimates. If such a model is computationally tractable, common wisdom says that a grid-search for the optimal DTR may obviate this difficulty. In a Bayesian context, computational difficulties may be compounded if a posterior mean must be calculated at each grid point. We seek to alleviate these inferential challenges by implementing Gaussian Process ($\mathcal{GP}$) optimization methods for estimators for the causal effect of adherence to a specified DTR. We examine how to identify optimal DTRs in settings where the value function is multi-modal, which are often not addressed in the DTR literature. We conclude that a $\mathcal{GP}$ modeling approach that acknowledges noise in the estimated response surface leads to improved results. Additionally, we find that a grid-search may not always yield a robust solution and that it is often less efficient than a $\mathcal{GP}$ approach.
We illustrate the use of the proposed methods by analyzing a clinical dataset with the aim of quantifying the effect of different patterns of HIV therapy.

Submitted 7 June, 2022; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2103.12293 [pdf, other]

Stochastic Reweighted Gradient Descent

Authors: Ayoub El Hanchi, David A. Stephens

Abstract: Despite the strong theoretical guarantees that variance-reduced finite-sum optimization algorithms enjoy, their applicability remains limited to cases where the memory overhead they introduce (SAG/SAGA) or the periodic full gradient computation they require (SVRG/SARAH) is manageable. A promising approach to achieving variance reduction while avoiding these drawbacks is the use of importance sampling instead of control variates. While many such methods have been proposed in the literature, directly proving that they improve the convergence of the resulting optimization algorithm has remained elusive. In this work, we propose an importance-sampling-based algorithm we call SRG (stochastic reweighted gradient). We analyze the convergence of SRG in the strongly-convex case and show that, while it does not recover the linear rate of control variates methods, it provably outperforms SGD. We pay particular attention to the time and memory overhead of our proposed method, and design a specialized red-black tree allowing its efficient implementation. Finally, we present empirical results to support our findings.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2103.12243 [pdf, other]

Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes

Authors: Ayoub El Hanchi, David A. Stephens

Abstract: Reducing the variance of the gradient estimator is known to improve the convergence rate of stochastic gradient-based optimization and sampling algorithms. One way of achieving variance reduction is to design importance sampling strategies. Recently, the problem of designing such schemes was formulated as an online learning problem with bandit feedback, and algorithms with sub-linear static regret were designed. In this work, we build on this framework and propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes. Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(1/t)$ step sizes. We achieve this dynamic regret bound by leveraging our knowledge of the dynamics defined by the algorithm, and combining ideas from online learning and variance-reduced stochastic optimization. We validate empirically the performance of our algorithm and identify settings in which it leads to significant improvements.

Submitted 22 March, 2021; originally announced March 2021.

Comments: Advances in Neural Information Processing Systems, Dec 2020, Vancouver, Canada

arXiv:2103.04086 [pdf, other]

Assessing the validity of Bayesian inference using loss functions

Authors: Yu Luo, David A. Stephens, Daniel J. Graham, Emma J. McCoy

Abstract: In the usual Bayesian setting, a full probabilistic model is required to link the data and parameters, and the form of this model and the inference and prediction mechanisms are specified via de Finetti's representation. In general, such a formulation is not robust to model mis-specification of its component parts. An alternative approach is to draw inference based on loss functions, where the quantity of interest is defined as a minimizer of some expected loss, and to construct posterior distributions based on the loss-based formulation; this strategy underpins the construction of the Gibbs posterior. We develop a Bayesian non-parametric approach; specifically, we generalize the Bayesian bootstrap, and specify a Dirichlet process model for the distribution of the observables. We implement this using direct prior-to-posterior calculations, but also using predictive sampling. We also study the assessment of posterior validity for non-standard Bayesian calculations, and provide an efficient way to calibrate the scaling parameter in the Gibbs posterior so that it can achieve the desired coverage rate. We show that the developed non-standard Bayesian updating procedures yield valid posterior distributions in terms of consistency and asymptotic normality under model mis-specification. Simulation studies show that the proposed methods can recover the true value of the parameter efficiently and achieve frequentist coverage even when the sample size is small.
Finally, we apply our methods to evaluate the causal impact of speed cameras on traffic collisions in England.

Submitted 9 February, 2023; v1 submitted 6 March, 2021; originally announced March 2021.

arXiv:2006.01799 [pdf, ps, other]

doi 10.1214/22-STS879

The role of exchangeability in causal inference

Authors: Olli Saarela, David A. Stephens, Erica E. M. Moodie

Abstract: Though the notion of exchangeability has been discussed in the causal inference literature under various guises, it has rarely taken its original meaning as a symmetry property of probability distributions. As this property is a standard component of Bayesian inference, we argue that in Bayesian causal inference it is natural to link the causal model, including the notion of confounding and definition of causal contrasts of interest, to the concept of exchangeability. Here we propose a probabilistic between-group exchangeability property as an identifying condition for causal effects, relate it to alternative conditions for unconfounded inferences (commonly stated using potential outcomes) and define causal contrasts in the presence of exchangeability in terms of posterior predictive expectations for further exchangeable units. While our main focus is on a point treatment setting, we also investigate how this reasoning carries over to longitudinal settings.

Submitted 15 December, 2022; v1 submitted 2 June, 2020; originally announced June 2020.

Journal ref: Statistical Science. 2023 Aug; 38(3): 369-385

arXiv:1906.10252 [pdf, other]

doi 10.1002/cjs.11671

Bayesian Clustering for Continuous-Time Hidden Markov Models

Authors: Yu Luo, David A. Stephens, David L. Buckeridge

Abstract: We develop clustering procedures for longitudinal trajectories based on a continuous-time hidden Markov model (CTHMM) and a generalized linear observation model. Specifically in this paper, we carry out finite and infinite mixture model-based clustering for a CTHMM and achieve inference using Markov chain Monte Carlo (MCMC). For a finite mixture model with prior on the number of components, we implement reversible-jump MCMC to facilitate the trans-dimensional move between different numbers of clusters. For a Dirichlet process mixture model, we utilize restricted Gibbs sampling split-merge proposals to expedite the MCMC algorithm. We apply the proposed algorithms to simulated data as well as a real data example, and the results demonstrate the desired performance of the new sampler.

Submitted 26 March, 2021; v1 submitted 24 June, 2019; originally announced June 2019.

MSC Class: 62F15; 91C20

Journal ref: Canadian Journal of Statistics (2021)

arXiv:1904.09394 [pdf, other]

Estimating Sparse Networks with Hubs

Authors: Annaliza McGillivray, Abbas Khalili, David A. Stephens

Abstract: Graphical modelling techniques based on sparse selection have been applied to infer complex networks in many fields, including biology and medicine, engineering, finance, and social sciences. One structural feature of some of the networks in such applications that poses a challenge for statistical inference is the presence of a small number of strongly interconnected nodes in a network which are called hubs. For example, in microbiome research hubs or microbial taxa play a significant role in maintaining stability of the microbial community structure. In this paper, we investigate the problem of estimating sparse networks in which there are a few highly connected hub nodes. Methods based on L1-regularization have been widely used for performing sparse selection in the graphical modelling context. However, while these methods encourage sparsity, they do not take into account structural information of the network. We introduce a new method for estimating networks with hubs that exploits the ability of (inverse) covariance selection methods to include structural information about the underlying network. Our proposed method is a weighted lasso approach with novel row/column sum weights, which we refer to as the hubs weighted graphical lasso. We establish large sample properties of the method when the number of parameters diverges with the sample size, and evaluate its finite sample performance via extensive simulations. We illustrate the method with an application to microbiome data.

Submitted 1 March, 2020; v1 submitted 19 April, 2019; originally announced April 2019.

MSC Class: 62H12; 62F12; 62J07

arXiv:1708.09443 [pdf, other]

Transmission clusters in the HIV-1 epidemic among men who have sex with men in Montreal, Quebec, Canada

Authors: Luc Villandré, Aurélie Labbe, Ruxandra-Ilinca Ibanescu, Bluma Brenner, Michel Roger, David A Stephens

Abstract: Background. Several studies have used phylogenetics to investigate Human Immunodeficiency Virus (HIV) transmission among Men who have Sex with Men (MSMs) in Montreal, Quebec, Canada, revealing many transmission clusters. The Quebec HIV genotyping program sequence database now includes viral sequences from close to 4,000 HIV-positive individuals classified as MSMs. In this paper, we investigate clustering in those data by comparing results from several methods: the conventional Bayesian and maximum likelihood-bootstrap methods, and two more recent algorithms, DM-PhyClus, a Bayesian algorithm that produces a measure of uncertainty for proposed partitions, and the Gap Procedure, a fast distance-based approach. We estimate cluster growth by focusing on recent cases in the Primary HIV Infection (PHI) stage. Results. The analyses reveal considerable overlap between cluster estimates obtained from conventional methods. The Gap Procedure and DM-PhyClus rely on different cluster definitions and as a result, suggest moderately different partitions. All estimates lead to similar conclusions about cluster expansion: several large clusters have experienced sizeable growth, and a few new transmission clusters are likely emerging. Conclusions. The lack of a gold standard measure for clustering quality makes picking a best estimate among those proposed difficult. Work aiming to refine clustering criteria would be required to improve estimates. Nevertheless, the results unanimously stress the role that clusters play in promoting HIV incidence among MSMs.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1708.02648 [pdf, ps, other]

DM-PhyClus: A Bayesian phylogenetic algorithm for infectious disease transmission cluster inference

Authors: Luc Villandré, Aurélie Labbe, Bluma Brenner, Michel Roger, David A. Stephens

Abstract: Background. Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus, that identifies sets of sequences resulting from quick transmission chains, thus yielding easily-interpretable clusters, without using any ad hoc distance or confidence requirement. Results. Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis. Conclusions. DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.

Submitted 8 August, 2017; originally announced August 2017.

arXiv:1707.08354 [pdf, other]

A hierarchical Bayesian model for predicting ecological interactions using scaled evolutionary relationships

Authors: Mohamad Elmasri, Maxwell J. Farrell, T. Jonathan Davies, David A. Stephens

Abstract: Identifying undocumented or potential future interactions among species is a challenge facing modern ecologists. Recent link prediction methods rely on trait data; however, large species interaction databases are typically sparse and covariates are limited to only a fraction of species. On the other hand, evolutionary relationships, encoded as phylogenetic trees, can act as proxies for underlying traits and historical patterns of parasite sharing among hosts. We show that using a network-based conditional model, phylogenetic information provides strong predictive power in a recently published global database of host-parasite interactions. By scaling the phylogeny using an evolutionary model, our method allows for biological interpretation often missing from latent variable models. To further improve on the phylogeny-only model, we combine a hierarchical Bayesian latent score framework for bipartite graphs that accounts for the number of interactions per species with the host dependence informed by phylogeny. Combining the two information sources yields significant improvement in predictive accuracy over each of the submodels alone. As many interaction networks are constructed from presence-only data, we extend the model by integrating a correction mechanism for missing interactions, which proves valuable in reducing uncertainty in unobserved interactions.

Submitted 19 September, 2019; v1 submitted 26 July, 2017; originally announced July 2017.

Comments: To appear in the Annals of Applied Statistics

arXiv:1704.08229 [pdf, ps, other]

Generalized G-estimation and Model Selection

Authors: M. P. Wallace, E. E. M. Moodie, D. A. Stephens

Abstract: Dynamic treatment regimes (DTRs) aim to formalize personalized medicine by tailoring treatment decisions to individual patient characteristics. G-estimation for DTR identification targets the parameters of a structural nested mean model known as the blip function from which the optimal DTR is derived. Despite considerable work deriving such estimation methods, there has been little focus on extending G-estimation to the case of non-additive effects, non-continuous outcomes or on model selection. We demonstrate how G-estimation can be more widely applied through the use of iteratively-reweighted least squares procedures, and illustrate this for log-linear models. We then derive a quasi-likelihood function for G-estimation within the DTR framework, and show how it can be used to form an information criterion for blip model selection. These developments are demonstrated through application to a variety of simulation studies as well as data from the Sequenced Treatment Alternatives to Relieve Depression study.

Submitted 26 April, 2017; originally announced April 2017.

arXiv:1701.04093 [pdf, other]

doi 10.1093/biomet/asw025

A Bayesian view of doubly robust causal inference

Authors: Olli Saarela, Léo R. Belzile, David A. Stephens

Abstract: In causal inference, confounding may be controlled either through regression adjustment in an outcome model, or through propensity score adjustment or inverse probability of treatment weighting, or both. The latter approaches, which are based on modelling of the treatment assignment mechanism, and their doubly robust extensions have been difficult to motivate using formal Bayesian arguments; in principle, for likelihood-based inferences, the treatment assignment model can play no part in inferences concerning the expected outcomes if the models are assumed to be correctly specified. On the other hand, forcing dependency between the outcome and treatment assignment models by allowing the former to be misspecified results in loss of the balancing property of the propensity scores and the loss of any double robustness. In this paper, we explain in the framework of misspecified models why doubly robust inferences cannot arise from purely likelihood-based arguments, and demonstrate this through simulations. As an alternative to Bayesian propensity score analysis, we propose a Bayesian posterior predictive approach for constructing doubly robust estimation procedures. Our approach appropriately decouples the outcome and treatment assignment models by incorporating the inverse treatment assignment probabilities in Bayesian causal inferences as importance sampling weights in Monte Carlo integration.

Submitted 15 January, 2017; originally announced January 2017.

Comments: Author's original version. 21 pages, including supplementary material

MSC Class: 62F15

Journal ref: Biometrika (2016), 103 (3): 667-681

arXiv:0910.5060 [pdf, ps, other]

doi 10.1214/14-BA914

Two-sample Bayesian Nonparametric Hypothesis Testing

Authors: Chris C. Holmes, François Caron, Jim E. Griffin, David A. Stephens

Abstract: In this article we describe Bayesian nonparametric procedures for two-sample hypothesis testing. Namely, given two sets of samples $\mathbf{y}^{\scriptscriptstyle(1)}\;\stackrel{\scriptscriptstyle{iid}}{\sim}\;F^{\scriptscriptstyle(1)}$ and $\mathbf{y}^{\scriptscriptstyle(2)}\;\stackrel{\scriptscriptstyle{iid}}{\sim}\;F^{\scriptscriptstyle(2)}$, with $F^{\scriptscriptstyle(1)},F^{\scriptscriptstyle(2)}$ unknown, we wish to evaluate the evidence for the null hypothesis $H_0:F^{\scriptscriptstyle(1)}\equiv F^{\scriptscriptstyle(2)}$ versus the alternative $H_1:F^{\scriptscriptstyle(1)}\neq F^{\scriptscriptstyle(2)}$. Our method is based upon a nonparametric Pólya tree prior centered either subjectively or using an empirical procedure. We show that the Pólya tree prior leads to an analytic expression for the marginal likelihood under the two hypotheses and hence an explicit measure of the probability of the null $\mathrm{Pr}(H_0|\{\mathbf{y}^{\scriptscriptstyle(1)},\mathbf{y}^{\scriptscriptstyle(2)}\})$.

Submitted 11 May, 2015; v1 submitted 27 October, 2009; originally announced October 2009.

Comments: Published at http://dx.doi.org/10.1214/14-BA914 in the Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society of Bayesian Analysis (http://bayesian.org/)

Report number: VTeX-BA-BA914

Journal ref: Bayesian Analysis 2015, Vol. 10, No. 2, 297-320

arXiv:0711.0186 [pdf, ps, other]

Population-Based Reversible Jump Markov Chain Monte Carlo

Authors: Ajay Jasra, David A. Stephens, Chris C. Holmes

Abstract: In this paper we present an extension of population-based Markov chain Monte Carlo (MCMC) to the trans-dimensional case. One of the main challenges in MCMC-based inference is that of simulating from high and trans-dimensional target measures. In such cases, MCMC methods may not adequately traverse the support of the target; the simulation results will be unreliable. We develop population methods to deal with such problems, and give a result proving the uniform ergodicity of these population algorithms, under mild assumptions. This result is used to demonstrate the superiority, in terms of convergence rate, of a population transition kernel over a reversible jump sampler for a Bayesian variable selection problem. We also give an example of a population algorithm for a Bayesian multivariate mixture model with an unknown number of components. This is applied to gene expression data of 1000 data points in six dimensions and it is demonstrated that our algorithm outperforms some competing Markov chain samplers.

Submitted 1 November, 2007; originally announced November 2007.

arXiv:0709.0139 [pdf, ps, other]

Non-Regular Likelihood Inference for Seasonally Persistent Processes

Authors: Emma J. McCoy, Sofia C. Olhede, David A. Stephens

Abstract: The estimation of parameters in the frequency spectrum of a seasonally persistent stationary stochastic process is addressed. For seasonal persistence associated with a pole in the spectrum located away from frequency zero, a new Whittle-type likelihood is developed that explicitly acknowledges the location of the pole. This Whittle likelihood is a large sample approximation to the distribution of the periodogram over a chosen grid of frequencies, and constitutes an approximation to the time-domain likelihood of the data, via the linear transformation of an inverse discrete Fourier transform combined with a demodulation. The new likelihood is straightforward to compute, and as will be demonstrated has good, yet non-standard, properties. The asymptotic behaviour of the proposed likelihood estimators is studied; in particular, $N$-consistency of the estimator of the spectral pole location is established. Large finite sample and asymptotic distributions of the score and observed Fisher information are given, and the corresponding distributions of the maximum likelihood estimators are deduced. A study of the small sample properties of the likelihood approximation is provided, and its superior performance to previously suggested methods is shown, as well as agreement with the developed distributional approximations.

Submitted 2 September, 2007; originally announced September 2007.

Comments: 57 pages, including 5 figures

Showing 1–28 of 28 results for author: Stephens, D A