
A Bayesian Spatio-Temporal Top-Down Framework for Estimating Opioid Use Disorder Risk Under Data Sparsity

Authors: Emily N Peterson, Alex Edwards, Martha Wetzel, Lance A Waller, Hannah Cooper, Courtney Yarbrough

Abstract: County-level estimates of opioid use disorder (OUD) are essential for understanding the influence of local economic and social conditions. They provide policymakers with the granular information needed to identify, target, and implement effective interventions and allocate resources appropriately. Traditional disease mapping methods typically rely on Poisson regression, modeling observed counts while adjusting for local covariates that are treated as fixed and known. However, these methods may fail to capture the complexities and uncertainties in areas with sparse or absent data. To address this challenge, we developed a Bayesian hierarchical spatio-temporal top-down approach designed to estimate county-level OUD rates when direct small-area (county) data are unavailable. This method allows us to infer small-area OUD rates and quantify associated uncertainties, even in data-sparse environments, using observed state-level OUD rates and a combination of state- and county-level informative covariates. We applied our approach to estimate OUD rates for 3,143 counties in the United States between 2010 and 2025. Model performance was assessed through simulation studies.

Submitted 2 June, 2025; originally announced June 2025.

arXiv:2503.05067

Inverse sampling intensity weighting for preferential sampling adjustment

Authors: Thomas W. Hsiao, Lance A. Waller

Abstract: Traditional geostatistical methods assume independence between observation locations and the spatial process of interest. Violations of this independence assumption are referred to as preferential sampling (PS). Standard methods to address PS rely on estimating complex shared latent variable models and can be difficult to apply in practice. We study the use of inverse sampling intensity weighting (ISIW) for PS adjustment in model-based geostatistics. ISIW is a two-stage approach wherein we estimate the sampling intensity of the observation locations, then define intensity-based weights within a weighted likelihood adjustment. Prediction follows by substituting the adjusted parameter estimates within a kriging framework. A primary contribution was to implement ISIW by means of the Vecchia approximation, which provides large computational gains and improvements in predictive accuracy. Interestingly, we found that accurate parameter estimation had little correlation with predictive performance, raising questions about the conditions and parameter choices driving optimal implementation of kriging-based predictors under PS. Our work highlights the potential of ISIW to adjust for PS in an intuitive, fast, and effective manner.
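The two-stage ISIW idea described above (estimate the sampling intensity, then weight each observation by its inverse) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian kernel intensity estimate with a hand-picked bandwidth, and it uses a weighted mean under an independence working model in place of the weighted geostatistical likelihood and Vecchia approximation described in the paper. The function names and data are hypothetical.

```python
import numpy as np


def kernel_intensity(locs: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel estimate of the sampling intensity at each observed location.

    locs has shape (n, 2). Each point's own kernel contribution is included.
    """
    diff = locs[:, None, :] - locs[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2) / (2.0 * np.pi * bandwidth ** 2)
    return kernel.sum(axis=1)


def isiw_weights(locs: np.ndarray, bandwidth: float) -> np.ndarray:
    """Inverse-intensity weights, rescaled to sum to the number of observations."""
    w = 1.0 / kernel_intensity(locs, bandwidth)
    return w * len(locs) / w.sum()


def weighted_mean(y: np.ndarray, w: np.ndarray) -> float:
    """Maximizer of a weighted Gaussian log-likelihood for a constant mean (independence model)."""
    return float(np.sum(w * y) / np.sum(w))


# Hypothetical data: four clustered sites sampled where y is high, plus one isolated site.
locs = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [5.0, 5.0]])
y = np.array([10.0, 11.0, 10.5, 9.5, 2.0])
w = isiw_weights(locs, bandwidth=1.0)
print(w.round(3))
print(np.mean(y), weighted_mean(y, w))
```

The isolated site receives the largest weight because its estimated sampling intensity is lowest, which pulls the weighted estimate away from the oversampled cluster.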

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2503.02137

What Influences the Field Goal Attempts of Professional Players? Analysis of Basketball Shot Charts via Log Gaussian Cox Processes with Spatially Varying Coefficients

Authors: Jiahao Cao, Qingpo Cai, Lance A. Waller, DeMarc A. Hickson, Guanyu Hu, Jian Kang

Abstract: Basketball shot charts provide valuable information regarding local patterns of in-game performance to coaches, players, sports analysts, and statisticians. The spatial patterns of where shots were attempted and whether the shots were successful suggest options for offensive and defensive strategies as well as historical summaries of performance against particular teams and players. The data represent a marked spatio-temporal point process, with points marking the locations of attempted shots and an associated mark recording each shot's outcome (made/missed). Here, we develop a Bayesian log Gaussian Cox process model allowing joint analysis of the spatial pattern of locations and outcomes of shots across multiple games. We build a hierarchical model for the log intensity function using Gaussian processes, and allow spatially varying effects for various game-specific covariates. We aim to model the spatial relative risk under different covariate values. For inference via posterior simulation, we design a Markov chain Monte Carlo (MCMC) algorithm based on a kernel convolution approach. We illustrate the proposed method using extensive simulation studies. A case study analyzing the shot data of NBA legends Stephen Curry, LeBron James, and Michael Jordan highlights the effectiveness of our approach in real-world scenarios and provides practical insights into optimizing shooting strategies by examining how different playing conditions, game locations, and opposing team strengths impact shooting efficiency.
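Conditional on its log intensity surface, a log Gaussian Cox process is an inhomogeneous Poisson process, so one way to picture the data-generating model above is to simulate shot locations from a fixed intensity by Lewis-Shedler thinning. In this sketch a deterministic log intensity that decays with distance from a hypothetical basket location stands in for a Gaussian process draw; the court dimensions, basket position, and rates are illustrative assumptions, and neither marks (made/missed) nor covariates are modeled.

```python
import numpy as np


def simulate_by_thinning(log_intensity, width, height, lam_max, seed=0):
    """Simulate an inhomogeneous Poisson process on [0, width] x [0, height] by thinning.

    exp(log_intensity(x, y)) must not exceed lam_max anywhere in the window.
    """
    rng = np.random.default_rng(seed)
    # Step 1: homogeneous Poisson process at the dominating rate lam_max.
    n = rng.poisson(lam_max * width * height)
    xs = rng.uniform(0.0, width, n)
    ys = rng.uniform(0.0, height, n)
    # Step 2: keep each candidate with probability lambda(x, y) / lam_max.
    lam = np.exp(log_intensity(xs, ys))
    if np.any(lam > lam_max):
        raise ValueError("lam_max does not dominate the intensity")
    keep = rng.uniform(0.0, 1.0, n) < lam / lam_max
    return np.column_stack([xs[keep], ys[keep]])


# Hypothetical half court (50 x 47 ft) with the basket near (25, 5).
def log_intensity(x, y):
    return np.log(0.4) - 0.15 * np.hypot(x - 25.0, y - 5.0)


shots = simulate_by_thinning(log_intensity, 50.0, 47.0, lam_max=0.5, seed=1)
print(shots.shape)
```

In a full LGCP analysis the log intensity would itself be random (a Gaussian process plus covariate effects), and inference runs in the opposite direction, from observed shots back to that surface.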

Submitted 3 March, 2025; originally announced March 2025.

arXiv:2409.18358

A Capture-Recapture Approach to Facilitate Causal Inference for a Trial-eligible Observational Cohort

Authors: Lin Ge, Yuzi Zhang, Lance A. Waller, Robert H. Lyles

Abstract: Background: We extend recently proposed design-based capture-recapture (CRC) methods for prevalence estimation among registry participants, in order to support causal inference among a trial-eligible target population. The proposed design for CRC analysis integrates an observational study cohort with a randomized trial involving a small representative study sample, and enhances the generalizability and transportability of the findings. Methods: We develop a novel CRC-type estimator derived via multinomial distribution-based maximum likelihood that exploits the design to deliver benefits in terms of validity and efficiency for comparing the effects of two treatments on a binary outcome. Additionally, the design enables a direct standardization-type estimator for efficient estimation of general means (e.g., of biomarker levels) under a specific treatment, and for their comparison across treatments. For inference, we propose a tailored Bayesian credible interval approach to improve coverage properties in conjunction with the proposed CRC estimator for binary outcomes, along with a bootstrap percentile interval approach for use in the case of continuous outcomes. Results: Simulations demonstrate the performance of the proposed estimators derived from the CRC design. The multinomial-based maximum-likelihood estimator shows benefits in terms of validity and efficiency in treatment effect comparisons, while the direct standardization-type estimator allows comprehensive comparison of treatment effects within the target population.
Conclusion: The extended CRC methods provide a useful framework for causal inference in a trial-eligible target population by integrating observational and randomized trial data. The novel estimators enhance the generalizability and transportability of findings, offering efficient and valid tools for treatment effect comparisons on both binary and continuous outcomes.

Submitted 19 January, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

arXiv:2312.13331

A Bayesian Spatial Berkson error approach to estimate small area opioid mortality rates accounting for population-at-risk uncertainty

Authors: Emily N Peterson, Rachel C. Nethery, Jarvis T. Chen, Loni P. Tabb, Brent A. Coull, Frederic B. Piel, Lance A Waller

Abstract: Monitoring small-area geographical population trends in opioid mortality has large-scale implications for informing preventive resource allocation. A common approach to obtain small-area estimates of opioid mortality is to use a standard disease mapping approach in which population-at-risk estimates are treated as fixed and known. Assuming fixed populations ignores the uncertainty surrounding small-area population estimates, which may bias risk estimates and underestimate their associated uncertainties. We present a Bayesian Spatial Berkson Error (BSBE) model to incorporate population-at-risk uncertainty within a disease mapping model. We compare the BSBE approach to the naive approach (treating denominators as fixed) using simulation studies to illustrate potential bias resulting from this assumption. We show the application of the BSBE model to obtain 2020 opioid mortality risk estimates for 159 counties in Georgia, accounting for population-at-risk uncertainty. Utilizing our proposed approach will help to inform interventions in opioid-related public health responses, policies, and resource allocation. Additionally, we provide a general framework to improve the estimation and mapping of health indicators.
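The point that treating population denominators as fixed can understate uncertainty can be illustrated with a small Monte Carlo comparison. This is not the BSBE model: it covers a single area, uses a conjugate Gamma posterior for the rate given a population, and assumes a multiplicative lognormal Berkson-type error on the population (true population = reported population × exp(u), with u independent of the reported value). All function names and numbers are hypothetical.

```python
import numpy as np


def rate_interval(deaths, reported_pop, pop_log_sd, n_draws=50_000, a=0.001, b=0.001, seed=0):
    """95% interval for a mortality rate per 100,000, optionally integrating over population uncertainty.

    Given population P, the rate posterior is Gamma(shape = a + deaths, rate = b + P).
    pop_log_sd = 0 treats the denominator as fixed and known.
    """
    rng = np.random.default_rng(seed)
    # Berkson-type error: the true population scatters around the reported one.
    pop = reported_pop * np.exp(rng.normal(0.0, pop_log_sd, n_draws))
    # One rate draw per population draw (numpy broadcasts the scale array).
    rates = rng.gamma(shape=a + deaths, scale=1.0 / (b + pop))
    lo, hi = np.quantile(rates, [0.025, 0.975])
    return lo * 1e5, hi * 1e5


fixed = rate_interval(deaths=20, reported_pop=10_000, pop_log_sd=0.0)
uncertain = rate_interval(deaths=20, reported_pop=10_000, pop_log_sd=0.2)
print(fixed, uncertain)
```

The interval that propagates denominator uncertainty is wider, which is the under-estimation of uncertainty the abstract attributes to the naive approach.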

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2307.00214

Utilizing a Capture-Recapture Strategy to Accelerate Infectious Disease Surveillance

Authors: Lin Ge, Yuzi Zhang, Lance A. Waller, Robert H. Lyles

Abstract: Monitoring key elements of disease dynamics (e.g., prevalence, case counts) is of great importance in infectious disease prevention and control, as emphasized during the COVID-19 pandemic. To facilitate this effort, we propose a new capture-recapture (CRC) analysis strategy that accounts for misclassification arising from easily administered, imperfect diagnostic test kits, such as rapid antigen test kits or saliva tests. Our method is based on a recently proposed "anchor stream" design, whereby an existing voluntary surveillance data stream is augmented by a smaller and judiciously drawn random sample. It incorporates manufacturer-specified sensitivity and specificity parameters to account for imperfect diagnostic results in one or both data streams. For inference to accompany case count estimation, we improve upon traditional Wald-type confidence intervals by developing an adapted Bayesian credible interval for the CRC estimator that yields favorable frequentist coverage properties. When feasible, the proposed design and analytic strategy provide a more efficient solution than traditional CRC methods or random sampling-based bias-corrected estimation to monitor disease prevalence while accounting for misclassification. We demonstrate the benefits of this approach through simulation studies that underscore its potential utility in practice for economical disease monitoring among a registered closed population.

Submitted 9 October, 2024; v1 submitted 30 June, 2023; originally announced July 2023.

arXiv:2306.10666

On some pitfalls of the log-linear modeling framework for capture-recapture studies in disease surveillance

Authors: Yuzi Zhang, Lin Ge, Lance A. Waller, Robert H. Lyles

Abstract: In epidemiological studies, the capture-recapture (CRC) method is a powerful tool that can be used to estimate the number of disease cases or, potentially, disease prevalence based on data from overlapping surveillance systems. Estimators derived from log-linear models are widely applied by epidemiologists when analyzing CRC data. The popularity of the log-linear model framework is largely associated with its accessibility and the fact that interaction terms can allow for certain types of dependency among data streams. In this work, we shed new light on significant pitfalls associated with the log-linear model framework in the context of CRC using real data examples and simulation studies. First, we demonstrate that the log-linear model paradigm is highly exclusionary. That is, it can exclude, by design, many possible estimates that are potentially consistent with the observed data. Second, we clarify the ways in which commonly used model selection metrics (e.g., information criteria) are fundamentally deceiving in the effort to select a best model in this setting. By focusing attention on these important cautionary points and on the fundamental untestable dependency assumption made when fitting a log-linear model to CRC data, we hope to improve the quality of and transparency associated with subsequent surveillance-based CRC estimates of case counts.
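The untestable dependency assumption mentioned above is easiest to see in the simplest two-list case. The sketch below (hypothetical function name and counts) shows that the independence log-linear model fills the unobserved "missed by both lists" cell as n10·n01/n11, which reproduces the Lincoln-Petersen estimator. The three observed cells are fitted exactly, so the data themselves carry no information to check the independence assumption that determines the fourth.

```python
def two_list_total(n11: int, n10: int, n01: int) -> float:
    """Estimated total cases from two overlapping lists under the independence log-linear model.

    n11: cases on both lists; n10: list 1 only; n01: list 2 only.
    The unobserved cell n00 is fitted so that the odds ratio between lists equals 1.
    """
    if n11 == 0:
        raise ValueError("no overlap: the independence estimate is undefined")
    n00_hat = n10 * n01 / n11
    return n11 + n10 + n01 + n00_hat


total = two_list_total(n11=20, n10=30, n01=40)
print(total)  # 150.0, identical to the Lincoln-Petersen estimate (50 * 60) / 20
```

Any other assumed odds ratio between the lists would fit the same observed cells equally well while giving a different total.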

Submitted 18 June, 2023; originally announced June 2023.

arXiv:2302.03558

doi 10.1080/00031305.2023.2250401

Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors

Authors: Lin Ge, Yuzi Zhang, Lance A. Waller, Robert H. Lyles

Abstract: Epidemiologic screening programs often make use of tests with small, but non-zero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of true cases, and that we apply an imperfect test with known sensitivity and specificity to a sample of individuals from the population. In this setting, we propose an enhanced inferential approach for use in conjunction with sampling-based bias-corrected prevalence estimation. While ignoring the finite nature of the population can yield markedly conservative estimates, direct application of a standard finite population correction (FPC) conversely leads to underestimation of variance. We uncover a way to leverage the typical FPC indirectly toward valid statistical inference. In particular, we derive a readily estimable extra variance component induced by misclassification in this specific but arguably common diagnostic testing scenario. Our approach yields a standard error estimate that properly captures the sampling variability of the usual bias-corrected maximum likelihood estimator of disease prevalence. Finally, we develop an adapted Bayesian credible interval for the true prevalence that offers improved frequentist properties (i.e., coverage and width) relative to a Wald-type confidence interval. We report the simulation results to demonstrate the enhanced performance of the proposed inferential methods.
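For reference, the bias-corrected prevalence estimator with known sensitivity (Se) and specificity (Sp) is commonly written as the Rogan-Gladen estimator, π̂ = (p̂ + Sp − 1)/(Se + Sp − 1), where p̂ is the observed fraction testing positive. The sketch below computes it together with the standard infinite-population (binomial) standard error, the quantity the abstract describes as conservative for finite populations. It does not implement the extra misclassification variance component or the adapted credible interval proposed in the paper; the function name and inputs are hypothetical.

```python
import math


def bias_corrected_prevalence(positives: int, n: int, sens: float, spec: float):
    """Rogan-Gladen bias-corrected prevalence and its usual large-sample standard error.

    Assumes known sensitivity/specificity and sampling from an effectively infinite
    population (no finite population correction).
    """
    youden = sens + spec - 1.0
    if youden <= 0:
        raise ValueError("requires sens + spec > 1")
    p_obs = positives / n
    prevalence = (p_obs + spec - 1.0) / youden  # may fall outside [0, 1]; not truncated here
    std_err = math.sqrt(p_obs * (1.0 - p_obs) / n) / youden
    return prevalence, std_err


est, se = bias_corrected_prevalence(positives=120, n=1000, sens=0.90, spec=0.95)
print(round(est, 4), round(se, 4))
```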

Submitted 13 August, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

arXiv:2212.04911

doi 10.1093/aje/kwad177

A Design and Analytic Strategy for Monitoring Disease Positivity and Case Characteristics in Accessible Closed Populations

Authors: Robert H. Lyles, Yuzi Zhang, Lin Ge, Lance A. Waller

Abstract: We propose a monitoring strategy for efficient and robust estimation of disease prevalence and case numbers within closed and enumerated populations such as schools, workplaces, or retirement communities. The proposed design relies largely on voluntary testing, which is notoriously biased (e.g., in the case of COVID-19) due to non-representative sampling. The approach yields unbiased and comparatively precise estimates with no assumptions about factors underlying selection of individuals for voluntary testing, building on the strength of what can be a small random sampling component. This component unlocks a previously proposed "anchor stream" estimator, a well-calibrated alternative to classical capture-recapture (CRC) estimators based on two data streams. We show here that this estimator is equivalent to a direct standardization based on "capture", i.e., selection (or not) by the voluntary testing program, made possible by means of a key parameter identified by design. This equivalency simultaneously allows for novel two-stream CRC-like estimation of general means (e.g., of continuous variables such as antibody levels or biomarkers). For inference, we propose adaptations of a Bayesian credible interval when estimating case counts and bootstrapping when estimating means of continuous variables. We use simulations to demonstrate significant precision benefits relative to random sampling alone.
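The abstract notes that the anchor stream estimator is equivalent to a direct standardization over the two strata defined by capture in the voluntary testing stream. Under simplifying assumptions (a perfect test, an enumerated population of known size, and observed results for every voluntary tester), one natural reading of that standardization is sketched below. The function name, argument names, and counts are hypothetical, and the authors' estimator and intervals may be parameterized differently.

```python
def anchor_stream_case_count(pop_size, n_voluntary, cases_voluntary,
                             n_sample_not_voluntary, cases_sample_not_voluntary):
    """Direct standardization over 'captured' vs 'not captured' by voluntary testing.

    Assumes a perfect test, that every voluntary tester's status is observed, and that
    anchor (random) sample members not captured by voluntary testing are a random
    sample of all non-captured individuals.
    """
    # Prevalence in the non-captured stratum, estimated from the anchor sample.
    prev_not_captured = cases_sample_not_voluntary / n_sample_not_voluntary
    # Captured stratum is fully observed; the non-captured stratum has known size.
    return cases_voluntary + (pop_size - n_voluntary) * prev_not_captured


est = anchor_stream_case_count(pop_size=1000, n_voluntary=200, cases_voluntary=30,
                               n_sample_not_voluntary=80, cases_sample_not_voluntary=4)
print(est)
```

No assumption about why people volunteer is needed here: the voluntary stratum is counted directly, and only the non-captured stratum relies on random sampling.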

Submitted 9 December, 2022; originally announced December 2022.

arXiv:2211.13842

doi 10.1002/sim.9759

Tailoring Capture-Recapture Methods to Estimate Registry-Based Case Counts Based on Error-Prone Diagnostic Signals

Authors: Lin Ge, Yuzi Zhang, Kevin C. Ward, Timothy L. Lash, Lance A. Waller, Robert H. Lyles

Abstract: Surveillance research is of great importance for effective and efficient epidemiological monitoring of case counts and disease prevalence. Taking specific motivation from ongoing efforts to identify recurrent cases based on the Georgia Cancer Registry, we extend the recently proposed "anchor stream" sampling design and estimation methodology. Our approach offers a more efficient and defensible alternative to traditional capture-recapture (CRC) methods by leveraging a relatively small random sample of participants whose recurrence status is obtained through a principled application of medical records abstraction. This sample is combined with one or more existing signaling data streams, which may yield data based on arbitrarily non-representative subsets of the full registry population. The key extension developed here accounts for the common problem of false positive or negative diagnostic signals from the existing data stream(s). In particular, we show that the design only requires documentation of positive signals in these non-anchor surveillance streams, and permits valid estimation of the true case count based on an estimable positive predictive value (PPV) parameter. We borrow ideas from the multiple imputation paradigm to provide accompanying standard errors, and develop an adapted Bayesian credible interval approach that yields favorable frequentist coverage properties.
We demonstrate the benefits of the proposed methods through simulation studies, and provide a data example targeting estimation of the breast cancer recurrence case count among Metro Atlanta area patients from the Georgia Cancer Registry-based Cancer Recurrence Information and Surveillance Program (CRISP) database.

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2209.04316

Impacts of Census Differential Privacy for Small-Area Disease Mapping to Monitor Health Inequities

Authors: Yanran Li, Brent A. Coull, Nancy Krieger, Emily Peterson, Lance A. Waller, Jarvis T. Chen, Rachel C. Nethery

Abstract: The US Census Bureau will implement a new privacy-preserving disclosure avoidance system (DAS), which includes application of differential privacy, on the public-release 2020 census data. There are concerns that the DAS may bias small-area and demographically-stratified population counts, which play a critical role in public health research and policy, serving as denominators in estimation of disease/mortality rates. Employing three DAS demonstration products, we quantify errors attributable to reliance on DAS-protected denominators in standard small-area disease mapping models for characterizing health inequities. We conduct simulation studies and real data analyses of inequities in premature mortality at the census tract level in Massachusetts. Results show that overall patterns of inequity by racialized group and economic deprivation level are not compromised by the DAS. While early versions of DAS induce errors in mortality rate estimation that are larger for Black than for non-Hispanic white populations, this issue is ameliorated in newer DAS versions.
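To see why small-area denominators are the concern, the sketch below perturbs a population count with generic Laplace noise and measures the resulting relative error in a mortality rate. This is an illustrative assumption only: the Census Bureau's production DAS uses a different noise mechanism and post-processing, and the function name, populations, death counts, and privacy parameter here are hypothetical.

```python
import numpy as np


def mean_relative_rate_error(deaths, pop, epsilon, n_rep=20_000, seed=0):
    """Average relative error in deaths/pop when pop is released with Laplace(1/epsilon) noise."""
    rng = np.random.default_rng(seed)
    noisy_pop = pop + rng.laplace(0.0, 1.0 / epsilon, n_rep)
    noisy_pop = np.maximum(noisy_pop, 1.0)  # crude post-processing: keep denominators positive
    true_rate = deaths / pop
    noisy_rate = deaths / noisy_pop
    return float(np.mean(np.abs(noisy_rate - true_rate) / true_rate))


small = mean_relative_rate_error(deaths=2, pop=50, epsilon=0.5)
large = mean_relative_rate_error(deaths=200, pop=5000, epsilon=0.5)
print(small, large)
```

Because the noise scale does not depend on population size, the relative error in the rate shrinks as the denominator grows, so the smallest areas and subgroups are affected most.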

Submitted 29 March, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

arXiv:2112.09813

A Bayesian hierarchical small-area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and Decennial Census data

Authors: Emily N Peterson, Rachel C Nethery, Tullia Padellini, Jarvis T Chen, Brent A Coull, Frederic B Piel, Jon Wakefield, Marta Blangiardo, Lance A Waller

Abstract: Small-area estimates of population are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small-area estimates of population counts are published by the United States Census Bureau (USCB) in the form of the Decennial census counts, Intercensal population projections (PEP), and American Community Survey (ACS) estimates. Although there are significant relationships between these data sources, there are important contrasts in data collection and processing methodologies, such that each set of estimates may be subject to different sources and magnitudes of error. Additionally, these data sources do not report identical small-area population counts due to post-survey adjustments specific to each data source. Resulting small-area disease/mortality rates may differ depending on which data source is used for population counts (denominator data). To accurately capture annual small-area population counts, and associated uncertainties, we present a Bayesian population model (B-Pop), which fuses information from all three USCB sources, accounting for data-source-specific methodologies and associated errors. The main features of our framework are: 1) a single model integrating multiple data sources, 2) accounting for data-source-specific data-generating mechanisms, and specifically accounting for data-source-specific errors, and 3) prediction of estimates for years without USCB-reported data. We focus our study on the 159 counties of Georgia, and produce estimates for years 2005-2021.

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2110.09272

Multi-Objective Allocation of COVID-19 Testing Centers: Improving Coverage and Equity in Access

Authors: Zhen Zhong, Ribhu Sengupta, Kamran Paynabar, Lance A. Waller

Abstract: At the time of this article, COVID-19 has been transmitted to more than 42 million people and resulted in more than 673,000 deaths across the United States. Throughout this pandemic, public health authorities have monitored the results of diagnostic testing to identify hotspots of transmission. Such information can help reduce or block transmission paths of COVID-19 and help infected patients receive early treatment. However, most current schemes of test site allocation have been based on experience or convenience, often resulting in low efficiency and non-optimal allocation. In addition, the historical sociodemographic patterns of populations within cities can result in measurable inequities in access to testing between various racial and income groups. To address these pressing issues, we propose a novel test site allocation scheme to (a) maximize population coverage, (b) minimize prediction uncertainties associated with projections of outbreak trajectories, and (c) reduce inequities in access. We illustrate our approach with case studies comparing our allocation scheme with the recorded allocation of testing sites in Georgia, revealing both increased population coverage and improved equity of access relative to current practice.
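Objective (a), maximizing population coverage, is a classic maximum-coverage problem. The sketch below shows a standard greedy heuristic for that single objective only; it is not the authors' multi-objective scheme, ignores objectives (b) and (c), and is not guaranteed to be optimal. The site names, block identifiers, and populations are hypothetical.

```python
def greedy_coverage(site_coverage, block_pop, k):
    """Pick up to k sites, each time adding the site that covers the most uncovered population.

    site_coverage: dict mapping site -> set of block ids within reach of that site
    block_pop: dict mapping block id -> population
    """
    chosen, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for site, blocks in site_coverage.items():
            if site in chosen:
                continue
            gain = sum(block_pop[b] for b in blocks - covered)
            if gain > best_gain:
                best, best_gain = site, gain
        if best is None:  # no remaining site adds coverage
            break
        chosen.append(best)
        covered |= site_coverage[best]
    return chosen, sum(block_pop[b] for b in covered)


sites = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
pops = {1: 100, 2: 50, 3: 80, 4: 120}
print(greedy_coverage(sites, pops, k=2))
```

Site B is skipped in the second round because block 2 is already covered by A, so C adds more new population; an equity objective could change that choice.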

Submitted 20 September, 2021; originally announced October 2021.

arXiv:2101.01235

doi 10.1093/jrsssa/qnac013

An integrated abundance model for estimating county-level prevalence of opioid misuse in Ohio

Authors: Staci A. Hepler, David Kline, Andrea Bonny, Erin McKnight, Lance A. Waller

Abstract: Opioid misuse is a national epidemic and a significant drug-related threat to the United States. While the scale of the problem is undeniable, estimates of the local prevalence of opioid misuse are lacking, despite their importance to policy-making and resource allocation. This is due, in part, to the challenge of directly measuring opioid misuse at a local level. In this paper, we develop a Bayesian hierarchical spatio-temporal abundance model that integrates indirect county-level data on opioid-related outcomes with state-level survey estimates on prevalence of opioid misuse to estimate the latent county-level prevalence and counts of people who misuse opioids. A simulation study shows that our integrated model accurately recovers the latent counts and prevalence. We apply our model to county-level surveillance data on opioid overdose deaths and treatment admissions from the state of Ohio. Our proposed framework can be applied to other applications of small-area estimation for hard-to-reach populations, which is a common occurrence with many health conditions such as those related to illicit behaviors.

Submitted 12 January, 2022; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: * Authors Hepler and Kline contributed equally

MSC Class: 62

Journal ref: Journal of the Royal Statistical Society Series A: Statistics in Society. 2023;186(1):43-60

arXiv:1609.00141

Data Integration Model for Air Quality: A Hierarchical Approach to the Global Estimation of Exposures to Ambient Air Pollution

Authors: Gavin Shaddick, Matthew L. Thomas, Amelia Jobling, Michael Brauer, Aaron van Donkelaar, Rick Burnett, Howard Chang, Aaron Cohen, Rita Van Dingenen, Carlos Dora, Sophie Gumy, Yang Liu, Randall Martin, Lance A. Waller, Jason West, James V. Zidek, Annette Prüss-Ustün

Abstract: Air pollution is a major risk factor for global health, with both ambient and household air pollution contributing substantial components of the overall global disease burden. One of the key drivers of adverse health effects is fine particulate matter ambient pollution (PM$_{2.5}$) to which an estimated 3 million deaths can be attributed annually. The primary source of information for estimating e… ▽ More Air pollution is a major risk factor for global health, with both ambient and household air pollution contributing substantial components of the overall global disease burden. One of the key drivers of adverse health effects is fine particulate matter ambient pollution (PM$_{2.5}$) to which an estimated 3 million deaths can be attributed annually. The primary source of information for estimating exposures has been measurements from ground monitoring networks but, although coverage is increasing, there remain regions in which monitoring is limited. Ground monitoring data therefore needs to be supplemented with information from other sources, such as satellite retrievals of aerosol optical depth and chemical transport models. A hierarchical modelling approach for integrating data from multiple sources is proposed allowing spatially-varying relationships between ground measurements and other factors that estimate air quality. Set within a Bayesian framework, the resulting Data Integration Model for Air Quality (DIMAQ) is used to estimate exposures, together with associated measures of uncertainty, on a high resolution grid covering the entire world. Bayesian analysis on this scale can be computationally challenging and here approximate Bayesian inference is performed using Integrated Nested Laplace Approximations. 
Model selection and assessment are performed by cross-validation, with the final model offering substantial increases in predictive accuracy compared to current approaches, particularly in regions with sparse ground monitoring: root mean square error (RMSE) reduced from 17.1 to 10.7 $\mu$g m$^{-3}$, and population-weighted RMSE from 23.1 to 12.1 $\mu$g m$^{-3}$. Based on summaries of the posterior distributions for each grid cell, it is estimated that 92% of the world's population reside in areas exceeding the World Health Organization's Air Quality Guidelines.
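The abstract reports both an ordinary and a population-weighted RMSE but does not spell out how the weighting is done. A common definition, assumed here, weights each location's squared prediction error by the population it represents: $\text{RMSE}_w = \sqrt{\sum_i w_i (y_i - \hat{y}_i)^2 / \sum_i w_i}$. The minimal sketch below illustrates that assumed definition only; the function names are hypothetical and this is not the DIMAQ evaluation code.

```python
import math


def rmse(observed, predicted):
    """Ordinary (unweighted) root mean square error."""
    n = len(observed)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / n)


def population_weighted_rmse(observed, predicted, population):
    """RMSE where each squared error is weighted by its population (assumed definition)."""
    total_population = sum(population)
    weighted_squared = sum(
        w * (o - p) ** 2 for o, p, w in zip(observed, predicted, population)
    )
    return math.sqrt(weighted_squared / total_population)


# Hypothetical toy values (not from the paper): three monitors, the third
# representing twice the population of the others and predicted exactly.
obs = [10.0, 20.0, 30.0]
pred = [12.0, 18.0, 30.0]
pop = [1.0, 1.0, 2.0]
print(rmse(obs, pred))                           # sqrt(8/3)
print(population_weighted_rmse(obs, pred, pop))  # sqrt(8/4) = sqrt(2)
```

With equal weights the two measures coincide; the weighted version shifts emphasis toward errors in densely populated areas, which is why the two figures in the abstract can differ.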

Submitted 26 September, 2016; v1 submitted 1 September, 2016; originally announced September 2016.

Comments: 23 pages, 9 figures, 2 tables

arXiv:1602.04528 [pdf, other]

doi 10.1214/17-AOAS1068

Hierarchical multivariate space-time methods for modeling counts with an application to stroke mortality data

Authors: Harrison Quick, Lance A. Waller, Michele Casper

Abstract: Geographic patterns in stroke mortality have been studied as far back as the 1960s, when a region of the southeastern United States became known as the "stroke belt" due to its unusually high rates of stroke mortality. While stroke mortality rates are known to increase exponentially with age, an investigation of spatiotemporal trends by age group at the county level is daunting due to the preponderance of small population sizes and/or few stroke events by age group. Here, we harness the power of a complex, nonseparable multivariate space-time model which borrows strength across space, time, and age group to obtain reliable estimates of yearly county-level mortality rates from US counties between 1973 and 2013 for those aged 65 and older. Furthermore, we propose an alternative metric for measuring changes in event rates over time which accounts for the full trajectory of a county's event rates, as opposed to simply comparing the rates at the beginning and end of the study period. In our analysis of the stroke data, we identify differing spatiotemporal trends in mortality rates across age groups, shed light on the gains achieved in the Deep South, and provide evidence that a separable model is inappropriate for these data.

Submitted 14 February, 2016; originally announced February 2016.

Journal ref: Annals of Applied Statistics, 11 (2017) 2170-2182

arXiv:1507.02741 [pdf, other]

doi 10.1111/rssc.12215

A Nonseparable Multivariate Space-Time Model for Analyzing County-Level Heart Disease Death Rates by Race and Gender

Authors: Harrison Quick, Lance A. Waller, Michele Casper

Abstract: While death rates due to diseases of the heart have experienced a sharp decline over the past 50 years, these diseases continue to be the leading cause of death in the United States, and the rate of decline varies by geographic location, race, and gender. We look to harness the power of hierarchical Bayesian methods to obtain a clearer picture of the declines from county-level, temporally varying heart disease death rates for men and women of different races in the US. Specifically, we propose a nonseparable multivariate spatio-temporal Bayesian model which allows for group-specific temporal correlations and temporally evolving covariance structures in the multivariate spatio-temporal component of the model. After verifying the effectiveness of our model via simulation, we apply our model to a dataset of over 200,000 county-level heart disease death rates. In addition to yielding a better fit than other common approaches for handling such data, the richness of our model provides insight into racial, gender, and geographic disparities underlying heart disease death rates in the US that more restrictive models cannot reveal.

Submitted 9 July, 2015; originally announced July 2015.

Journal ref: Journal of the Royal Statistical Society, Series C, 67 (2018) 291-304

arXiv:1501.03885 [pdf, ps, other]

doi 10.1214/14-AOAS728C

Discussion of "Spatial accessibility of pediatric primary healthcare: Measurement and inference"

Authors: Lance A. Waller

Abstract: Discussion of "Spatial accessibility of pediatric primary healthcare: Measurement and inference" by Mallory Nobles, Nicoleta Serban and Julie Swann [arXiv:1501.03626].

Submitted 16 January, 2015; originally announced January 2015.

Comments: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org), at http://dx.doi.org/10.1214/14-AOAS728C

Report number: IMS-AOAS-AOAS728C

Journal ref: Annals of Applied Statistics 2014, Vol. 8, No. 4, 1956-1960

Showing 1–18 of 18 results for author: Waller, L A