A joint model for DHS and MICS surveys: Spatial modeling with anonymized locations

Authors: John Paige, Geir-Arne Fuglstad, Andrea Riebler

Abstract: Anonymizing the GPS locations of observations can bias a spatial model's parameter estimates and attenuate spatial predictions when improperly accounted for, and is relevant in applications from public health to paleoseismology. In this work, we demonstrate that a newly introduced method for geostatistical modeling in the presence of anonymized point locations can be extended to account for more general kinds of positional uncertainty due to location anonymization, including both jittering (a form of random perturbations of GPS coordinates) and geomasking (reporting only the name of the area containing the true GPS coordinates). We further provide a numerical integration scheme that flexibly accounts for the positional uncertainty as well as spatial and covariate information. We apply the method to women's secondary education completion data in the 2018 Nigeria demographic and health survey (NDHS) containing jittered point locations, and the 2016 Nigeria multiple indicator cluster survey (NMICS) containing geomasked locations. We show that accounting for the positional uncertainty in the surveys can improve predictions in terms of their continuous ranked probability score.

Submitted 8 May, 2024; originally announced May 2024.

Comments: main manuscript: 31 pages, 6 figures, 2 tables; supplemental materials: 10 pages, 4 figures, 7 tables

arXiv:2309.02093 [pdf, other]

Estimating Subnational Under-Five Mortality Rates Using a Spatio-Temporal Age-Period-Cohort Model

Authors: Connor Gascoigne, Theresa Smith, John Paige, Jon Wakefield

Abstract: Producing subnational estimates of the under-five mortality rate (U5MR) is a vital goal for the United Nations to reduce inequalities in mortality and well-being across the globe. There is a great disparity in U5MR between high-income and Low-and-Middle Income Countries (LMICs). Current methods for modelling U5MR in LMICs use smoothing methods to reduce uncertainty in estimates caused by data sparsity. This paper includes cohort alongside age and period in a novel application of an Age-Period-Cohort model for U5MR. In this context, current methods only use age and period (and not cohort) for smoothing. With data from the Kenyan Demographic and Health Surveys (DHS) we use a Bayesian hierarchical model with terms to smooth over temporal and spatial components whilst fully accounting for the complex stratified, multi-staged cluster design of the DHS. Our results show that the use of cohort may be useful in the context of subnational estimates of U5MR. We validate our results at the subnational level by comparing our results against direct estimates.

Submitted 5 September, 2023; originally announced September 2023.

Comments: 16 pages, 6 Figures, 2 Tables

arXiv:2303.12668 [pdf, other]

GeoAdjust: Adjusting for Positional Uncertainty in Geostatistical Analysis of DHS Data

Authors: Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad

Abstract: The R-package GeoAdjust https://github.com/umut-altay/GeoAdjust-package implements fast empirical Bayesian geostatistical inference for household survey data from the Demographic and Health Surveys Program (DHS) using Template Model Builder (TMB). DHS household survey data is an important source of data for tracking demographic and health indicators, but positional uncertainty has been intentionally introduced in the GPS coordinates to preserve privacy. GeoAdjust accounts for such positional uncertainty in geostatistical models containing both spatial random effects and raster- and distance-based covariates. The R package supports Gaussian, binomial and Poisson likelihoods with identity link, logit link, and log link functions respectively. The user defines the desired model structure by setting a small number of function arguments, and can easily experiment with different hyperparameters for the priors. GeoAdjust is the first software package that is specifically designed to address positional uncertainty in the GPS coordinates of point referenced household survey data. The package provides inference for model parameters and can predict values at unobserved locations.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2211.07442 [pdf, other]

Impact of Jittering on Raster- and Distance-based Geostatistical Analyses of DHS Data

Authors: Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad

Abstract: Fine-scale covariate rasters are routinely used in geostatistical models for mapping demographic and health indicators based on household surveys from the Demographic and Health Surveys (DHS) program. However, the geostatistical analyses ignore the fact that GPS coordinates in DHS surveys are jittered for privacy purposes. We demonstrate the need to account for this jittering, and we propose a computationally efficient approach that can be routinely applied. We use the new method to analyse the prevalence of completion of secondary education for 20–49 year old women in Nigeria in 2018 based on the 2018 DHS survey. The analysis demonstrates substantial changes in the estimates of spatial range and fixed effects compared to when we ignore jittering. Through a simulation study that mimics the dataset, we demonstrate that accounting for jittering reduces attenuation in the estimated coefficients for covariates and improves predictions. The results also show that the common approach of averaging covariate values in windows around the observed locations does not lead to the same improvements as accounting for jittering.

Submitted 22 August, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

arXiv:2207.06700 [pdf, other]

Spatial Aggregation with Respect to a Population Distribution

Authors: John Paige, Geir-Arne Fuglstad, Andrea Riebler, Jon Wakefield

Abstract: Spatial aggregation with respect to a population distribution involves estimating aggregate quantities for a population based on an observation of individuals in a subpopulation. In this context, a geostatistical workflow must account for three major sources of 'aggregation error': aggregation weights, fine scale variation, and finite population variation. However, common practice is to treat the unknown population distribution as a known population density and ignore empirical variability in outcomes. We improve common practice by introducing a 'sampling frame model' that allows aggregation models to account for the three sources of aggregation error simply and transparently. We compare the proposed and the traditional approach using two simulation studies that mimic neonatal mortality rate (NMR) data from the 2014 Kenya Demographic and Health Survey (KDHS2014). For the traditional approach, undercoverage/overcoverage depends arbitrarily on the aggregation grid resolution, while the new approach exhibits low sensitivity. The differences between the two aggregation approaches increase as the population of an area decreases. The differences are substantial at the second administrative level and finer, but also at the first administrative level for some population quantities. We find differences between the proposed and traditional approach are consistent with those we observe in an application to NMR data from the KDHS2014.

Submitted 14 July, 2022; originally announced July 2022.

Comments: main manuscript: 33 pages, 5 figures, 5 tables; supplemental materials: 15 pages, 2 figures, 15 tables

arXiv:2202.11035 [pdf, other]

Fast geostatistical inference under positional uncertainty: Analysing DHS household survey data

Authors: Umut Altay, John Paige, Andrea Riebler, Geir-Arne Fuglstad

Abstract: Household survey data from the Demographic and Health Surveys (DHS) Program is published with GPS coordinates. However, almost all geostatistical analyses of such data ignore that the published GPS coordinates are randomly displaced (jittered). In this short report, we develop a geostatistical model that accounts for the positional uncertainty when analysing DHS surveys, and provide a fast implementation using Template Model Builder. The key focus is inference with Gaussian random fields under positional uncertainty, and our approach works for both Gaussian and non-Gaussian likelihoods. A simulation study with a binomial observation model shows that the new approach performs as well as or better than the common approach of ignoring jittering, both in terms of more accurate parameter estimates and improved predictive measures. We demonstrate that the improvement would be larger under stronger jittering. An analysis of contraceptive use in Kenya shows that the approach is fast and easy to use in practice.

Submitted 2 December, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

arXiv:2007.05117 [pdf, other]

Space-Time Smoothing of Survey Outcomes using the R Package SUMMER

Authors: Zehang Richard Li, Bryan D Martin, Tracy Qi Dong, Geir-Arne Fuglstad, John Paige, Andrea Riebler, Samuel Clark, Jon Wakefield

Abstract: The increasing availability of complex survey data, and the continued need for estimates of demographic and health indicators at a fine spatial and temporal scale, which leads to issues of data sparsity, has led to the need for spatio-temporal smoothing methods that acknowledge the manner in which the data were collected. The open source R package SUMMER implements a variety of methods for spatial or spatio-temporal smoothing of survey data. The emphasis is on small-area estimation. We focus primarily on indicators in a low and middle-income countries context. Our methods are particularly useful for data from Demographic Health Surveys and Multiple Indicator Cluster Surveys. We build upon functions within the survey package, and use INLA for fast Bayesian computation. This paper includes a brief overview of these methods and illustrates the workflow of accessing and processing surveys, estimating subnational child mortality rates, and visualizing results with both simulated data and DHS surveys.

Submitted 8 January, 2025; v1 submitted 9 July, 2020; originally announced July 2020.

arXiv:2005.11805 [pdf, other]

Bayesian Multiresolution Modeling Of Georeferenced Data

Authors: John Paige, Geir-Arne Fuglstad, Andrea Riebler, Jon Wakefield

Abstract: Current implementations of multiresolution methods are limited in terms of possible types of responses and approaches to inference. We provide a multiresolution approach for spatial analysis of non-Gaussian responses using latent Gaussian models and Bayesian inference via integrated nested Laplace approximation (INLA). The approach builds on 'LatticeKrig', but uses a reparameterization of the model parameters that is intuitive and interpretable so that modeling and prior selection can be guided by expert knowledge about the different spatial scales at which dependence acts. The priors can be used to make inference robust and integration over model parameters allows for more accurate posterior estimates of uncertainty. The extended LatticeKrig (ELK) model is compared to a standard implementation of LatticeKrig (LK), and a standard Matérn model, and we find modest improvement in spatial oversmoothing and prediction for the ELK model for counts of secondary education completion for women in Kenya collected in the 2014 Kenya demographic health survey. Through a simulation study with Gaussian responses and a realistic mix of short and long scale dependencies, we demonstrate that the differences between the three approaches for prediction increase with distance to nearest observation.

Submitted 25 May, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

Comments: main manuscript: 33 pages, 7 figures, 2 tables; supplemental materials: 9 pages, 3 figures, 5 tables

arXiv:1910.06512 [pdf, other]

Design- and Model-Based Approaches to Small-Area Estimation in a Low and Middle Income Country Context: Comparisons and Recommendations

Authors: John Paige, Geir-Arne Fuglstad, Andrea Riebler, Jon Wakefield

Abstract: The need for rigorous and timely health and demographic summaries has provided the impetus for an explosion in geographic studies, with a common approach being the production of pixel-level maps, particularly in low and middle income countries. In this context, household surveys are a major source of data, usually with a two-stage cluster design with stratification by region and urbanicity. Accurate estimates are of crucial interest for precision public health policy interventions, but many current studies take a cavalier approach to acknowledging the sampling design, while presenting results at a fine geographic scale. In this paper we investigate the extent to which accounting for sample design can affect predictions at the aggregate level, which is usually the target of inference. We describe a simulation study in which realistic sampling frames are created for Kenya, based on population and demographic information, with a survey design that mimics a Demographic Health Survey (DHS). We compare the predictive performance of various commonly-used models. We also describe a cluster level model with a discrete spatial smoothing prior that has not been previously used, but provides reliable inference. We find that including stratification and cluster level random effects can improve predictive performance. Spatially smoothed direct (weighted) estimates were robust to priors and survey design. Continuous spatial models performed well in the presence of fine scale variation; however, these models require the most "hand holding". Subsequently, we examine how the models perform on real data; specifically we model the prevalence of secondary education for women aged 20-29 using data from the 2014 Kenya DHS.

Submitted 14 October, 2019; originally announced October 2019.

Comments: Main text: 35 pages, 5 figures, 2 tables. Supplementary materials: 63 pages, 5 figures, 21 tables

Showing 1–9 of 9 results for author: Paige, J