Joint space-time modelling for upper daily maximum and minimum temperature record-breaking

Authors: Jorge Castillo-Mateo, Zeus Gracia-Tabuenca, Jesús Asín, Ana C. Cebrián, Alan E. Gelfand

Abstract: Record-breaking temperature events are now frequently in the news, proffered as evidence of climate change, and often bring significant economic and human impacts. Our previous work undertook the first substantial spatial modelling investigation of temperature record-breaking across years for any given day within the year, employing a dataset consisting of over sixty years of daily maximum temperatures across peninsular Spain. That dataset also supplies daily minimum temperatures (which, in fact, are now available through 2023). Here, the dataset is converted into a daily pair of binary events, indicators, for that day, of whether a yearly record was broken for the daily maximum temperature and/or for the daily minimum temperature. Joint modelling addresses several inference issues: (i) defining/modelling record-breaking with bivariate time series of yearly indicators, (ii) strength of relationship between record-breaking events, (iii) prediction of joint, conditional and marginal record-breaking, (iv) persistence in record-breaking across days, (v) spatial interpolation across peninsular Spain. We substantially expand our previous work to enable investigation of these issues. We observe strong correlation between both processes but a growing trend of climate change that is well differentiated between them both spatially and temporally as well as different strengths of persistence and spatial dependence.

Submitted 30 May, 2025; originally announced May 2025.

Comments: 31 pages (+25 pages supplement), 13 figures (+14 figures supplement), 3 tables (+4 tables supplement)
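
The conversion of a yearly temperature series into record-breaking indicators, as described in the abstract above, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `upper_record_indicators`, the example temperatures, and the choice to treat ties as non-records are all assumptions made for illustration.

```python
from typing import List, Sequence


def upper_record_indicators(values: Sequence[float]) -> List[int]:
    """Return 1 where a value strictly exceeds every earlier value, else 0.

    The first observation is trivially a record. Ties are treated as
    non-records here (an assumption for this sketch).
    """
    indicators = []
    running_max = float("-inf")
    for value in values:
        is_record = value > running_max
        indicators.append(1 if is_record else 0)
        if is_record:
            running_max = value
    return indicators


# Hypothetical yearly series for one calendar day at one site (degrees C).
tmax_by_year = [30.1, 29.8, 31.0, 30.5, 31.0, 32.2]
tmin_by_year = [15.0, 16.2, 15.9, 16.2, 17.1, 16.8]

tmax_records = upper_record_indicators(tmax_by_year)
tmin_records = upper_record_indicators(tmin_by_year)

# Bivariate indicator series: one (max-record, min-record) pair per year.
joint_records = list(zip(tmax_records, tmin_records))
print(joint_records)
```

Each pair in `joint_records` corresponds to one year for a fixed day and site, which is the kind of bivariate binary series the abstract refers to.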

arXiv:2503.16151 [pdf, other]

On prior smoothing with discrete spatial data in the context of disease mapping

Authors: Garazi Retegui, Alan E. Gelfand, Jaione Etxeberria, María Dolores Ugarte

Abstract: Disease mapping attempts to explain observed health event counts across areal units, typically using Markov random field models. These models rely on spatial priors to account for variation in raw relative risk or rate estimates. Spatial priors introduce some degree of smoothing, wherein, for any particular unit, empirical risk or incidence estimates are either adjusted towards a suitable mean or incorporate neighbor-based smoothing. While model explanation may be the primary focus, the literature lacks a comparison of the amount of smoothing introduced by different spatial priors. Additionally, there has been no investigation into how varying the parameters of these priors influences the resulting smoothing. This study examines seven commonly used spatial priors through both simulations and real data analyses. Using areal maps of peninsular Spain and England, we analyze smoothing effects with two datasets with associated populations at risk. We propose empirical metrics to quantify the smoothing achieved by each model and theoretical metrics to calibrate the expected extent of smoothing as a function of model parameters. We employ areal maps in order to quantitatively characterize the extent of smoothing within and across the models as well as to link the theoretical metrics to the empirical metrics.

Submitted 20 March, 2025; originally announced March 2025.

arXiv:2411.06001 [pdf, other]

Joint Spatiotemporal Modeling of Zooplankton and Whale Abundance in a Dynamic Marine Environment

Authors: Bokgyeong Kang, Erin M. Schliep, Alan E. Gelfand, Christopher W. Clark, Christine A. Hudak, Charles A. Mayo, Ryan Schosberg, Tina M. Yack, Robert S. Schick

Abstract: North Atlantic right whales are an endangered species; their entire population numbers approximately 372 individuals, and they are subject to major anthropogenic threats. They feed on zooplankton species whose distribution shifts in a dynamic and warming oceanic environment. Because right whales in turn follow their shifting food resource, it is necessary to jointly study the distribution of whales and their prey. The innovative joint species distribution modeling (JSDM) contribution here is different from anything in the large JSDM literature, reflecting the processes and data we have to work with. Specifically, our JSDM supplies a geostatistical model for expected amount of zooplankton collected at a site. We require a point pattern model for the intensity of right whale abundance. The two process models are joined through a latent conditional-marginal specification. Further, each species has two data sources to inform their respective distributions and these sources require novel data fusion. What emerges is a complex multi-level model. Through simulation we demonstrate the ability of our joint specification to identify model unknowns and learn better about the species distributions than modeling them individually. We then apply our modeling to real data from Cape Cod Bay, Massachusetts in the U.S.

Submitted 8 November, 2024; originally announced November 2024.

arXiv:2408.09557 [pdf, other]

Markov modeling for a satellite tag data record of whale diving behavior

Authors: Joshua Hewitt, Nicola J. Quick, Alan E. Gelfand, Robert S. Schick

Abstract: Cuvier's beaked whales (Ziphius cavirostris) are the deepest diving marine mammal, consistently diving to depths exceeding 1,000m for durations longer than an hour, making them difficult animals to study. They are important to study because they are sensitive to disturbances from naval sonar. Satellite-linked telemetry devices provide up to 14-day long records of dive behavior. However, the time series of depths is discretized to coarse bins due to bandwidth limitations. We analyze telemetry data from beaked whales that were exposed to moderate levels of sonar within controlled exposure experiments (CEEs) to study behavioral responses to sound exposure. We model the data as a hidden Markov model (HMM) over the time series of discrete depth bins, introducing partially observed movement types and recent diving activity covariates to model marginal non-stationarity. Movement types provide more flexible modeling for CEEs than partially observed dive stages, which are more commonly used in dive behavior HMMs. We estimate the proposed model within a hierarchical Bayesian framework, using HMM methods to compute marginalized likelihoods and posterior predictive distributions. We assess behavioral response by comparing observed post-exposure behavior to usual unexposed behavior via the posterior predictive distribution. The model quantifies patterns in baseline diving behavior and finds evidence that beaked whales deviate in response to sound. We find evidence that (i) beaked whales initially shorten the time they spend between deep dives, which may have physiological effects and (ii) subsequently avoid deep dives, which can result in lost foraging opportunities.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 23 pages, 7 figures, 1 table

arXiv:2404.12583 [pdf, other]

Analyzing whale calling through Hawkes process modeling

Authors: Bokgyeong Kang, Erin M. Schliep, Alan E. Gelfand, Tina M. Yack, Christopher W. Clark, Robert S. Schick

Abstract: Sound is assumed to be the primary modality of communication among marine mammal species. Analyzing acoustic recordings helps to understand the function of the acoustic signals as well as the possible impact of anthropogenic noise on acoustic behavior. Motivated by a dataset from a network of hydrophones in Cape Cod Bay, Massachusetts, utilizing automatically detected calls in recordings, we study the communication process of the endangered North Atlantic right whale. For right whales an "up-call" is known as a contact call, and ensuing counter-calling between individuals is presumed to facilitate group cohesion. We present novel spatiotemporal excitement modeling consisting of a background process and a counter-call process. The background process intensity incorporates the influences of diel patterns and ambient noise on occurrence. The counter-call intensity captures potential excitement, that calling elicits calling behavior. Call incidence is found to be clustered in space and time; a call seems to excite more calls nearer to it in time and space. We find evidence that whales make more calls during twilight hours, respond to other whales nearby, and are likely to remain quiet in the presence of increased ambient noise.

Submitted 18 April, 2024; originally announced April 2024.
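
The split into a background intensity plus an excitation term that the abstract above describes can be sketched with a textbook temporal Hawkes process. This is only an illustration under strong assumptions: it uses a constant background rate `mu` and an exponential kernel with hypothetical parameters `alpha` and `beta`, whereas the paper's model is spatiotemporal and lets the background depend on diel patterns and ambient noise.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    history: Sequence[float],
    mu: float,
    alpha: float,
    beta: float,
) -> float:
    """Conditional intensity of a temporal Hawkes process at time t.

    lambda(t) = mu + alpha * sum over past events t_i < t of exp(-beta * (t - t_i))

    mu    : constant background rate (assumed constant in this sketch)
    alpha : jump in intensity caused by each past event
    beta  : exponential decay rate of that excitation
    """
    excitation = sum(math.exp(-beta * (t - t_i)) for t_i in history if t_i < t)
    return mu + alpha * excitation


# Hypothetical call times (in minutes) and parameter values.
call_times = [1.0, 1.5, 4.0]
print(hawkes_intensity(2.0, call_times, mu=0.5, alpha=0.8, beta=1.0))
```

With no earlier calls the intensity equals the background rate; each recent call raises it temporarily, which is the "calling elicits calling" behavior in its simplest form.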

arXiv:2403.00080 [pdf, other]

Spatio-temporal modeling for record-breaking temperature events in Spain

Authors: Jorge Castillo-Mateo, Alan E. Gelfand, Zeus Gracia-Tabuenca, Jesús Asín, Ana C. Cebrián

Abstract: Record-breaking temperature events are now very frequently in the news, viewed as evidence of climate change. With this as motivation, we undertake the first substantial spatial modeling investigation of temperature record-breaking across years for any given day within the year. We work with a dataset consisting of over sixty years (1960-2021) of daily maximum temperatures across peninsular Spain. Formal statistical analysis of record-breaking events is an area that has received attention primarily within the probability community, dominated by results for the stationary record-breaking setting with some additional work addressing trends. Such effort is inadequate for analyzing actual record-breaking data. Effective analysis requires rich modeling of the indicator events which define record-breaking sequences. Resulting from novel and detailed exploratory data analysis, we propose hierarchical conditional models for the indicator events. After suitable model selection, we discover explicit trend behavior, necessary autoregression, significance of distance to the coast, useful interactions, helpful spatial random effects, and very strong daily random effects. Illustratively, the model estimates that global warming trends have increased the number of records expected in the past decade almost two-fold, 1.93 (1.89,1.98), but also estimates highly differentiated climate warming rates in space and by season.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 25 pages (+23 pages supplement), 7 figures (+14 figures supplement), 2 tables (+8 tables supplement)

arXiv:2310.08397 [pdf, other]

Assessing Marine Mammal Abundance: A Novel Data Fusion

Authors: Erin M. Schliep, Alan E. Gelfand, Christopher W. Clark, Charles M. Mayo, Brigid McKenna, Susan E. Parks, Tina M. Yack, Robert S. Schick

Abstract: Marine mammals are increasingly vulnerable to human disturbance and climate change. Their diving behavior leads to limited visual access during data collection, making studying the abundance and distribution of marine mammals challenging. In theory, using data from more than one observation modality should lead to better informed predictions of abundance and distribution. With focus on North Atlantic right whales, we consider the fusion of two data sources to inform about their abundance and distribution. The first source is aerial distance sampling which provides the spatial locations of whales detected in the region. The second source is passive acoustic monitoring (PAM), returning calls received at hydrophones placed on the ocean floor. Due to limited time on the surface and detection limitations arising from sampling effort, aerial distance sampling only provides a partial realization of locations. With PAM, we never observe numbers or locations of individuals. To address these challenges, we develop a novel thinned point pattern data fusion. Our approach leads to improved inference regarding abundance and distribution of North Atlantic right whales throughout Cape Cod Bay, Massachusetts in the US. We demonstrate performance gains of our approach compared to that from a single source through both simulation and real data.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2305.19080 [pdf, other]

doi 10.1007/s11749-023-00895-6

Bayesian joint quantile autoregression

Authors: Jorge Castillo-Mateo, Alan E. Gelfand, Jesús Asín, Ana C. Cebrián, Jesús Abaurrea

Abstract: Quantile regression continues to increase in usage, providing a useful alternative to customary mean regression. Primary implementation takes the form of so-called multiple quantile regression, creating a separate regression for each quantile of interest. However, recently, advances have been made in joint quantile regression, supplying a quantile function which avoids crossing of the regression across quantiles. Here, we turn to quantile autoregression (QAR), offering a fully Bayesian version. We extend the initial quantile regression work of Koenker and Xiao (2006) in the spirit of Tokdar and Kadane (2012). We offer a directly interpretable parametric model specification for QAR. Further, we offer a p-th order QAR(p) version, a multivariate QAR(1) version, and a spatial QAR(1) version. We illustrate with simulation as well as a temperature dataset collected in Aragón, Spain.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 21 pages (+18 pages supplement), 8 figures (+15 figures supplement), 1 table (+6 tables supplement)

MSC Class: 62F15; 62G08; 62H05; 62M10; 62M30

Journal ref: TEST 33(1), 335-357 (2024)

arXiv:2211.10784 [pdf, other]

doi 10.1002/joc.8305

Model-based tools for assessing space and time change in daily maximum temperature: an application to the Ebro basin in Spain

Authors: Ana C. Cebrián, Jesús Asín, Jorge Castillo-Mateo, Alan E. Gelfand, Jesús Abaurrea

Abstract: There is continuing interest in the investigation of change in temperature over space and time. We offer a set of tools to illuminate such change temporally, at desired temporal resolution, and spatially, according to region of interest, using data generated from suitable space-time models. These tools include predictive spatial probability surfaces and spatial extents for an event. Working with exceedance events around the center of the temperature distribution, the probability surfaces capture the spatial variation in the risk of an exceedance event, while the spatial extents capture the expected proportion of incidence of a given exceedance event for a region of interest. Importantly, the proposed tools can be used with the output from any suitable model fitted to any set of spatially referenced time series data. As an illustration, we employ a dataset from 1956 to 2015 collected at 18 stations over Aragón in Spain, and a collection of daily maximum temperature series obtained from posterior predictive simulation of a Bayesian hierarchical daily temperature model. The results for the summer period show that although there is an increasing risk in all the events used to quantify the effects of climate change, it is not spatially homogeneous, with the largest increase arising in the center of Ebro valley and Eastern Pyrenees area. The risk of an increase of the average temperature between 1966-1975 and 2006-2015 higher than $1^\circ$C is higher than 0.5 all over the region, and close to 1 in the previous areas. The extent of daily temperature higher than the reference mean has increased 3.5% per decade. The mean of the extent indicates that 95% of the area under study has suffered a positive increment of the average temperature, and almost 70% higher than $1^{\circ}$C.

Submitted 19 November, 2022; originally announced November 2022.

Comments: 23 pages main manuscript and 7 pages supplement

Journal ref: International Journal of Climatology 43(16), 8036-8051 (2023)

arXiv:2210.00409 [pdf, other]

Joint Multivariate and Functional Modeling for Plant Traits and Reflectances

Authors: Philip A. White, Michael F. Christensen, Henry Frye, Alan E. Gelfand, John A. Silander Jr

Abstract: The investigation of leaf-level traits in response to varying environmental conditions has immense importance for understanding plant ecology. Remote sensing technology enables measurement of the reflectance of plants to make inferences about underlying traits along environmental gradients. While much focus has been placed on understanding how reflectance and traits are related at the leaf-level, the challenge of modelling the dependence of this relationship along environmental gradients has limited this line of inquiry. Here, we take up the problem of jointly modeling traits and reflectance given environment. Our objective is to assess not only response to environmental regressors but also dependence between trait levels and the reflectance spectrum in the context of this regression. This leads to joint modeling of a response vector of traits with reflectance arising as a functional response over the wavelength spectrum. To conduct this investigation, we employ a dataset from a global biodiversity hotspot, the Greater Cape Floristic Region in South Africa.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2202.09168 [pdf, other]

Preferential Sampling for Bivariate Spatial Data

Authors: Shinichiro Shirota, Alan E. Gelfand

Abstract: Preferential sampling provides a formal modeling specification to capture the effect of bias in a set of sampling locations on inference when a geostatistical model is used to explain observed responses at the sampled locations. In particular, it enables modification of spatial prediction adjusted for the bias. Its original presentation in the literature addressed assessment of the presence of such sampling bias while follow on work focused on regression specification to improve spatial interpolation under such bias. All of the work in the literature to date considers the case of a univariate response variable at each location, either continuous or modeled through a latent continuous variable. The contribution here is to extend the notion of preferential sampling to the case of bivariate response at each location. This exposes sampling scenarios where both responses are observed at a given location as well as scenarios where, for some locations, only one of the responses is recorded. That is, there may be different sampling bias for one response than for the other. It leads to assessing the impact of such bias on co-kriging. It also exposes the possibility that preferential sampling can bias inference regarding dependence between responses at a location. We develop the idea of bivariate preferential sampling through various model specifications and illustrate the effect of these specifications on prediction and dependence behavior. We do this both through simulation examples as well as with a forestry dataset that provides mean diameter at breast height (MDBH) and trees per hectare (TPH) as the point-referenced bivariate responses.

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2201.01687 [pdf, other]

doi 10.1007/s13253-022-00493-3

Spatial modeling of day-within-year temperature time series: an examination of daily maximum temperatures in Aragón, Spain

Authors: Jorge Castillo-Mateo, Miguel Lafuente, Jesús Asín, Ana C. Cebrián, Alan E. Gelfand, Jesús Abaurrea

Abstract: Acknowledging a considerable literature on modeling daily temperature data, we propose a multi-level spatio-temporal model which introduces several innovations in order to explain the daily maximum temperature in the summer period over 60 years in a region containing Aragón, Spain. The model operates over continuous space but adopts two discrete temporal scales, year and day within year. It captures temporal dependence through autoregression on days within year and also on years. Spatial dependence is captured through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. The model is expressed in a form which separates fixed effects from random effects and also separates space, years, and days for each type of effect. Motivated by exploratory data analysis, fixed effects to capture the influence of elevation, seasonality and a linear trend are employed. Pure errors are introduced for years, for locations within years, and for locations at days within years. The performance of the model is checked using a leave-one-out cross-validation. Applications of the model are presented including prediction of the daily temperature series at unobserved or partially observed sites and inference to investigate climate change comparison.

Submitted 14 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 21 pages (+20 pages supplement), 7 figures (+6 figures supplement), 2 tables (+4 tables supplement); minor revision

Journal ref: Journal of Agricultural, Biological and Environmental Statistics 27(3), 487-505 (2022)

arXiv:2112.07249 [pdf, other]

Zero-inflated Beta distribution regression modeling

Authors: Becky Tang, Henry A Frye, Alan E. Gelfand, John A Silander Jr

Abstract: A frequent challenge encountered with ecological data is how to interpret, analyze, or model data having a high proportion of zeros. Much attention has been given to zero-inflated count data, whereas models for non-negative continuous data with an abundance of 0s are lacking. We consider zero-inflated data on the unit interval and provide modeling to capture two types of 0s in the context of the Beta regression model. We model 0s due to missing by chance through left censoring of a latent regression, and 0s due to unsuitability using an independent Bernoulli specification to create a point mass at 0. We first develop the model as a spatial regression in environmental features and then extend to introduce spatial random effects. We specify models hierarchically, employing latent variables, fit them within a Bayesian framework, and present new model comparison tools. Our motivating dataset consists of percent cover abundance of two plant species at a collection of sites in the Cape Floristic Region of South Africa. We find that environmental features enable learning about the incidence of both types of 0s as well as the positive percent covers. We also show that the spatial random effects model improves predictive performance. The proposed modeling enables ecologists, using environmental regressors, to extract a better understanding of the presence/absence of species in terms of absence due to unsuitability vs. missingness by chance, as well as abundance when present.

Submitted 20 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2102.03249 [pdf, other]

Spatial Functional Data Modeling of Plant Reflectances

Authors: Philip A. White, Henry Frye, Michael F. Christensen, Alan E. Gelfand, John A. Silander Jr

Abstract: Plant reflectance spectra - the profile of light reflected by leaves across different wavelengths - supply the spectral signature for a species at a spatial location to enable estimation of functional and taxonomic diversity for plants. We consider leaf spectra as "responses" to be explained spatially. These spectra/reflectances are functions over a wavelength band that respond to the environment. Our motivating data are gathered for several families from the Cape Floristic Region (CFR) in South Africa and lead us to develop rich novel spatial models that can explain spectra for genera within families. Wavelength responses for an individual leaf are viewed as a function of wavelength, leading to functional data modeling. Local environmental features become covariates. We introduce wavelength - covariate interaction since the response to environmental regressors may vary with wavelength, so may variance. Formal spatial modeling enables prediction of reflectances for genera at unobserved locations with known environmental features. We incorporate spatial dependence, wavelength dependence, and space-wavelength interaction (in the spirit of space-time interaction). We implement out-of-sample validation to select a best model, discovering that the model features listed above are all informative for the functional data analysis. We then supply interpretation of the results under the selected model.

Submitted 25 March, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: 20 pages main manuscript, 20 pages supplement

arXiv:2003.10490 [pdf, other]

Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation

Authors: Ninna Vihrs, Jesper Møller, Alan E. Gelfand

Abstract: In this paper, we propose a doubly stochastic spatial point process model with both aggregation and repulsion. This model combines the ideas behind Strauss processes and log Gaussian Cox processes. The likelihood for this model is not expressible in closed form but it is easy to simulate realisations under the model. We therefore explain how to use approximate Bayesian computation (ABC) to carry out statistical inference for this model. We suggest a method for model validation based on posterior predictions and global envelopes. We illustrate the ABC procedure and model validation approach using both simulated point patterns and a real data example.

Submitted 3 December, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: 37 pages, 10 figures; one line was added

Journal ref: Scandinavian Journal of Statistics, 49 (2022), 185-210

arXiv:2003.01168 [pdf, other]

Long-term Spatial Modeling for Characteristics of Extreme Heat Events

Authors: Erin M. Schliep, Alan E. Gelfand, Jesus Abaurrea, Jesus Asin, Maria A. Beamonte, Ana C. Cebrian

Abstract: There is increasing evidence that global warming manifests itself in more frequent warm days and that heat waves will become more frequent. Presently, a formal definition of a heat wave is not agreed upon in the literature. To avoid this debate, we consider extreme heat events, which, at a given location, are well-defined as a run of consecutive days above an associated local threshold. Characteristics of EHEs are of primary interest, such as incidence and duration, as well as the magnitude of the average exceedance and maximum exceedance above the threshold during the EHE. Using approximately 60-year time series of daily maximum temperature data collected at 18 locations in a given region, we propose a spatio-temporal model to study the characteristics of EHEs over time. The model enables prediction of the behavior of EHE characteristics at unobserved locations within the region. Specifically, our approach employs a two-state space-time model for EHEs with local thresholds where one state defines above threshold daily maximum temperatures and the other below threshold temperatures. We show that our model is able to recover the EHE characteristics of interest and outperforms a corresponding autoregressive model that ignores thresholds based on out-of-sample prediction.

Submitted 29 June, 2020; v1 submitted 2 March, 2020; originally announced March 2020.
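
The extreme heat event definition in the abstract above (a run of consecutive days above a local threshold, summarized by duration, average exceedance and maximum exceedance) can be illustrated with a minimal sketch. This is only a descriptive run extraction, not the authors' two-state space-time model; the function names, the strict `>` comparison, and the toy temperature series are assumptions made for illustration.

```python
from typing import Dict, List


def _summarize(start: int, run: List[float]) -> Dict[str, float]:
    # Summaries named after the characteristics listed in the abstract.
    return {
        "start_day": start,
        "duration": len(run),
        "mean_exceedance": sum(run) / len(run),
        "max_exceedance": max(run),
    }


def extreme_heat_events(tmax: List[float], threshold: float) -> List[Dict[str, float]]:
    """Return one record per run of consecutive days with tmax > threshold."""
    events = []
    run: List[float] = []  # exceedances (tmax - threshold) of the run in progress
    start = 0
    for day, value in enumerate(tmax):
        if value > threshold:
            if not run:
                start = day  # first day of a new run
            run.append(value - threshold)
        elif run:
            events.append(_summarize(start, run))  # run just ended
            run = []
    if run:  # series ended while still above the threshold
        events.append(_summarize(start, run))
    return events


# Hypothetical daily maxima (degrees Celsius) and a hypothetical local threshold.
series = [30.0, 36.0, 38.0, 31.0, 37.0, 29.0]
events = extreme_heat_events(series, threshold=35.0)
```

Incidence over a period is then simply `len(events)`, and each record carries the duration and exceedance summaries for one event.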

arXiv:1910.06897 [pdf, other]

doi 10.1007/s11009-020-09797-8

Generalized Evolutionary Point Processes: Model Specifications and Model Comparison

Authors: Philip A. White, Alan E. Gelfand

Abstract: Generalized evolutionary point processes offer a class of point process models that allows for either excitation or inhibition based upon the history of the process. In this regard, we propose modeling which comprises generalization of the nonlinear Hawkes process. Working within a Bayesian framework, model fitting is implemented through Markov chain Monte Carlo. This entails discussion of computation of the likelihood for such point patterns. Furthermore, for this class of models, we discuss strategies for model comparison. Using simulation, we illustrate how well we can distinguish these models from point pattern specifications with conditionally independent event times, e.g., Poisson processes. Specifically, we demonstrate that these models can correctly identify true relationships (i.e., excitation or inhibition/control). Then, we consider a novel extension of the log Gaussian Cox process that incorporates evolutionary behavior and illustrate that our model comparison approach prefers the evolutionary log Gaussian Cox process compared to simpler models. We also examine a real dataset consisting of violent crime events from the 11th police district in Chicago from the year 2018. This data exhibits strong daily seasonality and changes across the year. After we account for these data attributes, we find significant but mild self-excitation, implying that event occurrence increases the intensity of future events.

Submitted 15 October, 2019; originally announced October 2019.

Journal ref: Methodology and Computing in Applied Probability (2020+)

arXiv:1908.09410 [pdf, other]

Clarifying species dependence under joint species distribution modeling

Authors: Alan E. Gelfand, Shinichiro Shirota

Abstract: Joint species distribution modeling is attracting increasing attention these days, acknowledging the fact that individual level modeling fails to take into account expected dependence/interaction between species. These models attempt to capture species dependence through an associated correlation matrix arising from a set of latent multivariate normal variables. However, these associations offer little insight into dependence behavior between species at sites. We focus on presence/absence data using joint species modeling which incorporates spatial dependence between sites. For pairs of species, we emphasize the induced odds ratios (along with the joint probabilities of occurrence); they provide much clearer understanding of joint presence/absence behavior. In fact, we propose a spatial odds ratio surface over the region of interest to capture how dependence varies over the region. We illustrate with a dataset from the Cape Floristic Region of South Africa consisting of more than 600 species at more than 600 sites. We present the spatial distribution of odds ratios for pairs of species that are positively correlated and pairs that are negatively correlated under the joint species distribution model. The multivariate normal covariance matrix associated with a collection of species is only a device for creating dependence among species but it lacks interpretation. By considering odds ratios, the quantitative ecologist will be able to better appreciate the practical dependence between species that is implicit in these joint species distribution modeling specifications.

Submitted 25 August, 2019; originally announced August 2019.
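
The odds ratio emphasized in the abstract above can be computed directly from the four joint presence/absence probabilities for a pair of species at a site. The sketch below uses the standard 2x2 odds ratio; the function name, argument names, and example probabilities are hypothetical, and it does not reproduce how the paper induces these probabilities from the latent multivariate normal model.

```python
def odds_ratio(p11: float, p10: float, p01: float, p00: float) -> float:
    """Odds ratio for two binary outcomes from their joint probabilities.

    p11: both species present; p10: only the first present;
    p01: only the second present; p00: both absent.
    """
    total = p11 + p10 + p01 + p00
    if abs(total - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")
    if p10 == 0.0 or p01 == 0.0:
        raise ValueError("odds ratio is undefined when a discordant cell is zero")
    return (p11 * p00) / (p10 * p01)


# Hypothetical values: independence gives an odds ratio of 1,
# while co-occurrence more often than chance gives a value above 1.
independent = odds_ratio(0.25, 0.25, 0.25, 0.25)
positive = odds_ratio(0.4, 0.1, 0.1, 0.4)
```

Evaluating such a function at each location, with site-specific joint probabilities, is one way to picture the spatial odds ratio surface the abstract describes.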

arXiv:1904.11518 [pdf, other]

doi 10.1007/s11749-020-00733-z

Multivariate Functional Data Modeling with Time-varying Clustering

Authors: Philip A. White, Alan E. Gelfand

Abstract: We consider the situation where multivariate functional data has been collected over time at each of a set of sites. Our illustrative setting is bivariate, monitoring ozone and PM$_{10}$ levels as a function of time over the course of a year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City which record hourly ozone and PM$_{10}$ levels. We use the data for the year 2017. Hence, we have 48 functions to work with. Our objective is to implement model-based clustering of the functions across the sites. Using our example, such clustering can be considered for ozone and PM$_{10}$ individually or jointly. It may occur differentially for the two pollutants. More importantly for us, we allow that such clustering can vary with time. We model the multivariate functions across sites using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a stochastic process specification for the distribution of the collection of multivariate functions over the say $n$ sites. Furthermore, to cluster the functions, either individually by component or jointly with all components, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise in continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ a partitioning of the time scale to capture time-varying clustering.

Submitted 1 May, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

Journal ref: TEST (2020+)

arXiv:1809.01322 [pdf, other]

Preferential sampling for presence/absence data and for fusion of presence/absence data with presence-only data

Authors: Alan E. Gelfand, Shinichiro Shirota

Abstract: Presence/absence data and presence-only data are the two customary sources for learning about species distributions over a region. We illuminate the fundamental modeling differences between the two types of data. Most simply, locations are considered as fixed under presence/absence data; locations are random under presence-only data. The definition of "probability of presence" is incompatible between the two. So, we take issue with modeling strategies in the literature which ignore this incompatibility, which assume that presence/absence modeling can be induced from presence-only specifications and therefore, that fusion of presence-only and presence/absence data sources is routine. We argue that presence/absence data should be modeled at point level. That is, we need to specify a surface which provides the probability of presence at any location in the region. A realization from this surface is a binary map yielding the results of Bernoulli trials across all locations. Presence-only data should be modeled as a point pattern driven by specification of an intensity function. We further argue that, with just presence/absence data, preferential sampling, using a shared process perspective, can improve our estimated presence/absence surface and prediction of presence. We also argue that preferential sampling can enable a probabilistically coherent fusion of the two data types. We illustrate with two real datasets, one presence/absence, one presence-only for invasive species presence in New England in the United States. We demonstrate that potential bias in sampling locations can affect inference with regard to presence/absence and show that inference can be improved with preferential sampling ideas. We also provide a probabilistically coherent fusion of the two datasets to again improve inference with regard to presence/absence.

Submitted 2 April, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: Ecological Monographs, in press

arXiv:1807.03935 [pdf, other]

doi 10.1111/rssa.12444

Pollution State Modeling for Mexico City

Authors: Philip A. White, Alan E. Gelfand, Eliane R. Rodrigues, Guadalupe Tzintzun

Abstract: Ground-level ozone and particulate matter pollutants are associated with a variety of health issues and increased mortality. For this reason, Mexican environmental agencies regulate pollutant levels. In addition, Mexico City defines pollution emergencies using thresholds that rely on regional maxima for ozone and particulate matter with diameter less than 10 micrometers ($\text{PM}_{10}$). To predict local pollution emergencies and to assess compliance to Mexican ambient air quality standards, we analyze hourly ozone and $\text{PM}_{10}$ measurements from 24 stations across Mexico City from 2017 using a bivariate spatiotemporal model. Using this model, we predict future pollutant levels using current weather conditions and recent pollutant concentrations. Using hourly pollutant projections, we predict regional maxima needed to estimate the probability of future pollution emergencies. We discuss how predicted compliance to legislated pollution limits varies across regions within Mexico City in 2017. We find that predicted probability of pollution emergencies is limited to a few time periods. In contrast, we show that predicted exceedance of Mexican ambient air quality standards is a common, nearly daily occurrence.

Submitted 10 July, 2018; originally announced July 2018.

Journal ref: J. R. Stat. Soc. A., 182(3), 1039-1060 (2019)

arXiv:1711.05646 [pdf, other]

Spatial Joint Species Distribution Modeling using Dirichlet Processes

Authors: Shinichiro Shirota, Alan E. Gelfand, Sudipto Banerjee

Abstract: Species distribution models usually attempt to explain presence-absence or abundance of a species at a site in terms of the environmental features (so-called abiotic features) present at the site. Historically, such models have considered species individually. However, it is well-established that species interact to influence presence-absence and abundance (envisioned as biotic factors). As a result, there has been substantial recent interest in joint species distribution models with various types of response, e.g., presence-absence, continuous and ordinal data. Such models incorporate dependence between species response as a surrogate for interaction. The challenge we focus on here is how to address such modeling in the context of a large number of species (e.g., order $10^2$) across sites numbering in the order of $10^2$ or $10^3$ when, in practice, only a few species are found at any observed site. Again, there is some recent literature to address this; we adopt a dimension reduction approach. The novel wrinkle we add here is spatial dependence. That is, we have a collection of sites over a relatively small spatial region so it is anticipated that species distribution at a given site would be similar to that at a nearby site. Specifically, we handle dimension reduction through Dirichlet processes joined with spatial dependence through Gaussian processes. We use both simulated data and a plant communities dataset for the Cape Floristic Region (CFR) of South Africa to demonstrate our approach. The latter consists of presence-absence measurements for 639 tree species on 662 locations. Through both data examples we are able to demonstrate improved predictive performance using the foregoing specification.

Submitted 21 September, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

arXiv:1704.05032 [pdf, other]

doi 10.1007/s00477-015-1163-9

The wrapped skew Gaussian process for analyzing spatio-temporal data

Authors: Gianluca Mastrantonio, Giovanna Jona Lasinio, Alan E. Gelfand

Abstract: We consider modeling of angular or directional data viewed as a linear variable wrapped onto a unit circle. In particular, we focus on the spatio-temporal context, motivated by a collection of wave directions obtained as computer model output developed dynamically over a collection of spatial locations. We propose a novel wrapped skew Gaussian process which enriches the class of wrapped Gaussian process. The wrapped skew Gaussian process enables more flexible marginal distributions than the symmetric ones arising under the wrapped Gaussian process and it allows straightforward interpretation of parameters. We clarify that replication through time enables criticism of the wrapped process in favor of the wrapped skew process. We formulate a hierarchical model incorporating this process and show how to introduce appropriate latent variables in order to enable efficient fitting to dynamic spatial directional data. We also show how to implement kriging and forecasting under this model. We provide a simulation example as a proof of concept as well as a real data example. Both examples reveal consequential improvement in predictive performance for the wrapped skew Gaussian specification compared with the earlier wrapped Gaussian version.

Submitted 17 April, 2017; originally announced April 2017.

arXiv:1704.05029 [pdf, other]

doi 10.1007/s11749-015-0458-y

Spatio-temporal circular models with non-separable covariance structure

Authors: Gianluca Mastrantonio, Giovanna Jona Lasinio, Alan E. Gelfand

Abstract: Circular data arise in many areas of application. Recently, there has been interest in looking at circular data collected separately over time and over space. Here, we extend some of this work to the spatio-temporal setting, introducing space-time dependence. We accommodate covariates, implement full kriging and forecasting, and also allow for a nugget which can be time dependent. We work within a Bayesian framework, introducing suitable latent variables to facilitate Markov chain Monte Carlo (MCMC) model fitting. The Bayesian framework enables us to implement full inference, obtaining predictive distributions for kriging and forecasting. We offer comparison between the less flexible but more interpretable wrapped Gaussian process and the more flexible but less interpretable projected Gaussian process. We do this illustratively using both simulated data and data from computer model output for wave directions in the Adriatic Sea off the coast of Italy.

Submitted 17 April, 2017; originally announced April 2017.

arXiv:1701.05863 [pdf, other]

Analyzing Car Thefts and Recoveries with Connections to Modeling Origin-Destination Point Patterns

Authors: Shinichiro Shirota, Alan E Gelfand, Jorge Mateu

Abstract: For a given region, we have a dataset composed of car theft locations along with a linked dataset of recovery locations which, due to partial recovery, is a relatively small subset of the set of theft locations. For an investigator seeking to understand the behavior of car thefts and recoveries in the region, several questions are addressed. Viewing the set of theft locations as a point pattern, can we propose useful models to explain the pattern? What types of predictive models can be built to learn about recovery location given theft location? Can the dependence between theft locations and recovery locations be formalized? Can the flow between theft sites and recovery sites be captured? Origin-destination modeling offers a natural framework for such problems. However, here the data is not for areal units but rather is a pair of point patterns, with the recovery point pattern only partially observed. We offer modeling approaches for investigating the questions above and apply the approaches to two datasets. One is small from the state of Neza in Mexico with areal covariate information regarding population features and crime type. A second, much larger one, is from Belo Horizonte in Brazil but lacks covariates.

Submitted 3 April, 2020; v1 submitted 20 January, 2017; originally announced January 2017.

arXiv:1611.10359 [pdf, other]

Inference for log Gaussian Cox processes using an approximate marginal posterior

Authors: Shinichiro Shirota, Alan E. Gelfand

Abstract: The log Gaussian Cox process is a flexible class of point pattern models for capturing spatial and spatio-temporal dependence for point patterns. Model fitting requires approximation of stochastic integrals which is implemented through discretization of the domain of interest. With fine scale discretization, inference based on Markov chain Monte Carlo is computationally heavy because of the cost of repeated iteration or inversion or Cholesky decomposition (cubic order) of high dimensional covariance matrices associated with latent Gaussian variables. Furthermore, hyperparameters for latent Gaussian variables have strong dependence with sampled latent Gaussian variables. Altogether, standard Markov chain Monte Carlo strategies are inefficient and not well behaved. In this paper, we propose an efficient computational strategy for fitting and inferring with spatial log Gaussian Cox processes. The proposed algorithm is based on a pseudo-marginal Markov chain Monte Carlo approach. We estimate an approximate marginal posterior for parameters of log Gaussian Cox processes and propose comprehensive model inference strategy. We provide details for all of the above along with some simulation investigation for the univariate and multivariate settings. As an example, we present an analysis of a point pattern of locations of three tree species, exhibiting positive and negative interaction between different species.

Submitted 30 November, 2016; originally announced November 2016.

Comments: previous version is "Approximate Marginal Posterior for Log Gaussian Cox Processes". arXiv admin note: text overlap with arXiv:1606.07984

arXiv:1611.08719 [pdf, other]

Space and circular time log Gaussian Cox processes with application to crime event data

Authors: Shinichiro Shirota, Alan E. Gelfand

Abstract: We view the locations and times of a collection of crime events as a space-time point pattern. So, with either a nonhomogeneous Poisson process or with a more general Cox process, we need to specify a space-time intensity. For the latter, we need a \emph{random} intensity which we model as a realization of a spatio-temporal log Gaussian process. Importantly, we view time as circular not linear, necessitating valid separable and nonseparable covariance functions over a bounded spatial region crossed with circular time. In addition, crimes are classified by crime type. Furthermore, each crime event is recorded by day of the year which we convert to day of the week marks. The contribution here is to develop models to accommodate such data. Our specifications take the form of hierarchical models which we fit within a Bayesian framework. In this regard, we consider model comparison between the nonhomogeneous Poisson process and the log Gaussian Cox process. We also compare separable vs. nonseparable covariance specifications. Our motivating dataset is a collection of crime events for the city of San Francisco during the year 2012. We have location, hour, day of the year, and crime type for each event. We investigate models to enhance our understanding of the set of incidences.

Submitted 26 November, 2016; originally announced November 2016.

Comments: accepted "Annals of Applied Statistics"

arXiv:1607.07002 [pdf, other]

Disease Mapping with Generative Models

Authors: Feifei Wang, Jian Wang, Alan E. Gelfand, Fan Li

Abstract: Disease mapping focuses on learning about areal units presenting high relative risk. Disease mapping models for disease counts specify Poisson regressions in relative risks compared with the expected counts. These models typically incorporate spatial random effects to accomplish spatial smoothing. Fitting of these models customarily computes expected disease counts via internal standardization. This places the data on both sides of the model, i.e., the counts are on the left side but they are also used to obtain the expected counts on the right side. As a result, these internally standardized models are incoherent and not generative; probabilistically, they could not produce the observed data. Here, we argue for adopting the direct generative model for disease counts. We model disease incidence instead of relative risks, using a generalized logistic regression. We extract relative risks post model fitting. We also extend the generative model to dynamic settings. We compare the generative models with internally standardized models through simulated datasets and a well-examined lung cancer morbidity data in Ohio. Each model is a spatial smoother and they smooth the data similarly with regard to relative risks. However, the generative models tend to provide tighter credible intervals. Since the generative specification is no more difficult to fit, is coherent, and is at least as good inferentially, we suggest it should be the model of choice for spatial disease mapping.

Submitted 24 July, 2016; originally announced July 2016.

arXiv:1606.07984

Approximate Marginal Posterior for Log Gaussian Cox Processes

Authors: Shinichiro Shirota, Alan E. Gelfand

Abstract: The log Gaussian Cox process is a flexible class of Cox processes, whose intensity surface is stochastic, for incorporating complex spatial and time structure of point patterns. The straightforward inference based on Markov chain Monte Carlo is computationally heavy because the computational cost of inverse or Cholesky decomposition of high dimensional covariance matrices of Gaussian latent variables is cubic order of their dimension. Furthermore, since hyperparameters for Gaussian latent variables have high correlations with sampled Gaussian latent processes themselves, standard Markov chain Monte Carlo strategies are inefficient. In this paper, we propose an efficient and scalable computational strategy for spatial log Gaussian Cox processes. The proposed algorithm is based on pseudo-marginal Markov chain Monte Carlo approach. Based on this approach, we propose estimation of approximate marginal posterior for parameters and comprehensive model validation strategies. We provide details for all of the above along with some simulation investigation for univariate and multivariate settings and analysis of a point pattern of tree data exhibiting positive and negative interaction between different species.

Submitted 30 November, 2016; v1 submitted 25 June, 2016; originally announced June 2016.

Comments: The title of updated version is "Inference for log Gaussian Cox processes using an approximate marginal posterior" [arXiv:1611.10359]

arXiv:1604.07027 [pdf, other]

Approximate Bayesian Computation and Model Validation for Repulsive Spatial Point Processes

Authors: Shinichiro Shirota, Alan E. Gelfand

Abstract: In many applications involving spatial point patterns, we find evidence of inhibition or repulsion. The most commonly used class of models for such settings are the Gibbs point processes. A recent alternative, at least to the statistical community, is the determinantal point process. Here, we examine model fitting and inference for both of these classes of processes in a Bayesian framework. While usual MCMC model fitting can be available, the algorithms are complex and are not always well behaved. We propose using approximate Bayesian computation (ABC) for such fitting. This approach becomes attractive because, though likelihoods are very challenging to work with for these processes, generation of realizations given parameter values is relatively straightforward. As a result, the ABC fitting approach is well-suited for these models. In addition, such simulation makes them well-suited for posterior predictive inference as well as for model assessment. We provide details for all of the above along with some simulation investigation and an illustrative analysis of a point pattern of tree data exhibiting repulsion. R-code and datasets are included in the supplementary material.

Submitted 26 August, 2016; v1 submitted 24 April, 2016; originally announced April 2016.

arXiv:1503.08357 [pdf, ps, other]

Spatial Process Gradients and Their Use in Sensitivity Analysis for Environmental Processes

Authors: Maria A. Terres, Alan E. Gelfand

Abstract: This paper develops methodology for local sensitivity analysis based on directional derivatives associated with spatial processes. Formal gradient analysis for spatial processes was elaborated in previous papers, focusing on distribution theory for directional derivatives associated with a response variable assumed to follow a Gaussian process model. In the current work, these ideas are extended to additionally accommodate a continuous covariate whose directional derivatives are also of interest and to relate the behavior of the directional derivatives of the response surface to those of the covariate surface. It is of interest to assess whether, in some sense, the gradients of the response follow those of the explanatory variable. The joint Gaussian structure of all variables, including the directional derivatives, allows for explicit distribution theory and, hence, kriging across the spatial region using multivariate normal theory. Working within a Bayesian hierarchical modeling framework, posterior samples enable all gradient analysis to occur post model fitting. As a proof of concept, we show how our methodology can be applied to a standard geostatistical modeling setting using a simulation example. For a real data illustration, we work with point pattern data, deferring our gradient analysis to the intensity surface, adopting a log-Gaussian Cox process model. In particular, we relate elevation data to point patterns associated with several tree species in Duke Forest.

Submitted 28 March, 2015; originally announced March 2015.

arXiv:1406.7343 [pdf, other]

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

Authors: Abhirup Datta, Sudipto Banerjee, Andrew O. Finley, Alan E. Gelfand

Abstract: Spatial process models for analyzing geostatistical data entail computations that become prohibitive as the number of spatial locations become large. This manuscript develops a class of highly scalable Nearest Neighbor Gaussian Process (NNGP) models to provide fully model-based inference for large geostatistical datasets. We establish that the NNGP is a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices. We embed the NNGP as a sparsity-inducing prior within a rich hierarchical modeling framework and outline how computationally efficient Markov chain Monte Carlo (MCMC) algorithms can be executed without storing or decomposing large matrices. The floating point operations (flops) per iteration of this algorithm is linear in the number of spatial locations, thereby rendering substantial scalability. We illustrate the computational and inferential benefits of the NNGP over competing methods using simulation studies and also analyze forest biomass from a massive United States Forest Inventory dataset at a scale that precludes alternative dimension-reducing methods.

Submitted 1 January, 2016; v1 submitted 27 June, 2014; originally announced June 2014.

arXiv:1312.7260 [pdf, ps, other]

doi 10.1214/13-STS444

Scaling Integral Projection Models for Analyzing Size Demography

Authors: Alan E. Gelfand, Souparno Ghosh, James S. Clark

Abstract: Historically, matrix projection models (MPMs) have been employed to study population dynamics with regard to size, age or structure. To work with continuous traits, in the past decade, integral projection models (IPMs) have been proposed. Following the path for MPMs, currently, IPMs are handled first with a fitting stage, then with a projection stage. Model fitting has, so far, been done only with individual-level transition data. These data are used in the fitting stage to estimate the demographic functions (survival, growth, fecundity) that comprise the kernel of the IPM specification. The estimated kernel is then iterated from an initial trait distribution to obtain what is interpreted as steady state population behavior. Such projection results in inference that does not align with observed temporal distributions. This might be expected; a model for population level projection should be fitted with population level transitions.

Submitted 27 December, 2013; originally announced December 2013.

Comments: Published at http://dx.doi.org/10.1214/13-STS444 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS444

Journal ref: Statistical Science 2013, Vol. 28, No. 4, 641-658

arXiv:1310.8192 [pdf, other]

spBayes for large univariate and multivariate point-referenced spatio-temporal data models

Authors: Andrew O. Finley, Sudipto Banerjee, Alan E. Gelfand

Abstract: In this paper we detail the reformulation and rewrite of core functions in the spBayes R package. These efforts have focused on improving computational efficiency, flexibility, and usability for point-referenced data models. Attention is given to algorithm and computing developments that result in improved sampler convergence rate and efficiency by reducing parameter space; decreased sampler run-time by avoiding expensive matrix computations; and increased scalability to large datasets by implementing a class of predictive process models that attempt to overcome computational hurdles by representing spatial processes in terms of lower-dimensional realizations. Beyond these general computational improvements for existing model functions, we detail new functions for modeling data indexed in both space and time. These new functions implement a class of dynamic spatio-temporal models for settings where space is viewed as continuous and time is taken as discrete.

Submitted 30 October, 2013; originally announced October 2013.

arXiv:1306.1167 [pdf, ps, other]

A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles

Authors: Sungsoo Ahn, Michael Chertkov, Andrew E. Gelfand, Sejun Park, Jinwoo Shin

Abstract: We study the Maximum Weight Matching (MWM) problem for general graphs through the max-product Belief Propagation (BP) and related Linear Programming (LP). The BP approach provides distributed heuristics for finding the Maximum A Posteriori (MAP) assignment in a joint probability distribution represented by a Graphical Model (GM) and respective LPs can be considered as continuous relaxations of the discrete MAP problem. It was recently shown that a BP algorithm converges to the correct MWM assignment under a simple GM formulation of MAP/MWM as long as the corresponding LP relaxation is tight. First, under the motivation for forcing the tightness condition, we consider a new GM formulation of MWM, say C-GM, using non-intersecting odd-sized cycles in the graph: the new corresponding LP relaxation, say C-LP, becomes tight for more MWM instances. However, the tightness of C-LP now does not guarantee such convergence and correctness of the new BP on C-GM. To address the issue, we introduce a novel graph transformation applied to C-GM, which results in another GM formulation of MWM, and prove that the respective BP on it converges to the correct MAP/MWM assignment as long as C-LP is tight. Finally, we also show that C-LP always has half-integral solutions, which leads to an efficient BP-based MWM heuristic consisting of making sequential, `cutting plane', modifications to the underlying GM. Our experiments show that this BP-based cutting plane heuristic performs as well as that based on traditional LP solvers.

Submitted 31 December, 2017; v1 submitted 5 June, 2013; originally announced June 2013.

arXiv:1210.4916 [pdf]

A Cluster-Cumulant Expansion at the Fixed Points of Belief Propagation

Authors: Max Welling, Andrew E. Gelfand, Alexander T. Ihler

Abstract: We introduce a new cluster-cumulant expansion (CCE) based on the fixed points of iterative belief propagation (IBP). This expansion is similar in spirit to the loop-series (LS) recently introduced in [1]. However, in contrast to the latter, the CCE enjoys the following important qualities: 1) it is defined for arbitrary state spaces, 2) it is easily extended to fixed points of generalized belief propagation (GBP), 3) disconnected groups of variables will not contribute to the CCE, and 4) the accuracy of the expansion empirically improves upon that of the LS. The CCE is based on the same Möbius transform as the Kikuchi approximation, but unlike GBP does not require storing the beliefs of the GBP-clusters nor does it suffer from convergence issues during belief updating.

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-883-892

arXiv:1210.4857 [pdf]

Generalized Belief Propagation on Tree Robust Structured Region Graphs

Authors: Andrew E. Gelfand, Max Welling

Abstract: This paper provides some new guidance in the construction of region graphs for Generalized Belief Propagation (GBP). We connect the problem of choosing the outer regions of a Loop-Structured Region Graph (SRG) to that of finding a fundamental cycle basis of the corresponding Markov network. We also define a new class of tree-robust Loop-SRG for which GBP on any induced (spanning) tree of the Markov network, obtained by setting to zero the off-tree interactions, is exact. This class of SRG is then mapped to an equivalent class of tree-robust cycle bases on the Markov network. We show that a tree-robust cycle basis can be identified by proving that for every subset of cycles, the graph obtained from the edges that participate in a single cycle only, is multiply connected. Using this we identify two classes of tree-robust cycle bases: planar cycle bases and "star" cycle bases. In experiments we show that tree-robustness can be successfully exploited as a design principle to improve the accuracy and convergence of GBP.

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-296-305

arXiv:1203.3487 [pdf]

BEEM: Bucket Elimination with External Memory

Authors: Kalev Kask, Rina Dechter, Andrew E. Gelfand

Abstract: A major limitation of exact inference algorithms for probabilistic graphical models is their extensive memory usage, which often puts real-world problems out of their reach. In this paper we show how we can extend inference algorithms, particularly Bucket Elimination, a special case of cluster (join) tree decomposition, to utilize disk memory. We provide the underlying ideas and show promising empirical results of exactly solving large problems not solvable before.

Submitted 15 March, 2012; originally announced March 2012.

Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

Report number: UAI-P-2010-PG-268-276

arXiv:1011.3327 [pdf, ps, other]

doi 10.1214/10-AOAS335

Modeling large scale species abundance with latent spatial processes

Authors: Avishek Chakraborty, Alan E. Gelfand, Adam M. Wilson, Andrew M. Latimer, John A. Silander Jr

Abstract: Modeling species abundance patterns using local environmental features is an important, current problem in ecology. The Cape Floristic Region (CFR) in South Africa is a global hot spot of diversity and endemism, and provides a rich class of species abundance data for such modeling. Here, we propose a multi-stage Bayesian hierarchical model for explaining species abundance over this region. Our model is specified at areal level, where the CFR is divided into roughly $37{,}000$ one minute grid cells; species abundance is observed at some locations within some cells. The abundance values are ordinally categorized. Environmental and soil-type factors, likely to influence the abundance pattern, are included in the model. We formulate the empirical abundance pattern as a degraded version of the potential pattern, with the degradation effect accomplished in two stages. First, we adjust for land use transformation and then we adjust for measurement error, hence misclassification error, to yield the observed abundance classifications. An important point in this analysis is that only $28\%$ of the grid cells have been sampled and that, for sampled grid cells, the number of sampled locations ranges from one to more than one hundred. Still, we are able to develop potential and transformed abundance surfaces over the entire region. In the hierarchical framework, categorical abundance classifications are induced by continuous latent surfaces. The degradation model above is built on the latent scale. On this scale, an areal level spatial regression model was used for modeling the dependence of species abundance on the environmental factors.

Submitted 23 November, 2010; v1 submitted 15 November, 2010; originally announced November 2010.

Comments: Published at http://dx.doi.org/10.1214/10-AOAS335 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS335

Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 3, 1403-1429

arXiv:1004.1147 [pdf, other]

A bivariate space-time downscaler under space and time misalignment

Authors: Veronica J. Berrocal, Alan E. Gelfand, David M. Holland

Abstract: Ozone and particulate matter PM2.5 are co-pollutants that have long been associated with increased public health risks. Information on concentration levels for both pollutants comes from two sources: monitoring sites and output from complex numerical models that produce concentration surfaces over large spatial regions. In this paper, we offer a fully model-based approach for fusing these two sources of information for the pair of co-pollutants which is computationally feasible over large spatial regions and long periods of time. Due to the association between concentration levels of the two environmental contaminants, it is expected that information regarding one will help to improve prediction of the other. Misalignment is an obvious issue since the monitoring networks for the two contaminants only partly intersect and because the collection rate for PM2.5 is typically less frequent than that for ozone. Extending previous work in Berrocal et al. (2010), we introduce a bivariate downscaler that provides a flexible class of bivariate space-time assimilation models. We discuss computational issues for model fitting and analyze a dataset for ozone and PM2.5 for the ozone season during year 2002. We show a modest improvement in predictive performance, not surprising in a setting where we can anticipate only a small gain.

Submitted 7 April, 2010; originally announced April 2010.

Comments: 26 pages; 5 tables; 8 figures

arXiv:0910.1432 [pdf, ps, other]

doi 10.1214/09-AOAS240

Analysis of Minnesota colon and rectum cancer point patterns with spatial and nonspatial covariate information

Authors: Shengde Liang, Bradley P. Carlin, Alan E. Gelfand

Abstract: Colon and rectum cancer share many risk factors, and are often tabulated together as "colorectal cancer" in published summaries. However, recent work indicating that exercise, diet, and family history may have differential impacts on the two cancers encourages analyzing them separately, so that corresponding public health interventions can be more efficiently targeted. We analyze colon and rectum cancer data from the Minnesota Cancer Surveillance System from 1998–2002 over the 16-county Twin Cities (Minneapolis–St. Paul) metro and exurban area. The data consist of two marked point patterns, meaning that any statistical model must account for randomness in the observed locations, and expected positive association between the two cancer patterns. Our model extends marked spatial point pattern analysis in the context of a log Gaussian Cox process to accommodate spatially referenced covariates (local poverty rate and location within the metro area), individual-level risk factors (patient age and cancer stage), and related interactions. We obtain smoothed maps of marginal log-relative intensity surfaces for colon and rectum cancer, and uncover significant age and stage differences between the two groups. This encourages more aggressive colon cancer screening in the inner Twin Cities and their southern and western exurbs, where our model indicates higher colon cancer relative intensity.

Submitted 8 October, 2009; originally announced October 2009.

Comments: Published at http://dx.doi.org/10.1214/09-AOAS240 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS240

Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 3, 943-962

arXiv:0901.3494 [pdf, ps, other]

doi 10.1214/08-AOAS174

Interpreting self-organizing maps through space–time data models

Authors: Huiyan Sang, Alan E. Gelfand, Chris Lennard, Gabriele Hegerl, Bruce Hewitson

Abstract: Self-organizing maps (SOMs) are a technique that has been used with high-dimensional data vectors to develop an archetypal set of states (nodes) that span, in some sense, the high-dimensional space. Noteworthy applications include weather states as described by weather variables over a region and speech patterns as characterized by frequencies in time. The SOM approach is essentially a neural network model that implements a nonlinear projection from a high-dimensional input space to a low-dimensional array of neurons. In the process, it also becomes a clustering technique, assigning to any vector in the high-dimensional data space the node (neuron) to which it is closest (using, say, Euclidean distance) in the data space. The number of nodes is thus equal to the number of clusters. However, the primary use for the SOM is as a representation technique, that is, finding a set of nodes which representatively span the high-dimensional space. These nodes are typically displayed using maps to enable visualization of the continuum of the data space. The technique does not appear to have been discussed in the statistics literature so it is our intent here to bring it to the attention of the community. The technique is implemented algorithmically through a training set of vectors. However, through the introduction of stochasticity in the form of a space–time process model, we seek to illuminate and interpret its performance in the context of application to daily data collection. That is, the observed daily state vectors are viewed as a time series of multivariate process realizations which we try to understand under the dimension reduction achieved by the SOM procedure.

Submitted 22 January, 2009; originally announced January 2009.

Comments: Published at http://dx.doi.org/10.1214/08-AOAS174 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS174

Journal ref: Annals of Applied Statistics 2008, Vol. 2, No. 4, 1194-1216

Showing 1–42 of 42 results for author: Gelfand, A E