-
Joint modeling of low and high extremes using a multivariate extended generalized Pareto distribution
Authors:
Noura Alotaibi,
Matthew Sainsbury-Dale,
Philippe Naveau,
Carlo Gaetan,
Raphaël Huser
Abstract:
In most risk assessment studies, it is important to accurately capture the entire distribution of the multivariate random vector of interest from low to high values. For example, in climate sciences, low precipitation events may lead to droughts, while heavy rainfall may generate large floods, and both of these extreme scenarios can have major impacts on the safety of people and infrastructure, as well as agricultural or other economic sectors. In the univariate case, the extended generalized Pareto distribution (eGPD) was specifically developed to accurately model low, moderate, and high precipitation intensities, while bypassing the threshold selection procedure usually conducted in extreme-value analyses. In this work, we extend this approach to the multivariate case. The proposed multivariate eGPD has the following appealing properties: (1) its marginal distributions behave like univariate eGPDs; (2) its lower and upper joint tails comply with multivariate extreme-value theory, with key parameters separately controlling dependence in each joint tail; and (3) the model allows for fast simulation and is thus amenable to simulation-based inference. We propose estimating model parameters by leveraging modern neural approaches, where a neural network, once trained, can provide point estimates, credible intervals, or full posterior approximations in a fraction of a second. Our new methodology is illustrated by application to daily rainfall time series data from the Netherlands. The proposed model is shown to provide satisfactory marginal and dependence fits from low to high quantiles.
Submitted 7 September, 2025;
originally announced September 2025.
-
Flexible space-time models for extreme data
Authors:
Lorenzo Dell'Oro,
Carlo Gaetan
Abstract:
Extreme value analysis is an essential methodology in the study of rare and extreme events, which hold significant interest in various fields, particularly in the context of environmental sciences. Models that employ the exceedances of values above suitably selected high thresholds possess the advantage of capturing the "sub-asymptotic" dependence of data. This paper presents an extension of spatial random scale mixture models to the spatio-temporal domain. A comprehensive framework for characterizing the dependence structure of extreme events across both dimensions is provided. Indeed, the model is capable of distinguishing between asymptotic dependence and independence, both in space and time, through the use of parametric inference. The high complexity of the likelihood function for the proposed model necessitates a simulation approach based on neural networks for parameter estimation, which leverages summaries of the sub-asymptotic dependence present in the data. The effectiveness of the model in assessing the limiting dependence structure of spatio-temporal processes is demonstrated through both simulation studies and an application to rainfall datasets.
Submitted 2 July, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
Spatial quantile clustering of climate data
Authors:
Carlo Gaetan,
Paolo Girardi,
Victor Muthama Musau
Abstract:
In the era of climate change, the distribution of climate variables evolves with changes not limited to the mean value. Consequently, clustering algorithms based on central tendency could produce misleading results when used to summarize spatial and/or temporal patterns. We present a novel approach to spatial clustering of time series based on quantiles using a Bayesian framework that incorporates a spatial dependence layer based on a Markov random field. The proposal is tested in a series of simulations and then applied to the sea surface temperature of the Mediterranean Sea, one of the first seas to be affected by climate change.
Submitted 16 February, 2024;
originally announced February 2024.
-
An extended generalized Pareto regression model for count data
Authors:
Touqeer Ahmad,
Carlo Gaetan,
Philippe Naveau
Abstract:
The statistical modeling of discrete extremes has received less attention than their continuous counterparts in the Extreme Value Theory (EVT) literature. One approach to the transition from continuous to discrete extremes is the modeling of threshold exceedances of integer random variables by the discrete version of the generalized Pareto distribution. However, the optimal choice of thresholds defining exceedances remains a problematic issue. Moreover, in a regression framework, the majority of non-extreme data below the selected threshold are either ignored or treated separately from the extremes. To tackle these issues, we expand on the concept of employing a smooth transition between the bulk and the upper tail of the distribution. In the case of zero inflation, we also develop models with an additional parameter. To incorporate possible predictors, we relate the parameters to additive smoothed predictors via an appropriate link, as in the generalized additive model (GAM) framework. A penalized maximum likelihood estimation procedure is implemented. We illustrate our modeling proposal with a real dataset of avalanche activity in the French Alps. With the advantage of bypassing the threshold selection step, our results indicate that the proposed models are more flexible and robust than competing models, such as the negative binomial distribution.
Submitted 17 June, 2024; v1 submitted 27 October, 2022;
originally announced October 2022.
-
Distributional regression models for Extended Generalized Pareto distributions
Authors:
Noémie Le Carrer,
Carlo Gaetan
Abstract:
The Extended Generalized Pareto Distribution (EGPD) (Naveau et al. 2016) is a family of distributions that has been introduced to model the full range of a positive random variable but with the lower and the upper tails distributed according to the peaks-over-threshold methodology. The aim of this article is to augment the scope of application of the EGPD by allowing the analyst to incorporate the effect of covariates on the model. In particular, we introduce a specification where the parameters of the EGPD can be modeled as additive functions of the covariates, e.g. space or time. As a by-product, we provide add-on code written in R that is flexible enough to implement the EGPD in a generic way, allowing new parametric forms to be introduced. We show the potential of our add-on on the modeling of hourly rainfall over the North-West region of France and discuss modeling strategies.
Submitted 10 September, 2022;
originally announced September 2022.
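As an illustration of the EGPD idea described in the abstract above, here is a minimal Python sketch of the simplest parametric form from Naveau et al. (2016), where the distribution function is F(x) = H_xi(x / sigma)^kappa and H_xi is the generalized Pareto CDF. This is not the authors' R add-on: the function names and parameter values are hypothetical, no covariates are included, and only the standard library is used.

```python
import math
import random


def gpd_cdf(z, xi):
    """CDF of the unit-scale generalized Pareto distribution at z."""
    if z <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:  # exponential limit as xi -> 0
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:  # beyond the finite upper endpoint when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_quantile(p, xi):
    """Quantile function of the unit-scale generalized Pareto, 0 <= p < 1."""
    if abs(xi) < 1e-12:
        return -math.log(1.0 - p)
    return ((1.0 - p) ** (-xi) - 1.0) / xi


def egpd_cdf(x, kappa, sigma, xi):
    """EGPD CDF with G(v) = v**kappa: kappa shapes the lower tail, xi the upper."""
    return gpd_cdf(x / sigma, xi) ** kappa


def egpd_quantile(p, kappa, sigma, xi):
    """Invert the EGPD CDF: first undo G, then undo the GPD."""
    return sigma * gpd_quantile(p ** (1.0 / kappa), xi)


def egpd_sample(n, kappa, sigma, xi, seed=0):
    """Draw n values by inversion of uniform random numbers."""
    rng = random.Random(seed)
    return [egpd_quantile(rng.random(), kappa, sigma, xi) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    draws = egpd_sample(5, kappa=2.0, sigma=1.5, xi=0.2)
    print(draws)
```

With kappa = 1 the sketch reduces to an ordinary generalized Pareto distribution. A regression version in the spirit of the abstract would let kappa, sigma, or xi depend on covariates through link functions.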
-
Clustering of bivariate satellite time series: a quantile approach
Authors:
Victor Muthama Musau,
Carlo Gaetan,
Paolo Girardi
Abstract:
Clustering has received much attention in Statistics and Machine learning with the aim of developing statistical models and autonomous algorithms which are capable of acquiring information from raw data in order to perform exploratory analysis. Several techniques have been developed to cluster sampled univariate vectors only considering the average value over the whole period and as such they have not been able to explore fully the underlying distribution as well as other features of the data, especially in the presence of structured time series. We propose a model-based clustering technique that is based on quantile regression, permitting us to cluster bivariate time series at different quantile levels. We model the within-cluster density using an asymmetric Laplace distribution, allowing us to take into account asymmetry in the distribution of the data. We evaluate the performance of the proposed technique through a simulation study. The method is then applied to cluster time series observed from Glob-colour satellite data related to trophic status indices with the aim of evaluating their temporal dynamics in order to identify homogeneous areas, in terms of trophic status, in the Gulf of Gabes.
Submitted 24 July, 2022;
originally announced July 2022.
-
A SEIR model with time-varying coefficients for analysing the SARS-CoV-2 epidemic
Authors:
P. Girardi,
C. Gaetan
Abstract:
In this study, we propose a time-dependent Susceptible-Exposed-Infected-Recovered (SEIR) model for the analysis of the SARS-CoV-2 epidemic outbreak in three different countries, the United States of America, Italy and Iceland, using public data on the numbers of the epidemic wave. Since several types and grades of actions were adopted by the governments, including travel restrictions, social distancing, or limitation of movement, we want to investigate how these measures can affect the epidemic curve of the infectious population. The parameters of interest for the SEIR model were estimated employing a composite likelihood approach. Moreover, standard errors have been corrected for temporal dependence. The adoption of restrictive measures results in flattened epidemic curves, and the projected future evolution indicates a decrease in the number of cases.
Submitted 4 November, 2021;
originally announced November 2021.
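To make the idea of a time-dependent SEIR model in the abstract above concrete, the following Python sketch integrates the standard SEIR equations with a transmission rate that changes over time. It is only an illustration under assumed values: the step-shaped `step_beta` function, all rates, and the forward Euler scheme are hypothetical choices, not the parameterization or the composite likelihood estimation used in the paper.

```python
def step_beta(t, beta_before=0.6, beta_after=0.15, t_switch=30.0):
    """Hypothetical transmission rate: drops when restrictions start at t_switch."""
    return beta_before if t < t_switch else beta_after


def simulate_seir(beta_fn, sigma, gamma, n_pop, e0, i0, days, dt=0.1):
    """Forward Euler integration of an SEIR model with time-varying beta.

    beta_fn(t): transmission rate at time t (days)
    sigma:      rate of leaving the exposed compartment (1 / incubation period)
    gamma:      recovery rate (1 / infectious period)
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s, e, i, r = n_pop - e0 - i0, e0, i0, 0.0
    steps_per_day = int(round(1.0 / dt))
    trajectory = [(s, e, i, r)]
    for step in range(days * steps_per_day):
        t = step * dt
        new_exposed = beta_fn(t) * s * i / n_pop * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        if (step + 1) % steps_per_day == 0:
            trajectory.append((s, e, i, r))
    return trajectory


if __name__ == "__main__":
    # Compare an unmitigated epidemic with one where beta drops at day 30.
    free = simulate_seir(lambda t: 0.6, 0.2, 0.2, 1e6, 0.0, 10.0, 200)
    controlled = simulate_seir(step_beta, 0.2, 0.2, 1e6, 0.0, 10.0, 200)
    print(max(state[2] for state in free))
    print(max(state[2] for state in controlled))
```

Lowering the transmission rate after the switch date flattens the simulated infectious curve, which mirrors the qualitative conclusion stated in the abstract.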
-
Modeling and simulating depositional sequences using latent Gaussian random fields
Authors:
Denis Allard,
Paolo Fabbri,
Carlo Gaetan
Abstract:
Simulating a depositional (or stratigraphic) sequence conditionally on borehole data is a long-standing problem in hydrogeology and in petroleum geostatistics. This paper presents a new rule-based approach for simulating depositional sequences of surfaces conditionally on lithofacies thickness data. The thickness of each layer is modeled by a transformed latent Gaussian random field allowing for null thickness thanks to a truncation process. Layers are sequentially stacked above each other following the regional stratigraphic sequence. By adequately choosing the variograms of these random fields, the simulated surfaces separating two layers can be continuous and smooth. Borehole information is often incomplete in the sense that it does not provide direct information as to the exact layer some observed thickness belongs to. The latent Gaussian model proposed in this paper offers a natural solution to this problem by means of a Bayesian setting with a Markov Chain Monte Carlo (MCMC) algorithm that can explore all possible configurations compatible with the data. The model and the associated MCMC algorithm are validated on synthetic data and then applied to a subsoil in the Venetian Plain with a moderately dense network of cored boreholes.
Submitted 25 March, 2020;
originally announced March 2020.
-
On modelling positive continuous data with spatio-temporal dependence
Authors:
M. Bevilacqua,
C. Caamaño,
C. Gaetan
Abstract:
In this paper we concentrate on an alternative modeling strategy for positive data that exhibit spatial or spatio-temporal dependence. Specifically, we propose to consider stochastic processes obtained through a monotone transformation of a scaled version of $\chi^2$ random processes. The latter are well known in the specialized literature and originate by summing independent copies of a squared Gaussian process. However, their use as stochastic models and related inference have not been much considered.
Motivated by a spatio-temporal analysis of wind speed data from a network of meteorological stations in the Netherlands, we exemplify our modeling strategy by means of a non-stationary process with Weibull marginal distributions. For the proposed Weibull process we study the second-order and geometrical properties and we provide analytic expressions for the bivariate distribution. Since the likelihood is intractable, even for relatively small data sets, we suggest adopting the pairwise likelihood as a tool for inference. Moreover, we tackle the prediction problem and we propose a linear prediction. The effectiveness of our modeling strategy is illustrated through the analysis of the aforementioned Netherlands wind speed data, which we supplement with a simulation study.
Submitted 7 April, 2020; v1 submitted 11 August, 2018;
originally announced August 2018.
-
Hierarchical space-time modeling of exceedances with an application to rainfall data
Authors:
Jean-Noel Bacro,
Carlo Gaetan,
Thomas Opitz,
Gwladys Toulemonde
Abstract:
The statistical modeling of space-time extremes in environmental applications is key to understanding complex dependence structures in original event data and to generating realistic scenarios for impact models. In this context of high-dimensional data, we propose a novel hierarchical model for high threshold exceedances defined over continuous space and time by embedding a space-time Gamma process convolution for the rate of an exponential variable, leading to asymptotic independence in space and time. Its physically motivated anisotropic dependence structure is based on geometric objects moving through space-time according to a velocity vector. We demonstrate that inference based on weighted pairwise likelihood is fast and accurate. The usefulness of our model is illustrated by an application to hourly precipitation data from a study region in Southern France, where it clearly improves on an alternative censored Gaussian space-time random field model. While classical limit models based on threshold-stability fail to appropriately capture relatively fast joint tail decay rates between asymptotic dependence and classical independence, strong empirical evidence from our application and other recent case studies motivates the use of more realistic asymptotic independence models such as ours.
Submitted 14 May, 2019; v1 submitted 8 August, 2017;
originally announced August 2017.
-
Comparing composite likelihood methods based on pairs for spatial Gaussian random fields
Authors:
Moreno Bevilacqua,
Carlo Gaetan
Abstract:
In recent years there has been a growing interest in proposing methods for estimating covariance functions for geostatistical data. Among these, maximum likelihood estimators have nice features when we deal with a Gaussian model. However, maximum likelihood becomes impractical when the number of observations is very large. In this work we review some solutions and we contrast them in terms of loss of statistical efficiency and computational burden. Specifically, we focus on three types of weighted composite likelihood functions based on pairs and we compare them with the method of covariance tapering. Asymptotic properties of the three estimation methods are derived. We illustrate the effectiveness of the methods through theoretical examples, simulation experiments and by analysing a data set on yearly total precipitation anomalies at weather stations in the United States.
Submitted 23 May, 2013;
originally announced May 2013.
-
Estimation of spatial max-stable models using threshold exceedances
Authors:
Jean-Noel Bacro,
Carlo Gaetan
Abstract:
Parametric inference for spatial max-stable processes is difficult since the related likelihoods are unavailable. A composite likelihood approach based on the bivariate distribution of block maxima has been recently proposed in the literature. However, modeling block maxima is a wasteful approach when other information is available. Moreover, an approach based on block, typically annual, maxima is unable to take into account whether or not maxima occur simultaneously. If time series of, say, daily data are available, then estimation procedures based on exceedances of a high threshold could mitigate such problems. In this paper we focus on two approaches for composing likelihoods based on pairs of exceedances. The first one comes from the tail approximation for bivariate distributions proposed by Ledford and Tawn (1996), which applies when both observations in a pair exceed the fixed threshold. The second one uses the bivariate extension (Rootzen and Tajvidi, 2006) of the generalized Pareto distribution, which allows exceedances to be modeled when at least one of the components is over the threshold. The two approaches are compared through a simulation study according to different degrees of spatial dependency. Results show that both the strength of the spatial dependencies and the threshold choice play a fundamental role in determining which is the best estimating procedure.
Submitted 5 May, 2012;
originally announced May 2012.