
Adaptive Bayesian Very Short-Term Wind Power Forecasting Based on the Generalised Logit Transformation

Authors: Tao Shen, Jethro Browell, Daniela Castro-Camilo

Abstract: Wind power plays an increasingly significant role in achieving the 2050 Net Zero Strategy. Despite its rapid growth, its inherent variability presents challenges in forecasting. Accurately forecasting wind power generation is one key demand for the stable and controllable integration of renewable energy into existing grid operations. This paper proposes an adaptive method for very short-term forecasting that combines the generalised logit transformation with a Bayesian approach. The generalised logit transformation processes double-bounded wind power data to an unbounded domain, facilitating the application of Bayesian methods. A novel adaptive mechanism for updating the transformation shape parameter is introduced to leverage Bayesian updates by recovering a small sample of representative data. Four adaptive forecasting methods are investigated, evaluating their advantages and limitations through an extensive case study of over 100 wind farms spanning four years in the UK. The methods are evaluated using the Continuous Ranked Probability Score and we propose the use of functional reliability diagrams to assess calibration. Results indicate that the proposed Bayesian method with adaptive shape parameter updating outperforms benchmarks, yielding consistent improvements in CRPS and forecast reliability. The method effectively addresses uncertainty, ensuring robust and accurate probabilistic forecasting which is essential for grid integration and decision-making.

Submitted 8 May, 2025; originally announced May 2025.

Comments: 31 pages, 10 figures and tables. Submitted to International Journal of Forecasting
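
Note on the entry above: the abstract describes the generalised logit transformation as mapping double-bounded wind power data to an unbounded domain. A minimal sketch of that idea follows, assuming the form commonly used in the wind power literature, log(x^ν / (1 − x^ν)) with shape parameter ν > 0 and power normalised to (0, 1). The abstract does not state the paper's exact parametrisation, so the function names and the formula here are illustrative assumptions rather than the authors' implementation.

```python
import math


def generalised_logit(x: float, nu: float) -> float:
    """Map x in (0, 1) to the real line via log(x**nu / (1 - x**nu)).

    Assumed form of the transform; nu is the shape parameter.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    if nu <= 0.0:
        raise ValueError("nu must be positive")
    xn = x ** nu
    # log1p(-xn) computes log(1 - xn) accurately when xn is small
    return math.log(xn) - math.log1p(-xn)


def inverse_generalised_logit(y: float, nu: float) -> float:
    """Map y on the real line back to (0, 1) via (1 + exp(-y)) ** (-1 / nu)."""
    if nu <= 0.0:
        raise ValueError("nu must be positive")
    return (1.0 + math.exp(-y)) ** (-1.0 / nu)


# Illustrative values only: normalised power readings and an arbitrary shape.
readings = [0.05, 0.5, 0.95]
transformed = [generalised_logit(p, nu=2.0) for p in readings]
recovered = [inverse_generalised_logit(z, nu=2.0) for z in transformed]
```

With ν = 1 this reduces to the ordinary logit. An adaptive scheme of the kind the abstract mentions would treat ν as a quantity to be updated over time rather than fixed as it is here.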

arXiv:2504.20268

Spatio-temporal data fusion of censored threshold exceedances

Authors: M. Daniela Cuba, Craig Wilkie, Marian Scott, Daniela Castro-Camilo

Abstract: Data fusion models are widely used in air quality monitoring to integrate in situ and remote-sensing data, offering spatially complete and temporally detailed estimates. However, traditional Gaussian-based models often underestimate extreme pollution values, leading to biased risk assessments. To address this, we present a Bayesian hierarchical data fusion framework rooted in extreme value theory, using the Dirac-delta generalised Pareto distribution to jointly account for threshold and non-threshold exceedances while preserving the temporal structure of extreme events. Our model is used to describe and predict censored threshold exceedances of PM2.5 pollution in the Greater London region by using remote sensing observations from the EAC4 dataset, a reanalysis product from the Copernicus Atmosphere Monitoring Service (CAMS), and in situ observation stations from the Automatic Urban and Rural Network (AURN) run by the UK government. Some of our approach's key innovations include combining data with varying spatio-temporal resolutions and fully accounting for parameter uncertainties. Results show that our model outperforms Gaussian-based alternatives and standalone remote-sensing data in predicting threshold exceedances at the majority of observation sites and can even yield better spatial patterns of PM2.5 pollution than those discernible from the remote-sensing data. Moreover, our approach captures greater variability and spatial patterns, such as higher PM2.5 concentrations near coastal areas, which are not evident in remote-sensing data alone.

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2503.11822

GPDFlow: Generative Multivariate Threshold Exceedance Modeling via Normalizing Flows

Authors: Chenglei Hu, Daniela Castro-Camilo

Abstract: The multivariate generalized Pareto distribution (mGPD) is a common method for modeling extreme threshold exceedance probabilities in environmental and financial risk management. Despite its broad applicability, mGPD faces challenges due to the infinite possible parametrizations of its dependence function, with only a few parametric models available in practice. To address this limitation, we introduce GPDFlow, an innovative mGPD model that leverages normalizing flows to flexibly represent the dependence structure. Unlike traditional parametric mGPD approaches, GPDFlow does not impose explicit parametric assumptions on dependence, resulting in greater flexibility and enhanced performance. Additionally, GPDFlow allows direct inference of marginal parameters, providing insights into marginal tail behavior. We derive tail dependence coefficients for GPDFlow, including a bivariate formulation, a $d$-dimensional extension, and an alternative measure for partial exceedance dependence. A general relationship between the bivariate tail dependence coefficient and the generative samples from normalizing flows is discussed. Through simulations and a practical application analyzing the risk among five major US banks, we demonstrate that GPDFlow significantly improves modeling accuracy and flexibility compared to traditional parametric methods.

Submitted 14 March, 2025; originally announced March 2025.

Comments: 31 pages, 7 figures

arXiv:2402.14624

A bivariate spatial extreme mixture model for unreplicated heavy metal soil contamination

Authors: M. Daniela Cuba, Marian Scott, Benjamin P. Marchant, Daniela Castro-Camilo

Abstract: Geostatistical models for multivariate applications such as heavy metal soil contamination work under Gaussian assumptions and may result in underestimated extreme values and misleading risk assessments (Marchant et al., 2011). A more suitable framework to analyse extreme values is extreme value theory (EVT). However, EVT relies on replications in time, which are generally not available in geochemical datasets. Therefore, using EVT to map soil contamination requires adaptation to be used in the usual single-replicate data framework of soil surveys. We propose a bivariate spatial extreme mixture model to model the body and tail of contaminant pairs, where the tails are described using a stationary generalised Pareto distribution. We demonstrate the performance of our model using a simulation study and through modelling bivariate soil contamination in the Glasgow conurbation. Model results are given as maps of predicted marginal concentrations and probabilities of joint exceedance of soil guideline values. Marginal concentration maps show areas of elevated lead levels along the Clyde River and elevated levels of chromium around the south and southeast villages such as East Kilbride and Wishaw. The joint probability maps show higher probabilities of joint exceedance to the south and southeast of the city centre, following known legacy contamination regions in the Clyde River basin.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.15703

A Bayesian multivariate extreme value mixture model

Authors: Chenglei Hu, Ben Swallow, Daniela Castro-Camilo

Abstract: Impact assessment of natural hazards requires the consideration of both extreme and non-extreme events. Extensive research has been conducted on the joint modeling of bulk and tail in univariate settings; however, the corresponding body of research in the context of multivariate analysis is comparatively scant. This study extends the univariate joint modeling of bulk and tail to the multivariate framework. Specifically, it pertains to cases where multivariate observations exceed a high threshold in at least one component. We propose a multivariate mixture model that assumes a parametric model to capture the bulk of the distribution, which is in the max-domain of attraction (MDA) of a multivariate extreme value distribution (mGEVD). The tail is described by the multivariate generalized Pareto distribution, which is asymptotically justified to model multivariate threshold exceedances. We show that if all components exceed the threshold, our mixture model is in the MDA of an mGEVD. Bayesian inference based on multivariate random-walk Metropolis-Hastings and the automated factor slice sampler allows us to incorporate uncertainty from the threshold selection easily. Due to computational limitations, simulations and data applications are provided for dimension $d=2$, but a discussion is provided with views toward scalability based on pairwise likelihood.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 34 pages, 7 figures

arXiv:2106.13110

Practical strategies for GEV-based regression models for extremes

Authors: Daniela Castro-Camilo, Raphaël Huser, Håvard Rue

Abstract: The generalised extreme value (GEV) distribution is a three-parameter family that describes the asymptotic behaviour of properly renormalised maxima of a sequence of independent and identically distributed random variables. If the shape parameter $ξ$ is zero, the GEV distribution has unbounded support, whereas if $ξ$ is positive, the limiting distribution is heavy-tailed with infinite upper endpoint but finite lower endpoint. In practical applications, we assume that the GEV family is a reasonable approximation for the distribution of maxima over blocks, and we fit it accordingly. This implies that GEV properties, such as finite lower endpoint in the case $ξ>0$, are inherited by the finite-sample maxima, which might not have bounded support. This is particularly problematic when predicting extreme observations based on multiple and interacting covariates. To tackle this usually overlooked issue, we propose a blended GEV distribution, which smoothly combines the left tail of a Gumbel distribution (GEV with $ξ=0$) with the right tail of a Fréchet distribution (GEV with $ξ>0$) and, therefore, has unbounded support. Using a Bayesian framework, we reparametrise the GEV distribution to offer a more natural interpretation of the (possibly covariate-dependent) model parameters. Independent priors over the new location and spread parameters induce a joint prior distribution for the original location and scale parameters.
We introduce the concept of property-preserving penalised complexity (P$^3$C) priors and apply it to the shape parameter to preserve first and second moments. We illustrate our methods with an application to NO$_2$ pollution levels in California, which reveals the robustness of the bGEV distribution, as well as the suitability of the new parametrisation and the P$^3$C prior framework.

Submitted 7 May, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: 19 pages, 3 figures
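
Note on the entry above: the support statements in the abstract follow from the standard GEV distribution function, written out here for reference. The symbols $\mu$ (location) and $\sigma > 0$ (scale) are the usual textbook notation and are assumptions of this note. The abstract itself names only the shape parameter $ξ$, and the paper's reparametrisation is not reproduced here.

```latex
G(x) =
\begin{cases}
\exp\left\{ -\left[ 1 + \xi \, \dfrac{x - \mu}{\sigma} \right]_{+}^{-1/\xi} \right\}, & \xi \neq 0, \\[2ex]
\exp\left\{ -\exp\left( -\dfrac{x - \mu}{\sigma} \right) \right\}, & \xi = 0,
\end{cases}
\qquad [a]_{+} = \max(a, 0).
```

For $\xi > 0$ the bracket is positive only when $x > \mu - \sigma/\xi$, which is the finite lower endpoint the abstract refers to. For $\xi = 0$ (the Gumbel case) the expression is defined for every real $x$, so the support is unbounded.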

arXiv:2105.09062

doi 10.1007/s13253-022-00500-7

Modelling sub-daily precipitation extremes with the blended generalised extreme value distribution

Authors: Silius M. Vandeskog, Sara Martino, Daniela Castro-Camilo, Håvard Rue

Abstract: A new method is proposed for modelling the yearly maxima of sub-daily precipitation, with the aim of producing spatial maps of return level estimates. Yearly precipitation maxima are modelled using a Bayesian hierarchical model with a latent Gaussian field, with the blended generalised extreme value (bGEV) distribution used as a substitute for the more standard generalised extreme value (GEV) distribution. Inference is made less wasteful with a novel two-step procedure that performs separate modelling of the scale parameter of the bGEV distribution using peaks over threshold data. Fast inference is performed using integrated nested Laplace approximations (INLA) together with the stochastic partial differential equation (SPDE) approach, both implemented in R-INLA. Heuristics for improving the numerical stability of R-INLA with the GEV and bGEV distributions are also presented. The model is fitted to yearly maxima of sub-daily precipitation from the south of Norway, and is able to quickly produce high-resolution return level maps with uncertainty. The proposed two-step procedure provides an improved model fit over standard inference techniques when modelling the yearly maxima of sub-daily precipitation with the bGEV distribution.

Submitted 21 May, 2022; v1 submitted 19 May, 2021; originally announced May 2021.

arXiv:2004.00386

Bayesian space-time gap filling for inference on extreme hot-spots: an application to Red Sea surface temperatures

Authors: Daniela Castro-Camilo, Linda Mhalla, Thomas Opitz

Abstract: We develop a method for probabilistic prediction of extreme value hot-spots in a spatio-temporal framework, tailored to big datasets containing important gaps. In this setting, direct calculation of summaries from data, such as the minimum over a space-time domain, is not possible. To obtain predictive distributions for such cluster summaries, we propose a two-step approach. We first model marginal distributions with a focus on accurate modeling of the right tail and then, after transforming the data to a standard Gaussian scale, we estimate a Gaussian space-time dependence model defined locally in the time domain for the space-time subregions where we want to predict. In the first step, we detrend the mean and standard deviation of the data and fit a spatially resolved generalized Pareto distribution to apply a correction of the upper tail. To ensure spatial smoothness of the estimated trends, we either pool data using nearest-neighbor techniques, or apply generalized additive regression modeling. To cope with high space-time resolution of data, the local Gaussian models use a Markov representation of the Matérn correlation function based on the stochastic partial differential equations (SPDE) approach. In the second step, they are fitted in a Bayesian framework through the integrated nested Laplace approximation implemented in R-INLA. Finally, posterior samples are generated to provide statistical inferences through Monte-Carlo estimation.
Motivated by the 2019 Extreme Value Analysis data challenge, we illustrate our approach to predict the distribution of local space-time minima in anomalies of Red Sea surface temperatures, using a gridded dataset (11315 days, 16703 pixels) with artificially generated gaps. In particular, we show the improved performance of our two-step approach over a purely Gaussian model without tail transformations.

Submitted 1 April, 2020; originally announced April 2020.

arXiv:1810.04099

A spliced Gamma-Generalized Pareto model for short-term extreme wind speed probabilistic forecasting

Authors: Daniela Castro-Camilo, Raphaël Huser, Håvard Rue

Abstract: Renewable sources of energy such as wind power have become a sustainable alternative to fossil fuel-based energy. However, the uncertainty and fluctuation of the wind speed derived from its intermittent nature bring a great threat to the wind power production stability, and to the wind turbines themselves. Lately, much work has been done on developing models to forecast average wind speed values, yet surprisingly little has focused on proposing models to accurately forecast extreme wind speeds, which can damage the turbines. In this work, we develop a flexible spliced Gamma-Generalized Pareto model to forecast extreme and non-extreme wind speeds simultaneously. Our model belongs to the class of latent Gaussian models, for which inference is conveniently performed based on the integrated nested Laplace approximation method. Considering a flexible additive regression structure, we propose two models for the latent linear predictor to capture the spatio-temporal dynamics of wind speeds. Our models are fast to fit and can describe both the bulk and the tail of the wind speed distribution while producing short-term extreme and non-extreme wind speed probabilistic forecasts.

Submitted 29 June, 2019; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: 25 pages

arXiv:1710.00875

Local likelihood estimation of complex tail dependence structures, applied to U.S. precipitation extremes

Authors: Daniela Castro-Camilo, Raphaël Huser

Abstract: To disentangle the complex non-stationary dependence structure of precipitation extremes over the entire contiguous U.S., we propose a flexible local approach based on factor copula models. Our sub-asymptotic spatial modeling framework yields non-trivial tail dependence structures, with a weakening dependence strength as events become more extreme, a feature commonly observed with precipitation data but not accounted for in classical asymptotic extreme-value models. To estimate the local extremal behavior, we fit the proposed model in small regional neighborhoods to high threshold exceedances, under the assumption of local stationarity, which allows us to gain in flexibility. Adopting a local censored likelihood approach, inference is made on a fine spatial grid, and local estimation is performed by taking advantage of distributed computing resources and the embarrassingly parallel nature of this estimation procedure. The local model is efficiently fitted at all grid points, and uncertainty is measured using a block bootstrap procedure. An extensive simulation study shows that our approach can adequately capture complex, non-stationary dependencies, while our study of U.S. winter precipitation data reveals interesting differences in local tail structures over space, which has important implications on regional risk assessment of extreme precipitation events.

Submitted 25 March, 2019; v1 submitted 2 October, 2017; originally announced October 2017.

Showing 1–10 of 10 results for author: Castro-Camilo, D