-
Semiparametric Estimation of the Shape of the Limiting Bivariate Point Cloud
Authors:
Reetam Majumder,
Benjamin A. Shaby,
Brian J. Reich,
Daniel Cooley
Abstract:
We propose a model to flexibly estimate joint tail properties by exploiting the convergence of an appropriately scaled point cloud onto a compact limit set. Characteristics of the shape of the limit set correspond to key tail dependence properties. We directly model the shape of the limit set using Bezier splines, which allow flexible and parsimonious specification of shapes in two dimensions. We fit the Bezier splines to data in pseudo-polar coordinates using Markov chain Monte Carlo sampling, utilizing a limiting approximation to the conditional likelihood of the radii given angles. We propose a novel prior on the shape of the limit set via constraints on the parameters of the Bezier splines. A direct advantage of our Bayesian approach is that the support of this prior guarantees that each posterior sample is a valid limit set boundary, allowing direct posterior analysis of any quantity derived from the shape of the curve. Furthermore, we obtain interpretable inference on the asymptotic dependence class by using mixture priors with point masses on the corner of the unit box. Finally, we apply our model to bivariate datasets of extremes of variables related to fire risk and air pollution.
Submitted 3 April, 2025; v1 submitted 22 June, 2023;
originally announced June 2023.
-
Partial Tail Correlation for Extremes
Authors:
Jeongjin Lee,
Daniel Cooley
Abstract:
In order to understand structural relationships among sets of variables at extreme levels, we develop an extremes analogue to partial correlation. We begin by developing an inner product space constructed from transformed-linear combinations of independent regularly varying random variables. We define partial tail correlation via the projection theorem for the inner product space. We show that the partial tail correlation can be understood as the inner product of the prediction errors from transformed-linear prediction. We connect partial tail correlation to the inverse of the inner product matrix and show that a zero in this inverse implies a partial tail correlation of zero. We then show that under a modeling assumption that the random variables belong to a sensible subset of the inner product space, the matrix of inner products corresponds to the previously-studied tail pairwise dependence matrix. We develop a hypothesis test for partial tail correlation of zero. We demonstrate the performance in two applications: high nitrogen dioxide levels in Washington DC and extreme river discharges in the upper Danube basin.
Submitted 16 June, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
Transformed-linear prediction for extremes
Authors:
Jeongjin Lee,
Daniel Cooley
Abstract:
We consider the problem of performing prediction when observed values are at their highest levels. We construct an inner product space of nonnegative random variables from transformed-linear combinations of independent regularly varying random variables. The matrix of inner products corresponds to the tail pairwise dependence matrix, which summarizes tail dependence. The projection theorem yields the optimal transformed-linear predictor, which has the same form as the best linear unbiased predictor in non-extreme prediction. We also construct prediction intervals based on the geometry of regular variation. We show that these intervals have good coverage in a simulation study as well as in two applications: prediction of high pollution levels and prediction of large financial losses.
Submitted 14 March, 2023; v1 submitted 5 November, 2021;
originally announced November 2021.
-
Simulating flood event sets using extremal principal components
Authors:
Christian Rohrbeck,
Daniel Cooley
Abstract:
Hazard event sets, a collection of synthetic extreme events over a given period, are important for catastrophe modelling. This paper addresses the issue of generating event sets of extreme river flow for northern England and southern Scotland, a region which has been particularly affected by severe flooding over the past 20 years. We start by analysing historical extreme river flow across 45 gauges, located within the study region, using methods from extreme value analysis, including the concept of extremal principal components. Our analysis reveals interesting connections between the extremal dependence structure and the region's topography/climate. We then introduce a framework which is based on modelling the distribution of the extremal principal components in order to generate synthetic events of extreme river flow. The generative framework is dimension-reducing in that it distinctly handles the principal components based on their contribution to describing the nature of extreme river flow across the study region. We also detail a data-driven approach to select the optimal dimension. Synthetic flood events are subsequently generated efficiently by sampling from the fitted distribution. Our approach for generating hazard event sets can be easily implemented by practitioners and our results indicate good agreement between the observed and simulated extreme river flow dynamics. For the considered application, we also find that our approach outperforms existing statistical approaches for generating hazard event sets.
Submitted 16 March, 2022; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Transformed-Linear Models for Time Series Extremes
Authors:
Nehali Mhatre,
Daniel Cooley
Abstract:
In order to capture the dependence in the upper tail of a time series, we develop non-negative regularly-varying time series models that are constructed similarly to classical non-extreme ARMA models. Rather than fully characterizing tail dependence of the time series, we define the concept of weak tail stationarity, which allows us to describe a regularly-varying time series through the tail pairwise dependence function (TPDF), which is a measure of pairwise extremal dependencies. We state consistency requirements among the finite-dimensional collections of the elements of a regularly-varying time series and show that the TPDF's value does not depend on the dimension being considered. So that our models take nonnegative values, we use transformed-linear operations. We show existence and stationarity of these models, and develop their properties such as the model TPDFs. Additionally, we show that the class of transformed-linear MA($\infty$) models forms an inner product space. Motivated by investigating conditions conducive to the spread of wildfires, we fit models to hourly windspeed data and find that the fitted transformed-linear models produce better estimates of upper tail quantities than traditional ARMA models or classical linear regularly varying models.
Submitted 25 October, 2021; v1 submitted 11 December, 2020;
originally announced December 2020.
-
New Exploratory Tools for Extremal Dependence: Chi Networks and Annual Extremal Networks
Authors:
Whitney K. Huang,
Daniel S. Cooley,
Imme Ebert-Uphoff,
Chen Chen,
Snigdhansu Chatterjee
Abstract:
Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work we propose the $χ$ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A $χ$ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, $\hat χ$, exceeds a prescribed threshold. We develop an initial $χ$ network estimator and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in $χ$. In addition to the $χ$ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the $χ$ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature.
Submitted 23 January, 2019;
originally announced January 2019.
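The thresholding step behind the $χ$ network in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ranks_to_uniform`, `chi_hat`, `chi_network`), the rank-based empirical estimator, and the default quantile level `u` and `threshold` are all hypothetical choices, and the bias correction and spatial block bootstrap mentioned in the abstract are not included.

```python
from itertools import combinations


def ranks_to_uniform(values):
    """Map observations to pseudo-uniform scores in (0, 1) using ranks.

    Ties are broken by position; this is an illustrative choice.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = rank / (n + 1)
    return scores


def chi_hat(x, y, u=0.9):
    """Empirical estimate of chi(u) = P(F_Y(Y) > u | F_X(X) > u)."""
    ux, uy = ranks_to_uniform(x), ranks_to_uniform(y)
    exceed_x = sum(1 for a in ux if a > u)
    if exceed_x == 0:
        return 0.0
    joint = sum(1 for a, b in zip(ux, uy) if a > u and b > u)
    return joint / exceed_x


def chi_network(series, u=0.9, threshold=0.5):
    """Connect pairs of sites whose estimated chi exceeds the threshold.

    `series` maps a site label to its list of observations (equal lengths,
    aligned in time). Returns a list of (site_i, site_j, chi_hat) edges.
    """
    edges = []
    for i, j in combinations(sorted(series), 2):
        c = chi_hat(series[i], series[j], u)
        if c > threshold:
            edges.append((i, j, c))
    return edges
```

In this sketch the network is just an edge list; in practice both `u` and `threshold` would need to be chosen with the sample size in mind, since few observations exceed a high quantile.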
-
Improved return level estimation via a weighted likelihood, latent spatial extremes model
Authors:
Joshua Hewitt,
Miranda J. Fix,
Jennifer A. Hoeting,
Daniel S. Cooley
Abstract:
Uncertainty in return level estimates for rare events, like the intensity of large rainfall events, makes it difficult to develop strategies to mitigate related hazards, like flooding. Latent spatial extremes models reduce uncertainty by exploiting spatial dependence in statistical characteristics of extreme events to borrow strength across locations. However, these estimates can have poor properties due to model misspecification: many latent spatial extremes models do not account for extremal dependence, which is spatial dependence in the extreme events themselves. We improve estimates from latent spatial extremes models that make conditional independence assumptions by proposing a weighted likelihood that uses the extremal coefficient to incorporate information about extremal dependence during estimation. This approach differs from, and is simpler than, directly modeling the spatial extremal dependence; for example, by fitting a max-stable process, which is challenging to fit to real, large datasets. We adopt a hierarchical Bayesian framework for inference, use simulation to show the weighted model provides improved estimates of high quantiles, and apply our model to improve return level estimates for Colorado rainfall events with 1% annual exceedance probability.
Submitted 21 December, 2018; v1 submitted 16 October, 2018;
originally announced October 2018.
-
A Nonparametric Method for Producing Isolines of Bivariate Exceedance Probabilities
Authors:
Daniel Cooley,
Emeric Thibaud,
Federico Castillo,
Michael F. Wehner
Abstract:
We present a method for drawing isolines indicating regions of equal joint exceedance probability for bivariate data. The method relies on bivariate regular variation, a dependence framework widely used for extremes. This framework enables drawing isolines corresponding to very low exceedance probabilities, and these lines may lie beyond the range of the data. The method we utilize for characterizing dependence in the tail is largely nonparametric. Furthermore, we extend this method to the case of asymptotic independence and propose a procedure which smooths the transition from asymptotic independence in the interior to the first-order behavior on the axes. We propose a diagnostic plot for assessing the isoline estimate and the choice of smoothing, and a bootstrap procedure to visually assess uncertainty.
Submitted 14 October, 2017;
originally announced October 2017.
-
Decompositions of Dependence for High-Dimensional Extremes
Authors:
Daniel Cooley,
Emeric Thibaud
Abstract:
Employing the framework of regular variation, we propose two decompositions which help to summarize and describe high-dimensional tail dependence. Via transformation, we define a vector space on the positive orthant, yielding the notion of basis. With a suitably-chosen transformation, we show that transformed-linear operations applied to regularly varying random vectors preserve regular variation. Rather than model regular-variation's angular measure, we summarize tail dependence via a matrix of pairwise tail dependence metrics. This matrix is positive semidefinite, and eigendecomposition allows one to interpret tail dependence via the resulting eigenbasis. Additionally, this matrix is completely positive, and a resulting decomposition allows one to easily construct regularly varying random vectors which share the same pairwise tail dependencies. We illustrate our methods with Swiss rainfall data and financial return data.
Submitted 25 April, 2018; v1 submitted 20 December, 2016;
originally announced December 2016.
-
Bayesian inference for the Brown-Resnick process, with an application to extreme low temperatures
Authors:
Emeric Thibaud,
Juha Aalto,
Daniel S. Cooley,
Anthony C. Davison,
Juha Heikkinen
Abstract:
The Brown-Resnick max-stable process has proven to be well-suited for modeling extremes of complex environmental processes, but in many applications its likelihood function is intractable and inference must be based on a composite likelihood, thereby preventing the use of classical Bayesian techniques. In this paper we exploit a case in which the full likelihood of a Brown-Resnick process can be calculated, using componentwise maxima and their partitions in terms of individual events, and we propose two new approaches to inference. The first estimates the partitions using declustering, while the second uses random partitions in a Markov chain Monte Carlo algorithm. We use these approaches to construct a Bayesian hierarchical model for extreme low temperatures in northern Fennoscandia.
Submitted 17 October, 2016; v1 submitted 25 June, 2015;
originally announced June 2015.
-
Data Mining to Investigate the Meteorological Drivers for Extreme Ground Level Ozone Events
Authors:
Brook T. Russell,
Daniel Cooley,
William C. Porter,
Brian J. Reich,
Colette L. Heald
Abstract:
This project aims to explore which combinations of meteorological conditions are associated with extreme ground level ozone conditions. Our approach focuses only on the tail by optimizing the tail dependence between the ozone response and functions of meteorological covariates. Since there is a long list of possible meteorological covariates, the space of possible models cannot be explored completely. Consequently, we perform data mining within the model selection context, employing an automated model search procedure. Our study is unique among extremes applications as optimizing tail dependence has not previously been attempted, and it presents new challenges, such as requiring a smooth threshold. We present a simulation study which shows that the method can detect complicated conditions leading to extreme responses and resists overfitting. We apply the method to ozone data for Atlanta and Charlotte and find similar meteorological drivers for these two Southeastern US cities. We identify several covariates which help to differentiate the meteorological conditions which lead to extreme ozone levels from those which lead to merely high levels.
Submitted 11 March, 2016; v1 submitted 30 April, 2015;
originally announced April 2015.
-
A Markov-switching model for heat waves
Authors:
Benjamin A. Shaby,
Brian J. Reich,
Daniel Cooley,
Cari G. Kaufman
Abstract:
Heat waves merit careful study because they inflict severe economic and societal damage. We use an intuitive, informal working definition of a heat wave (a persistent event in the tail of the temperature distribution) to motivate an interpretable latent state extreme value model. A latent variable with dependence in time indicates membership in the heat wave state. The strength of the temporal dependence of the latent variable controls the frequency and persistence of heat waves. Within each heat wave, temperatures are modeled using extreme value distributions, with extremal dependence across time accomplished through an extreme value Markov model. One important virtue of interpretability is that model parameters directly translate into quantities of interest for risk management, so that questions like whether heat waves are becoming longer, more severe or more frequent are easily answered by querying an appropriate fitted model. We demonstrate the latent state model on two recent, calamitous examples: the European heat wave of 2003 and the Russian heat wave of 2010.
Submitted 23 June, 2016; v1 submitted 15 May, 2014;
originally announced May 2014.
-
Extreme value analysis for evaluating ozone control strategies
Authors:
Brian Reich,
Daniel Cooley,
Kristen Foley,
Sergey Napelenok,
Benjamin Shaby
Abstract:
Tropospheric ozone is one of six criteria pollutants regulated by the US EPA, and has been linked to respiratory and cardiovascular endpoints and adverse effects on vegetation and ecosystems. Regional photochemical models have been developed to study the impacts of emission reductions on ozone levels. The standard approach is to run the deterministic model under new emission levels and attribute the change in ozone concentration to the emission control strategy. However, running the deterministic model requires substantial computing time, and this approach does not provide a measure of uncertainty for the change in ozone levels. Recently, a reduced form model (RFM) has been proposed to approximate the complex model as a simple function of a few relevant inputs. In this paper, we develop a new statistical approach to make full use of the RFM to study the effects of various control strategies on the probability and magnitude of extreme ozone events. We fuse the model output with monitoring data to calibrate the RFM by modeling the conditional distribution of monitoring data given the RFM using a combination of flexible semiparametric quantile regression for the center of the distribution where data are abundant and a parametric extreme value distribution for the tail where data are sparse. Selected parameters in the conditional distribution are allowed to vary by the RFM value and the spatial location. Also, due to the simplicity of the RFM, we are able to embed the RFM in our Bayesian hierarchical framework to obtain a full posterior for the model input parameters, and propagate this uncertainty to the estimation of the effects of the control strategies. We use the new framework to evaluate three potential control strategies, and find that reducing mobile-source emissions has a larger impact than reducing point-source emissions or a combination of several emission sources.
Submitted 6 December, 2013;
originally announced December 2013.
-
Approximating the conditional density given large observed values via a multivariate extremes framework, with application to environmental data
Authors:
Daniel Cooley,
Richard A. Davis,
Philippe Naveau
Abstract:
Phenomena such as air pollution levels are of greatest interest when observations are large, but standard prediction methods are not specifically designed for large observations. We propose a method, rooted in extreme value theory, which approximates the conditional distribution of an unobserved component of a random vector given large observed values. Specifically, for $\mathbf{Z}=(Z_1,...,Z_d)^T$ and $\mathbf{Z}_{-d}=(Z_1,...,Z_{d-1})^T$, the method approximates the conditional distribution of $[Z_d|\mathbf{Z}_{-d}=\mathbf{z}_{-d}]$ when $|\mathbf{z}_{-d}|>r_*$. The approach is based on the assumption that $\mathbf{Z}$ is a multivariate regularly varying random vector of dimension $d$. The conditional distribution approximation relies on knowledge of the angular measure of $\mathbf{Z}$, which provides explicit structure for dependence in the distribution's tail. As the method produces a predictive distribution rather than just a point predictor, one can answer any question posed about the quantity being predicted, and, in particular, one can assess how well the extreme behavior is represented. Using a fitted model for the angular measure, we apply our method to nitrogen dioxide measurements in metropolitan Washington DC. We obtain a predictive distribution for the air pollutant at a location given the air pollutant's measurements at four nearby locations and given that the norm of the vector of the observed measurements is large.
Submitted 8 January, 2013;
originally announced January 2013.
-
Discussion of "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet
Authors:
D. Cooley,
S. R. Sain
Abstract:
Discussion of "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet [arXiv:1208.3378].
Submitted 17 August, 2012;
originally announced August 2012.
-
Downscaling extremes: A comparison of extreme value distributions in point-source and gridded precipitation data
Authors:
Elizabeth C. Mannshardt-Shamseldin,
Richard L. Smith,
Stephan R. Sain,
Linda O. Mearns,
Daniel Cooley
Abstract:
There is substantial empirical and climatological evidence that precipitation extremes have become more extreme during the twentieth century, and that this trend is likely to continue as global warming becomes more intense. However, understanding these issues is limited by a fundamental issue of spatial scaling: most evidence of past trends comes from rain gauge data, whereas trends into the future are produced by climate models, which rely on gridded aggregates. To study this further, we fit the Generalized Extreme Value (GEV) distribution to the right tail of the distribution of both rain gauge and gridded events. The results of this modeling exercise confirm that return values computed from rain gauge data are typically higher than those computed from gridded data; however, the size of the difference is somewhat surprising, with the rain gauge data exhibiting return values sometimes two or three times that of the gridded data. The main contribution of this paper is the development of a family of regression relationships between the two sets of return values that also take spatial variations into account. Based on these results, we now believe it is possible to project future changes in precipitation extremes at the point-location level based on results from climate models.
Submitted 8 October, 2010;
originally announced October 2010.
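The return values compared in the abstract above are quantiles of a fitted GEV distribution. As a rough sketch of that calculation (not the authors' code), the standard closed-form GEV quantile can be evaluated as below. The function name `gev_return_level` and its argument names are hypothetical, the parameters are assumed to come from a fit to block maxima, and `return_period` is measured in blocks (for example, years for annual maxima).

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Return level z_T solving G(z_T) = 1 - 1/T for a GEV(mu, sigma, xi).

    Uses G(z) = exp(-[1 + xi * (z - mu) / sigma] ** (-1 / xi)), with the
    Gumbel limit when xi is (numerically) zero.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must exceed 1")
    # y = -log(1 - 1/T) is the exponent term at the target quantile.
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-12:
        # Gumbel case: limit of the general formula as xi -> 0.
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)
```

Evaluating this function with parameters fitted to rain gauge data and to gridded data at the same return period, and taking the ratio, gives the kind of comparison the abstract describes.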
-
Bayesian Inference from Composite Likelihoods, with an Application to Spatial Extremes
Authors:
Mathieu Ribatet,
Daniel Cooley,
Anthony C. Davison
Abstract:
Composite likelihoods are increasingly used in applications where the full likelihood is analytically unknown or computationally prohibitive. Although the maximum composite likelihood estimator has frequentist properties akin to those of the usual maximum likelihood estimator, Bayesian inference based on composite likelihoods has yet to be explored. In this paper we investigate the use of the Metropolis--Hastings algorithm to compute a pseudo-posterior distribution based on the composite likelihood. Two methodologies for adjusting the algorithm are presented and their performance on approximating the true posterior distribution is investigated using simulated data sets and real data on spatial extremes of rainfall.
Submitted 6 July, 2011; v1 submitted 27 November, 2009;
originally announced November 2009.