-
Limiting Behavior of Maxima under Dependence
Authors:
Klaus Herrmann,
Marius Hofert,
Johanna G. Neslehova
Abstract:
Weak convergence of maxima of dependent sequences of identically distributed continuous random variables is studied under normalizing sequences arising as subsequences of the normalizing sequences from an associated iid sequence. This general framework allows one to derive several generalizations of the well-known Fisher-Tippett-Gnedenko theorem under conditions on the univariate marginal distribution and the dependence structure of the sequence. The limiting distributions are shown to be compositions of a generalized extreme value distribution and a distortion function which reflects the limiting behavior of the diagonal of the underlying copula. Uniform convergence rates for the weak convergence to the limiting distribution are also derived. Examples covering well-known dependence structures are provided. Several existing results, e.g. for exchangeable sequences or stationary time series, are embedded in the proposed framework.
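The role of the copula diagonal can be made explicit with a short worked identity. The notation here is assumed for illustration and does not appear in the abstract: $F$ is the common continuous marginal distribution function, $C_n$ the copula of $(X_1,\dots,X_n)$, $\delta_n(u) = C_n(u,\dots,u)$ its diagonal, and $M_n = \max(X_1,\dots,X_n)$. By Sklar's theorem,

$$
P(M_n \le x) = P(X_1 \le x, \dots, X_n \le x) = C_n\bigl(F(x), \dots, F(x)\bigr) = \delta_n\bigl(F(x)\bigr).
$$

In the iid case $\delta_n(u) = u^n$, which recovers the classical $P(M_n \le x) = F(x)^n$. Under dependence, the distribution of the maximum therefore depends on the dependence structure only through $\delta_n$.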
Submitted 5 May, 2024;
originally announced May 2024.
-
Exploratory functional data analysis of multivariate densities for the identification of agricultural soil contamination by risk elements
Authors:
Tomáš Matys Grygar,
Una Radojičić,
Ivana Pavlů,
Sonja Greven,
Johanna Genest Nešlehová,
Štěpánka Tůmová,
Karel Hron
Abstract:
Geochemical mapping of risk element concentrations in soils is performed in countries around the world. It results in large datasets of high analytical quality, which can be used to identify soils that violate individual legislative limits for safe food production. However, there is a lack of advanced data mining tools that would be suitable for sensitive exploratory data analysis of big data while respecting the natural variability of soil composition. To distinguish anthropogenic contamination from natural variation, the analysis of the entire data distributions for smaller sub-areas is key. In this article, we propose a new data mining method for geochemical mapping data based on functional data analysis of probability density functions in the framework of Bayes spaces after post-stratification of a big dataset to smaller districts. The proposed tools allow us to analyse the entire distribution, going beyond a superficial detection of extreme concentration anomalies. We illustrate the proposed methodology on a dataset gathered according to the Czech national legislation (1990–2009). Taking into account specific properties of probability density functions and recent results for orthogonal decomposition of multivariate densities enabled us to reveal real contamination patterns that were so far only suspected in Czech agricultural soils. We process the above Czech soil composition dataset by first compartmentalising it into spatial units, in particular the districts, and by subsequently clustering these districts according to diagnostic features of their uni- and multivariate distributions at high concentration ends. Comparison between compartments is key to the reliable distinction of diffuse contamination. In this work, we used soil contamination by Cu-bearing pesticides as an example for empirical testing of the proposed data mining approach.
Submitted 6 November, 2023; v1 submitted 20 October, 2023;
originally announced October 2023.
-
Clustered Archimax Copulas
Authors:
Simon Chatelain,
Samuel Perreault,
Johanna G. Nešlehová,
Anne-Laure Fougères
Abstract:
When modeling multivariate phenomena, properly capturing the joint extremal behavior is often one of the many concerns. Archimax copulas appear as successful candidates in the case of asymptotic dependence. In this paper, the class of Archimax copulas is extended via their stochastic representation to a clustered construction. These clustered Archimax copulas are characterized by a partition of the random variables into groups linked by a radial copula; each cluster is Archimax and therefore defined by its own Archimedean generator and stable tail dependence function. The proposed extension allows for both asymptotic dependence and independence between the clusters, a property which is sought, for example, in applications in environmental sciences and finance. The model also inherits the ability of Archimax copulas to capture dependence between variables at pre-extreme levels. The asymptotic behavior of the model is established, leading to a rich class of stable tail dependence functions.
Submitted 27 October, 2022;
originally announced October 2022.
-
Orthogonal decomposition of multivariate densities in Bayes spaces and its connection with copulas
Authors:
Christian Genest,
Karel Hron,
Johanna G. Nešlehová
Abstract:
Bayes spaces were initially designed to provide a geometric framework for the modeling and analysis of distributional data. It has recently come to light that this methodology can be exploited to provide an orthogonal decomposition of bivariate probability distributions into an independent and an interaction part. In this paper, new insights into these results are provided by reformulating them using Hilbert space theory and a multivariate extension is developed using a distributional analog of the Hoeffding-Sobol identity. A connection between the resulting decomposition of a multivariate density and its copula-based representation is also provided.
Submitted 28 June, 2022;
originally announced June 2022.
-
Causal Inference for Quantile Treatment Effects
Authors:
Shuo Sun,
Erica E. M. Moodie,
Johanna G. Nešlehová
Abstract:
Analyses of environmental phenomena often are concerned with understanding unlikely events such as floods, heatwaves, droughts or high concentrations of pollutants. Yet the majority of the causal inference literature has focused on modelling means, rather than (possibly high) quantiles. We define a general estimator of the population quantile treatment (or exposure) effects (QTE) -- the weighted QTE (WQTE) -- of which the population QTE is a special case, along with a general class of balancing weights incorporating the propensity score. Asymptotic properties of the proposed WQTE estimators are derived. We further propose and compare propensity score regression and two weighted methods based on these balancing weights to understand the causal effect of an exposure on quantiles, allowing for the exposure to be binary, discrete or continuous. Finite sample behavior of the three estimators is studied in simulation. The proposed methods are applied to data taken from the Bavarian Danube catchment area to estimate the 95% QTE of phosphorus on copper concentration in the river.
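To make the idea of a weighted quantile contrast concrete, here is a minimal Python sketch for a binary exposure. It uses generic inverse-probability weights with propensity scores that are assumed to be known; it is not the WQTE estimator or the balancing-weight class defined in the paper, and the function names `weighted_quantile` and `ipw_quantile_effect` are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose normalized cumulative weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= q:
            return v
    return pairs[-1][0]


def ipw_quantile_effect(y, a, e, q):
    """Difference of weighted q-quantiles: exposed (a == 1) minus unexposed (a == 0).

    e[i] is an assumed-known propensity score P(A = 1 | X_i), strictly in (0, 1).
    Exposed units get weight 1 / e[i]; unexposed units get weight 1 / (1 - e[i]).
    """
    y1 = [yi for yi, ai in zip(y, a) if ai == 1]
    w1 = [1.0 / ei for ai, ei in zip(a, e) if ai == 1]
    y0 = [yi for yi, ai in zip(y, a) if ai == 0]
    w0 = [1.0 / (1.0 - ei) for ai, ei in zip(a, e) if ai == 0]
    return weighted_quantile(y1, w1, q) - weighted_quantile(y0, w0, q)
```

Setting `q = 0.95` would target the kind of high quantile discussed in the abstract; in practice the propensity scores would have to be estimated, which this sketch does not attempt.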
Submitted 8 September, 2021;
originally announced September 2021.
-
On attainability of Kendall's tau matrices and concordance signatures
Authors:
Alexander J. McNeil,
Johanna G. Neslehova,
Andrew D. Smith
Abstract:
Methods are developed for checking and completing systems of bivariate and multivariate Kendall's tau concordance measures in applications where only partial information about dependencies between variables is available. The concept of a concordance signature of a multivariate continuous distribution is introduced; this is the vector of concordance probabilities for margins of all orders. It is shown that every attainable concordance signature is equal to the concordance signature of a unique mixture of the extremal copulas, that is the copulas with extremal correlation matrices consisting exclusively of 1's and -1's. A method of estimating an attainable concordance signature from data is derived and shown to correspond to using standard estimates of Kendall's tau in the absence of ties. The set of attainable Kendall rank correlation matrices of multivariate continuous distributions is proved to be identical to the set of convex combinations of extremal correlation matrices, a set known as the cut polytope. A methodology for testing the attainability of concordance signatures using linear optimization and convex analysis is provided. The elliptical copulas are shown to yield a strict subset of the attainable concordance signatures as well as a strict subset of the attainable Kendall rank correlation matrices; the Student t copula is seen to converge, as the degrees of freedom tend to zero, to a mixture of extremal copulas sharing its concordance signature with all elliptical distributions that have the same correlation matrix. A characterization of the attainable signatures of equiconcordant copulas is given.
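The extremal correlation matrices mentioned above are easy to enumerate in small dimensions. The following Python sketch builds them as outer products of sign vectors and forms a convex combination, which by the characterization stated in the abstract lies in the cut polytope. The function names are hypothetical, and the sketch does not implement the paper's attainability test based on linear optimization.

```python
from itertools import product


def extremal_correlation_matrices(d):
    """All d x d matrices s s^T with s in {-1, 1}^d.

    The first sign is fixed to +1 because s and -s give the same matrix,
    so there are 2 ** (d - 1) distinct matrices.
    """
    mats = []
    for tail in product((1, -1), repeat=d - 1):
        s = (1,) + tail
        mats.append([[s[i] * s[j] for j in range(d)] for i in range(d)])
    return mats


def convex_combination(mats, weights):
    """Entrywise mixture sum_k w_k * P_k with nonnegative weights summing to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    d = len(mats[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(d)]
        for i in range(d)
    ]
```

For example, with `d = 3` the equal-weight mixture of the four extremal matrices is the identity matrix, that is, all pairwise Kendall's tau values equal to zero.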
Submitted 11 May, 2022; v1 submitted 17 September, 2020;
originally announced September 2020.
-
Detection of Block-Exchangeable Structure in Large-Scale Correlation Matrices
Authors:
Samuel Perreault,
Thierry Duchesne,
Johanna G. Nešlehová
Abstract:
Correlation matrices are omnipresent in multivariate data analysis. When the number d of variables is large, the sample estimates of correlation matrices are typically noisy and conceal underlying dependence patterns. We consider the case when the variables can be grouped into K clusters with exchangeable dependence; this assumption is often made in applications, e.g., in finance and econometrics. Under this partial exchangeability condition, the corresponding correlation matrix has a block structure and the number of unknown parameters is reduced from d(d-1)/2 to at most K(K+1)/2. We propose a robust algorithm based on Kendall's rank correlation to identify the clusters without assuming the knowledge of K a priori or anything about the margins except continuity. The corresponding block-structured estimator performs considerably better than the sample Kendall rank correlation matrix when K < d. The new estimator can also be much more efficient in finite samples even in the unstructured case K = d, although there is no gain asymptotically. When the distribution of the data is elliptical, the results extend to linear correlation matrices and their inverses. The procedure is illustrated on financial stock returns.
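The parameter reduction described above can be illustrated with a small Python sketch. It assumes the cluster labels are already known, so it is not the detection algorithm proposed in the paper; the function names `n_unstructured`, `n_block_max`, and `block_average` are hypothetical.

```python
from itertools import combinations


def n_unstructured(d):
    """Number of distinct off-diagonal entries of a d x d correlation matrix."""
    return d * (d - 1) // 2


def n_block_max(K):
    """Upper bound K(K+1)/2 on distinct entries under K exchangeable clusters."""
    return K * (K + 1) // 2


def block_average(tau, labels):
    """Average the off-diagonal entries of `tau` within each pair of clusters.

    `tau` is a symmetric matrix given as a list of lists, and labels[i] is the
    (assumed known) cluster of variable i. The diagonal of the result is 1.
    """
    d = len(tau)
    sums, counts = {}, {}
    for i, j in combinations(range(d), 2):
        key = tuple(sorted((labels[i], labels[j])))
        sums[key] = sums.get(key, 0.0) + tau[i][j]
        counts[key] = counts.get(key, 0) + 1
    out = [[1.0] * d for _ in range(d)]
    for i, j in combinations(range(d), 2):
        key = tuple(sorted((labels[i], labels[j])))
        out[i][j] = out[j][i] = sums[key] / counts[key]
    return out
```

With `d = 100` variables and `K = 5` clusters, `n_unstructured(100)` gives 4950 entries while `n_block_max(5)` gives at most 15, which is the scale of reduction the abstract refers to.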
Submitted 24 October, 2018; v1 submitted 19 June, 2017;
originally announced June 2017.
-
Extremal attractors of Liouville copulas
Authors:
Léo R. Belzile,
Johanna G. Nešlehová
Abstract:
Liouville copulas, which were introduced in McNeil and Neslehova (2010), are asymmetric generalizations of the ubiquitous Archimedean copula class. They are the dependence structures of scale mixtures of Dirichlet distributions, also called Liouville distributions. In this paper, the limiting extreme-value copulas of Liouville copulas and of their survival counterparts are derived. The limiting max-stable models, termed here the scaled extremal Dirichlet, are new and encompass several existing classes of multivariate max-stable distributions, including the logistic, negative logistic and extremal Dirichlet. As shown herein, the stable tail dependence function and angular density of the scaled extremal Dirichlet model have a tractable form, which in turn leads to a simple de Haan representation. The latter is used to design efficient algorithms for unconditional simulation based on the work of Dombry, Engelke and Oesting (2015) and to derive tractable formulas for maximum-likelihood inference. The scaled extremal Dirichlet model is illustrated on river flow data of the river Isar in southern Germany.
Submitted 6 July, 2017; v1 submitted 11 April, 2017;
originally announced April 2017.
-
On the empirical multilinear copula process for count data
Authors:
Christian Genest,
Johanna G. Nešlehová,
Bruno Rémillard
Abstract:
Continuation refers to the operation by which the cumulative distribution function of a discontinuous random vector is made continuous through multilinear interpolation. The copula that results from the application of this technique to the classical empirical copula is either called the multilinear or the checkerboard copula. As shown by Genest and Nešlehová (Astin Bull. 37 (2007) 475-515) and Nešlehová (J. Multivariate Anal. 98 (2007) 544-567), this copula plays a central role in characterizing dependence concepts in discrete random vectors. In this paper, the authors establish the asymptotic behavior of the empirical process associated with the multilinear copula based on $d$-variate count data. This empirical process does not generally converge in law on the space $\mathcal{C}([0,1]^d)$ of continuous functions on $[0,1]^d$, equipped with the uniform norm. However, the authors show that the process converges in $\mathcal{C}(K)$ for any compact $K\subset\mathcal{O}$, where $\mathcal{O}$ is a dense open subset of $[0,1]^d$, whose complement is the Cartesian product of the ranges of the marginal distribution functions. This result is sufficient to deduce the weak limit of many functionals of the process, including classical statistics for monotone trend. It also leads to a powerful and consistent test of independence which is applicable even to sparse contingency tables whose dimension is sample size dependent.
Submitted 4 July, 2014;
originally announced July 2014.