-
Representative endowments and uniform Gini orderings of multi-attribute welfare
Authors:
Karl Mosler
Abstract:
For the comparison of inequality and welfare in multiple attributes the use of generalized Gini indices is proposed. Individual endowment vectors are summarized by using attribute weights and aggregated in a spectral social evaluation function. Such functions are based on classes of spectral functions, ordered by their aversion to inequality. Given a spectrum and a set $P$ of attribute weights, a multivariate Gini dominance ordering, being uniform in weights, is defined. If the endowment vectors are comonotonic, the dominance is determined by their marginal distributions; if not, the dependence structure of the endowment distribution has to be taken into account. For this, a set-valued representative endowment is introduced that characterizes the welfare of a $d$-dimensioned distribution. It consists of all points above the lower border of a convex compact in $\mathbb{R}^d$, while the set ordering of representative endowments corresponds to uniform Gini dominance. An application is given to the welfare of 28 European countries. Properties of $P$-uniform Gini dominance are derived, including relations to other orderings of $d$-variate distributions such as convex and dependence orderings. The multi-dimensioned representative endowment can be efficiently calculated from data. In a sampling context, it consistently estimates its population version.
Submitted 1 September, 2022; v1 submitted 31 March, 2021;
originally announced March 2021.
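As a rough illustration of the idea in the abstract above, the following sketch summarizes each endowment vector with attribute weights and aggregates the results with the classical Gini social evaluation function (a rank-weighted mean in which poorer individuals receive larger weights). The function names, the restriction to a finite list of weight vectors, and the choice of the classical Gini spectrum are assumptions made for illustration only; the paper works with general spectra and weight sets $P$.

```python
def gini_welfare(values):
    """Classical Gini social evaluation: mean * (1 - Gini index).

    Computed as a rank-weighted mean of the ascending order statistics,
    with weight (2(n - r) + 1) / n^2 for rank r = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    # enumerate is 0-based, so rank r = i + 1 gives weight 2(n - i) - 1.
    return sum((2 * (n - i) - 1) * x for i, x in enumerate(xs)) / n**2


def weighted_endowments(endowments, weights):
    """Collapse each individual's attribute vector to one number."""
    return [sum(w * a for w, a in zip(weights, row)) for row in endowments]


def uniformly_dominates(x, y, weight_set):
    """True if x has at least the Gini welfare of y for every weight vector.

    Assumption: weight_set is a finite list standing in for the set P.
    """
    return all(
        gini_welfare(weighted_endowments(x, p))
        >= gini_welfare(weighted_endowments(y, p))
        for p in weight_set
    )


# Two individuals, two attributes (hypothetical data).
equal = [[2, 2], [2, 2]]
unequal = [[1, 3], [3, 1]]
weights = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(uniformly_dominates(equal, unequal, weights))
```

In this toy case the two distributions have the same welfare under equal weights, but the equal distribution is strictly better when a single attribute carries all the weight, which is why the comparison has to hold over the whole weight set.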
-
Choosing among notions of multivariate depth statistics
Authors:
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Accordingly, numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from classical Mahalanobis distance does not come without cost. There is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R-packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions due to computational limits; rather, several notions are often feasible, among which the researcher has to decide. The article debates theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. Complexity and speed of exact algorithms are compared. The accuracy of approximate approaches like the random Tukey depth is discussed as well as the application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
Submitted 5 May, 2021; v1 submitted 4 April, 2020;
originally announced April 2020.
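The random Tukey depth mentioned in the abstract above approximates the Tukey (halfspace) depth by taking the minimum univariate depth over a finite number of random projection directions. The sketch below is a minimal pure-Python version; the function name, the default number of directions, and the fixed seed are assumptions for illustration and do not reproduce the R-package implementations the article refers to.

```python
import random


def random_tukey_depth(point, data, n_directions=500, seed=0):
    """Approximate the Tukey depth of `point` with respect to `data`.

    For each random direction, project everything onto it and take the
    smaller of the two closed-halfline fractions around the projected
    point. The minimum over directions is an upper bound on the exact
    Tukey depth and approaches it as n_directions grows.
    """
    rng = random.Random(seed)
    n = len(data)
    dim = len(point)
    depth = 1.0
    for _ in range(n_directions):
        # A standard Gaussian vector has a uniformly distributed direction;
        # normalising is unnecessary because only the ordering matters.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z = sum(ui * pi for ui, pi in zip(u, point))
        proj = [sum(ui * xi for ui, xi in zip(u, x)) for x in data]
        below = sum(1 for t in proj if t <= z) / n
        above = sum(1 for t in proj if t >= z) / n
        depth = min(depth, below, above)
    return depth


# Hypothetical data: the corners of the unit square plus its centre.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
print(random_tukey_depth((0.5, 0.5), square))    # central point
print(random_tukey_depth((10.0, 10.0), square))  # far outside the data
```

Each direction costs one pass over the data, so the approximation scales linearly in both sample size and dimension, which is the computational appeal relative to exact algorithms.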
-
Classification with the pot-pot plot
Authors:
Oleksii Pokotylo,
Karl Mosler
Abstract:
We propose a procedure for supervised classification that is based on potential functions. The potential of a class is defined as a kernel density estimate multiplied by the class's prior probability. The method transforms the data to a potential-potential (pot-pot) plot, where each data point is mapped to a vector of potentials. Separation of the classes, as well as classification of new data points, is performed on this plot. For this, either the $α$-procedure ($α$-P) or $k$-nearest neighbors ($k$-NN) are employed. For data that are generated from continuous distributions, these classifiers prove to be strongly Bayes-consistent. The potentials depend on the kernel and its bandwidth used in the density estimate. We investigate several variants of bandwidth selection, including joint and separate pre-scaling and a bandwidth regression approach. The new method is applied to benchmark data from the literature, including simulated data sets as well as 50 sets of real data. It compares favorably to known classification methods such as LDA, QDA, max kernel density estimates, $k$-NN, and $DD$-plot classification using depth functions.
Submitted 9 August, 2016;
originally announced August 2016.
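The mapping described in the abstract above, where each point is sent to a vector of class potentials (kernel density estimate times prior), can be sketched as follows. This is an illustrative simplification: the isotropic Gaussian kernel, the single shared bandwidth, the priors estimated from class proportions, and all function names are assumptions. The final step uses a plain maximum-potential rule as a stand-in; the paper itself separates the classes on the pot-pot plot with the $α$-procedure or $k$-NN.

```python
import math


def gaussian_kde(point, sample, bandwidth):
    """Kernel density estimate at `point` with an isotropic Gaussian kernel."""
    dim = len(point)
    norm = (2.0 * math.pi) ** (dim / 2.0) * bandwidth ** dim
    total = 0.0
    for x in sample:
        sq_dist = sum((p - xi) ** 2 for p, xi in zip(point, x))
        total += math.exp(-0.5 * sq_dist / bandwidth ** 2)
    return total / (len(sample) * norm)


def potentials(point, classes, bandwidth=1.0):
    """Map a point to its pot-pot coordinates: prior * density per class.

    Assumption: priors are estimated by the class proportions.
    """
    n_total = sum(len(sample) for sample in classes)
    return [
        len(sample) / n_total * gaussian_kde(point, sample, bandwidth)
        for sample in classes
    ]


def classify_max_potential(point, classes, bandwidth=1.0):
    """Stand-in rule: pick the class with the largest potential."""
    pots = potentials(point, classes, bandwidth)
    return max(range(len(pots)), key=pots.__getitem__)


# Hypothetical two-class training data in the plane.
class_a = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)]
class_b = [(3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
print(potentials((0.1, 0.1), [class_a, class_b]))
print(classify_max_potential((3.0, 3.0), [class_a, class_b]))
```

The maximum-potential rule corresponds to separating the plot along its diagonal; a classifier fitted on the plot can bend that boundary, which is where the bandwidth choices discussed in the abstract matter.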
-
Computing the Oja Median in R: The Package OjaNP
Authors:
Daniel Fischer,
Karl Mosler,
Jyrki Möttönen,
Klaus Nordhausen,
Oleksii Pokotylo,
Daniel Vogel
Abstract:
The Oja median is one of several extensions of the univariate median to the multivariate case. It has many nice properties, but is computationally demanding. In this paper, we first review the properties of the Oja median and compare it to other multivariate medians. Afterwards we discuss four algorithms to compute the Oja median, which are implemented in our R-package OjaNP. Besides these algorithms, the package also contains functions to compute Oja signs, Oja signed ranks, Oja ranks, and the related scatter concepts. To illustrate their use, the corresponding multivariate one- and $C$-sample location tests are implemented.
Submitted 24 June, 2016;
originally announced June 2016.
-
Fast computation of Tukey trimmed regions and median in dimension $p>2$
Authors:
Xiaohui Liu,
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
Given data in $\mathbb{R}^{p}$, a Tukey $κ$-trimmed region is the set of all points that have at least Tukey depth $κ$ w.r.t. the data. As they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has been impeded so far by the lack of efficient computational procedures in dimension $p > 2$. We construct two novel algorithms to compute a Tukey $κ$-trimmed region, a naïve one and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study the novel fast algorithm is compared with the naïve one, which is slower and by construction exact, yielding in every case the same correct results. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median.
Submitted 8 November, 2018; v1 submitted 16 December, 2014;
originally announced December 2014.
-
Classifying real-world data with the $DDα$-procedure
Authors:
Pavlo Mozharovskyi,
Karl Mosler,
Tatjana Lange
Abstract:
The $DDα$-classifier, a nonparametric fast and very robust procedure, is described and applied to fifty classification problems regarding a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called $α$-procedure. To each data point the transformation assigns its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure, and compared regarding their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, `outsiders', that is data points having zero depth in all classes, need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The $DDα$-procedure is available as an R-package.
Submitted 28 October, 2015; v1 submitted 19 July, 2014;
originally announced July 2014.
-
Fast DD-classification of functional data
Authors:
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional hypercube. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the $DD$-plot, which is a subset of the unit hypercube. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification on $[0,1]^q$. The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik-Chervonenkis bound. The entire methodology does not involve smoothing techniques, is completely nonparametric and can achieve Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as a benchmark study.
Submitted 28 January, 2016; v1 submitted 5 March, 2014;
originally announced March 2014.
-
Stochastic linear programming with a distortion risk constraint
Authors:
Karl Mosler,
Pavel Bazovkin
Abstract:
Linear optimization problems are investigated whose parameters are uncertain. We apply coherent distortion risk measures to capture the possible violation of a restriction. Each risk constraint induces an uncertainty set of coefficients, which is shown to be a weighted-mean trimmed region. Given an external sample of the coefficients, an uncertainty set is a convex polytope that can be exactly calculated. We construct an efficient geometrical algorithm to solve stochastic linear programs that have a single distortion risk constraint. The algorithm is available as an R-package. Also the algorithm's asymptotic behavior is investigated, when the sample is i.i.d. from a general probability distribution. Finally, we present some computational experience.
Submitted 10 August, 2012;
originally announced August 2012.
-
General notions of depth for functional data
Authors:
Karl Mosler,
Yulia Polyakova
Abstract:
A data depth measures the centrality of a point with respect to an empirical distribution. Postulates are formulated, which a depth for functional data should satisfy, and a general approach is proposed to construct multivariate data depths in Banach spaces. The new approach, referred to as Phi-depth, is based on depth infima over a proper set Phi of R^d-valued linear functions. Several desirable properties are established for the Phi-depth and a generalized version of it. The general notions include many new depths as special cases. In particular a location-slope depth and a principal component depth are introduced.
Submitted 30 January, 2018; v1 submitted 9 August, 2012;
originally announced August 2012.
-
Fast nonparametric classification based on data depth
Authors:
Tatjana Lange,
Karl Mosler,
Pavlo Mozharovskyi
Abstract:
A new procedure, called DDa-procedure, is developed to solve the problem of classifying d-dimensional objects into q >= 2 classes. The procedure is completely nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0,1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the alpha-procedure. In case of more than two classes several binary classifications are performed and a majority rule is applied. Special treatments are discussed for 'outsiders', that is, data having zero depth vector. The DDa-classifier is applied to simulated as well as real data, and the results are compared with those of similar procedures that have been recently proposed. In most cases the new procedure has comparable error rates, but is much faster than other classification approaches, including the SVM.
Submitted 17 December, 2012; v1 submitted 20 July, 2012;
originally announced July 2012.