-
Exact first moments of the RV coefficient by invariant orthogonal integration
Authors:
François Bavaud
Abstract:
The RV coefficient measures the similarity between two multivariate configurations, and its significance testing has attracted various proposals in recent decades. We present a new approach, invariant orthogonal integration, which yields the exact first four moments of the RV coefficient under the null hypothesis. It consists in averaging the respective orientations of the two configurations over the Haar measure, and can be applied to any multivariate setting endowed with Euclidean distances between the observations. Our proposal also covers the weighted setting of observations of unequal importance, where the exchangeability assumption that justifies the usual permutation tests breaks down.
The proposed RV moments are expressed as simple functions of the kernel eigenvalues occurring in the weighted multidimensional scaling of the two configurations. The expressions for the third and fourth moments appear to be original. The first three moments can be obtained by elementary means, but computing the fourth moment requires a more sophisticated apparatus, the Weingarten calculus for orthogonal groups. The central role of standard kernels and their spectral moments is emphasized.
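For orientation, the RV coefficient itself can be sketched in a few lines. This is a minimal illustration and not the paper's method: it assumes uniform observation weights (the unweighted case only), uses the standard definition based on centered scalar-product kernels, and does not compute the moments discussed in the abstract. The function name `rv_coefficient` and the test data are hypothetical.

```python
import numpy as np

def rv_coefficient(X, Y):
    """RV coefficient between two configurations on the same n observations.

    Assumes uniform weights: RV = tr(Kx Ky) / sqrt(tr(Kx^2) tr(Ky^2)),
    where Kx and Ky are the centered scalar-product kernels.
    """
    Xc = X - X.mean(axis=0)   # center each column
    Yc = Y - Y.mean(axis=0)
    Kx = Xc @ Xc.T            # n x n kernel of the first configuration
    Ky = Yc @ Yc.T            # n x n kernel of the second configuration
    num = np.trace(Kx @ Ky)
    den = np.sqrt(np.trace(Kx @ Kx) * np.trace(Ky @ Ky))
    return num / den

# Hypothetical data: 10 observations, 3 and 4 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
Y = rng.normal(size=(10, 4))

# An orthogonal matrix obtained from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

rv_xy = rv_coefficient(X, Y)
rv_rotated = rv_coefficient(X, X @ Q)   # rotating the variables leaves the kernel unchanged
```

Because the kernels depend only on scalar products between centered observations, the coefficient is unchanged when the variables of either configuration are rotated, which is why only Euclidean distances between observations matter.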
Submitted 2 October, 2022;
originally announced October 2022.
-
Robust Estimation through Schoenberg transformations
Authors:
François Bavaud
Abstract:
Schoenberg transformations, which map Euclidean configurations into Euclidean configurations, define in turn a transformed inertia, whose minimization produces robust location estimates. The procedure depends only on Euclidean distances between observations, and applies equally to univariate and multivariate data. The choice of the family of transformations and their parameters defines a flexible location strategy, generalizing M-estimators. Two regimes of solutions are identified. Theoretical results on their existence and stability are provided, and illustrated on two data sets.
Submitted 21 February, 2011;
originally announced February 2011.
-
Relative Entropy and Statistics
Authors:
François Bavaud
Abstract:
Formalising the confrontation of opinions (models) with observations (data) is the task of Inferential Statistics. Information Theory provides a basic functional, the relative entropy (or Kullback-Leibler divergence), an asymmetrical measure of dissimilarity between the empirical and the theoretical distributions. The formal properties of the relative entropy turn out to capture every aspect of Inferential Statistics, as illustrated here, for simplicity, on dice (i.e. i.i.d. processes with finitely many outcomes): refutability (strict or probabilistic): the asymmetry between data and models; small deviations: rejecting a single hypothesis; competition between hypotheses and model selection; maximum likelihood: model inference and its limits; maximum entropy: reconstructing partially observed data; the EM algorithm; flow data and gravity modelling; determining the order of a Markov chain.
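As a small illustration of the central quantity, the relative entropy between an empirical distribution and a model can be computed directly for a die. This is a minimal sketch, not code from the paper: the function name `relative_entropy`, the counts, and the choice of natural logarithms are assumptions made for the example.

```python
import math

def relative_entropy(f, m):
    """Kullback-Leibler divergence K(f || m) in nats.

    f: empirical distribution (data), m: theoretical distribution (model).
    Terms with f_i = 0 contribute 0; the divergence is infinite if the data
    show an outcome that the model declares impossible.
    """
    if any(fi > 0 and mi == 0 for fi, mi in zip(f, m)):
        return math.inf
    return sum(fi * math.log(fi / mi) for fi, mi in zip(f, m) if fi > 0)

# Hypothetical counts from 60 throws of a six-sided die.
counts = [8, 10, 12, 9, 11, 10]
n = sum(counts)
f = [c / n for c in counts]   # empirical distribution
m = [1 / 6] * 6               # model: fair die

K = relative_entropy(f, m)
G2 = 2 * n * K                # likelihood-ratio statistic for the fair-die hypothesis
```

For large `n` and under the fair-die model, `G2` is approximately chi-square distributed with 5 degrees of freedom, which connects the relative entropy to the rejection of a single hypothesis. The asymmetry is visible in the edge case: `relative_entropy([1, 0], [0.5, 0.5])` is finite, whereas `relative_entropy([0.5, 0.5], [1, 0])` is infinite, since a model that forbids an observed outcome is strictly refuted.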
Submitted 3 April, 2010; v1 submitted 29 August, 2008;
originally announced August 2008.