-
Dimension estimation in PCA model using high-dimensional data augmentation
Authors:
Una Radojicic,
Joni Virta
Abstract:
We propose a modified, high-dimensional version of a recent dimension estimation procedure that determines the dimension via the introduction of augmented noise variables into the data. Our asymptotic results show that the proposal is consistent in a wide range of high-dimensional scenarios, and further shed light on why the original method breaks down when the dimension of either the data or the augmentation becomes too large. Simulations are used to demonstrate the superiority of the proposal over competitors both under and outside of the theoretical model.
Submitted 6 February, 2025;
originally announced February 2025.
-
The Asymptotic Properties of the One-Sample Spatial Rank Methods
Authors:
Jyrki Möttönen,
Klaus Nordhausen,
Hannu Oja,
Una Radojicic
Abstract:
For a set of $p$-variate data points $\boldsymbol y_1,\ldots,\boldsymbol y_n$, several versions of the multivariate median and the related multivariate sign test have been proposed and studied in the literature. In this paper we consider the asymptotic properties of the multivariate extension of the Hodges-Lehmann (HL) estimator, the spatial HL-estimator, and the related test statistic. The asymptotic properties of the spatial HL-estimator and the related test statistic as $n$ tends to infinity are collected, reviewed, and proved, some for the first time even though they have long been in use. We also derive the limiting behavior of the HL-estimator when both the sample size $n$ and the dimension $p$ tend to infinity.
Submitted 11 August, 2023;
originally announced August 2023.
-
Order Determination for Tensor-valued Observations Using Data Augmentation
Authors:
Una Radojicic,
Niko Lietzen,
Klaus Nordhausen,
Joni Virta
Abstract:
Tensor-valued data benefits greatly from dimension reduction as the reduction in size is exponential in the number of modes. To achieve maximal reduction without loss in information, our objective in this work is to give an automated procedure for the optimal selection of the reduced dimensionality. Our approach combines a recently proposed data augmentation procedure with the higher-order singular value decomposition (HOSVD) in a tensorially natural way. We give theoretical guidelines on how to choose the tuning parameters and further inspect their influence in a simulation study. As our primary result, we show that the procedure consistently estimates the true latent dimensions under a noisy tensor model, both at the population and sample levels. Additionally, we propose a bootstrap-based alternative to the augmentation estimator. Simulations are used to demonstrate the estimation accuracy of the two methods under various settings.
Submitted 21 July, 2022;
originally announced July 2022.
-
Kurtosis-based projection pursuit for matrix-valued data
Authors:
Una Radojicic,
Klaus Nordhausen,
Joni Virta
Abstract:
We develop projection pursuit for data that admit a natural representation in matrix form. For projection indices, we propose extensions of the classical kurtosis and Mardia's multivariate kurtosis. The first index estimates projections for both sides of the matrices simultaneously, while the second index finds the two projections separately. Both indices are shown to recover the optimally separating projection for two-group Gaussian mixtures in the full absence of any label information. We further establish the strong consistency of the corresponding sample estimators. Simulations and a real data example on hand-written postal code data are used to demonstrate the method.
Submitted 9 September, 2021;
originally announced September 2021.
-
Large-Sample Properties of Blind Estimation of the Linear Discriminant Using Projection Pursuit
Authors:
Una Radojicic,
Klaus Nordhausen,
Joni Virta
Abstract:
We study the estimation of the linear discriminant with projection pursuit, a method that is blind in the sense that it does not use the class labels in the estimation. Our viewpoint is asymptotic and, as our main contribution, we derive central limit theorems for estimators based on three different projection indices, skewness, kurtosis and their convex combination. The results show that in each case the limiting covariance matrix is proportional to that of linear discriminant analysis (LDA), an unblind estimator of the discriminant. An extensive comparative study between the asymptotic variances reveals that projection pursuit is able to achieve efficiency equal to LDA when the groups are arbitrarily well-separated and their sizes are reasonably balanced. We conclude with a real data example and a simulation study investigating the validity of the obtained asymptotic formulas for finite samples.
Submitted 8 March, 2021;
originally announced March 2021.
-
Notion of information and independent component analysis
Authors:
Una Radojicic,
Klaus Nordhausen,
Hannu Oja
Abstract:
Partial orderings and measures of information for continuous univariate random variables are discussed, with special roles played by the Gaussian and uniform distributions. Information measures and measures of non-Gaussianity, including third and fourth cumulants, are generally used as projection indices in the projection pursuit approach to independent component analysis. The connections between information, non-Gaussianity and statistical independence in the context of independent component analysis are discussed in detail.
Submitted 19 June, 2020;
originally announced June 2020.