KOO approach for scalable variable selection problem in large-dimensional regression
Authors:
Zhidong Bai,
Kwok Pui Choi,
Yasunori Fujikoshi,
Jiang Hu
Abstract:
An important issue in many multivariate regression problems is to eliminate candidate predictors with null predictor vectors. In the large-dimensional (LD) setting, where the numbers of responses and predictors are large, model selection encounters a scalability challenge. Knock-one-out (KOO) statistics hold promise to meet this challenge. In this paper, the almost sure limits and the central limit theorem of the KOO statistics are derived under the LD setting and mild distributional assumptions (finite fourth moments) on the errors. These theoretical results guarantee the strong consistency of a subset selection rule based on the KOO statistics with a general threshold. To enhance the robustness of the selection rule, we also propose a bootstrap threshold for the KOO approach. Simulation results support our conclusions and demonstrate that the selection probabilities of the KOO approach with the bootstrap threshold outperform those of the methods using the Akaike information threshold, the Bayesian information threshold and the Mallows's $C_p$ threshold. We apply the proposed KOO approach and those based on information thresholds to a chemometrics dataset and a yeast cell-cycle dataset; the comparison suggests that our proposed method identifies useful models.
Submitted 25 April, 2023; v1 submitted 30 March, 2023;
originally announced March 2023.
Strong consistency of the AIC, BIC, $C_p$ and KOO methods in high-dimensional multivariate linear regression
Authors:
Zhidong Bai,
Yasunori Fujikoshi,
Jiang Hu
Abstract:
Variable selection is essential for improving inference and interpretation in multivariate linear regression. Although a number of alternative regressor selection criteria have been suggested, the most prominent and widely used are the Akaike information criterion (AIC), the Bayesian information criterion (BIC), Mallows's $C_p$, and their modifications. However, for high-dimensional data, experience has shown that the performance of these classical criteria is not always satisfactory. In the present article, we begin by presenting the necessary and sufficient conditions (NSC) for the strong consistency of the high-dimensional AIC, BIC, and $C_p$, based on which we can identify some reasons for their poor performance. Specifically, we show that under certain mild high-dimensional conditions, if the BIC is strongly consistent, then the AIC is strongly consistent, but not vice versa. This result contradicts the classical understanding. In addition, we consider some NSC for the strong consistency of the high-dimensional kick-one-out (KOO) methods introduced by Zhao et al. (1986) and Nishii et al. (1988). Furthermore, we propose two general methods based on the KOO methods and prove their strong consistency. The proposed general methods remove the penalties while simultaneously relaxing the conditions on the dimensions and sizes of the regressors. A simulation study supports our consistency conclusions and shows that the convergence rates of the two proposed general KOO methods are much faster than those of the original methods.
Submitted 5 January, 2020; v1 submitted 30 October, 2018;
originally announced October 2018.
Non-asymptotic results for Cornish-Fisher expansions
Authors:
V. V. Ulyanov,
M. Aoshima,
Y. Fujikoshi
Abstract:
We obtain computable error bounds for generalized Cornish-Fisher expansions of quantiles of statistics, provided that computable error bounds for Edgeworth-Chebyshev type expansions of the distributions of these statistics are known. The results are illustrated by examples.
Submitted 2 April, 2016;
originally announced April 2016.