-
Selective inference using randomized group lasso estimators for general models
Authors:
Yiling Huang,
Sarah Pirenne,
Snigdha Panigrahi,
Gerda Claeskens
Abstract:
Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method covers, for example, exponential family distributions as well as quasi-likelihood modeling for overdispersed count data, and allows for categorical or grouped covariates in addition to continuous ones. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood, which we show to be adequate for selective inference when conditioning on the selection of the grouped covariates. This likelihood also provides a selective point estimator that accounts for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for the group lasso is illustrated on data from the National Health and Nutrition Examination Survey, while simulations showcase its behaviour and its favorable comparison with other methods.
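A minimal sketch of the kind of randomized group lasso problem described above, assuming squared-error loss (the paper treats general losses, including quasi-likelihood) and assuming the randomization enters as a linear term $-\omega^\top\beta$ added to the objective, a form common in randomized selective inference. The function names, the unweighted group penalty and the proximal-gradient solver are illustrative choices, not the authors' implementation, and the post-selection likelihood itself is not computed. In practice $\omega$ would be drawn at random (for example from a Gaussian); fixed values keep the effect on the selected groups easy to see.

```python
import math

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2 for one group: shrinks the whole
    block toward zero and sets it exactly to zero when ||v||_2 <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def randomized_group_lasso(X, y, groups, lam, omega, step, n_iter=500):
    """Proximal gradient for the (assumed) randomized objective
        (1/2) * ||y - X b||^2 + lam * sum_g ||b_g||_2 - omega^T b.
    X: list of rows; groups: list of lists of column indices (hypothetical
    input format); omega: randomization vector; step <= 1 / lambda_max(X^T X)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        # Residual r = X b - y.
        r = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        # Gradient of the smooth part: X^T r - omega.
        grad = [sum(X[i][j] * r[i] for i in range(n)) - omega[j] for j in range(p)]
        z = [b[j] - step * grad[j] for j in range(p)]
        # Apply the group penalty's proximal operator block by block.
        new_b = [0.0] * p
        for g in groups:
            block = group_soft_threshold([z[j] for j in g], step * lam)
            for j, val in zip(g, block):
                new_b[j] = val
        b = new_b
    return b

def selected_groups(b, groups, tol=1e-10):
    """Indices of groups with at least one nonzero coefficient."""
    return [k for k, g in enumerate(groups) if any(abs(b[j]) > tol for j in g)]
```

Because the penalty acts on whole blocks, a group of coefficients, such as the dummy variables of one categorical covariate, enters or leaves the model together; the set returned by `selected_groups` plays the role of the selection event on which inference conditions.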
Submitted 26 March, 2024; v1 submitted 23 June, 2023;
originally announced June 2023.
-
Detangling robustness in high dimensions: composite versus model-averaged estimation
Authors:
Jing Zhou,
Gerda Claeskens,
Jelena Bradic
Abstract:
Robust methods, though ubiquitous in practice, are yet to be fully understood in the context of regularized estimation and high dimensions. Even simple questions become challenging very quickly. For example, classical statistical theory establishes an equivalence between model-averaged and composite quantile estimation. However, little to nothing is known about such an equivalence between methods that encourage sparsity. This paper provides a toolbox to further study robustness in these settings, with a focus on prediction. In particular, we study optimally weighted model-averaged as well as composite $l_1$-regularized estimation. Optimal weights are determined by minimizing the asymptotic mean squared error. This approach incorporates the effects of regularization without assuming perfect selection, an assumption often made in practice. The resulting weights are therefore optimal in terms of prediction quality. Through an extensive simulation study, we show that no single method systematically outperforms the others. We find, however, that model-averaged and composite quantile estimators often outperform least-squares methods, even in the case of Gaussian model noise. A real-data application illustrates the method's practical use through the reconstruction of compressed audio signals.
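The two estimators compared above can be sketched through their objectives. The following assumes the standard composite quantile setup (one intercept per quantile level, a slope vector shared across levels) with an $l_1$ penalty, and represents model averaging as a weighted average of predictions from separately fitted quantile models. Fitting is not shown, and the weights are supplied by the caller: the paper's optimal weights, obtained by minimizing the asymptotic mean squared error, are not reproduced here. All names, and the per-level weights $w_k$ in the composite loss, are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def composite_l1_objective(y, X, beta, intercepts, taus, lam, weights=None):
    """Composite quantile loss with one intercept b_k per level tau_k and a
    shared slope vector beta, plus an l1 penalty on beta:
        sum_k w_k * sum_i rho_{tau_k}(y_i - b_k - x_i^T beta) + lam * ||beta||_1
    weights=None gives the unweighted composite loss (all w_k = 1)."""
    if weights is None:
        weights = [1.0] * len(taus)
    loss = 0.0
    for w, b_k, tau in zip(weights, intercepts, taus):
        loss += w * sum(check_loss(yi - b_k - dot(xi, beta), tau)
                        for yi, xi in zip(y, X))
    return loss + lam * sum(abs(bj) for bj in beta)

def model_averaged_predict(x, fits, weights):
    """Weighted average of predictions b_k + x^T beta_k from separately fitted
    quantile models; fits is a list of (intercept, slope_vector) pairs.
    The weights are taken as given and must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(w * (b_k + dot(x, beta_k)) for w, (b_k, beta_k) in zip(weights, fits))
```

The composite estimator shares one slope vector across quantile levels inside a single objective, whereas model averaging combines separate fits after the fact; this is the structural difference whose consequences under sparsity the abstract describes.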
Submitted 12 June, 2020;
originally announced June 2020.
-
Fixed effects testing in high-dimensional linear mixed models
Authors:
Jelena Bradic,
Gerda Claeskens,
Thomas Gueuning
Abstract:
Many scientific and engineering challenges -- ranging from pharmacokinetic drug dosage allocation and personalized medicine to marketing mix (4Ps) recommendations -- require an understanding of the unobserved heterogeneity in order to develop the best decision-making processes. In this paper, we develop a hypothesis test and the corresponding p-value for testing the significance of the homogeneous structure in linear mixed models. A robust moment-matching construction is used to create a test that adapts to the level of model sparsity. When unobserved heterogeneity at a cluster level is constant, we show that our test is both consistent and unbiased even when the dimension of the model is extremely high. Our theoretical results rely on a new family of adaptive sparse estimators of the fixed effects that do not require consistent estimation of the random effects. Moreover, our inference results do not require consistent model selection. We show that moment matching can be extended to nonlinear mixed effects models and to generalized linear mixed effects models. In numerical and real-data experiments, we find that the developed method is extremely accurate, that it adapts to the size of the underlying model, and that it is decidedly powerful in the presence of irrelevant covariates.
Submitted 14 August, 2017;
originally announced August 2017.
-
Bayesian-motivated tests of function fit and their asymptotic frequentist properties
Authors:
Marc Aerts,
Gerda Claeskens,
Jeffrey D. Hart
Abstract:
We propose and analyze nonparametric tests of the null hypothesis that a function belongs to a specified parametric family. The tests are based on BIC approximations, $\pi_{BIC}$, to the posterior probability of the null model, and may be carried out in either Bayesian or frequentist fashion. We obtain results on the asymptotic distribution of $\pi_{BIC}$ under both the null hypothesis and local alternatives. One version of $\pi_{BIC}$, call it $\pi_{BIC}^*$, uses a class of models that are orthogonal to each other and growing in number without bound as the sample size, $n$, tends to infinity. We show that $\sqrt{n}(1-\pi_{BIC}^*)$ converges in distribution to a stable law under the null hypothesis. We also show that $\pi_{BIC}^*$ can detect local alternatives converging to the null at the rate $\sqrt{\log n/n}$. A particularly interesting finding is that the power of the $\pi_{BIC}^*$-based test is asymptotically equal to that of a test based on the maximum of alternative log-likelihoods. Simulation results and an example involving variable star data illustrate desirable features of the proposed tests.
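For intuition about $\pi_{BIC}$, here is a sketch of the standard BIC approximation to a posterior model probability, assuming equal prior probabilities over candidate models $M_0, M_1, \dots, M_K$ with $M_0$ the null parametric model. The prior weighting and the orthogonal model sequence behind $\pi_{BIC}^*$ in the paper may differ; the function names are illustrative.

```python
import math

def bic(loglik, n_params, n):
    """Schwarz criterion in 'smaller is better' form: -2 log L + d log n."""
    return -2.0 * loglik + n_params * math.log(n)

def pi_bic(bics):
    """BIC approximation to the posterior probability of model 0 (the null),
    assuming equal prior probabilities over the candidate models:
        exp(-BIC_0 / 2) / sum_k exp(-BIC_k / 2).
    Evaluated on the log scale (log-sum-exp) to avoid overflow/underflow."""
    half = [-b / 2.0 for b in bics]
    m = max(half)
    log_denom = m + math.log(sum(math.exp(h - m) for h in half))
    return math.exp(half[0] - log_denom)
```

A frequentist version of the test would reject the null when $\pi_{BIC}$ falls below a critical value taken from its null distribution; that critical value is not computed here.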
Submitted 30 August, 2005;
originally announced August 2005.