-
Optimal Estimation of Slope Vector in High-dimensional Linear Transformation Model
Authors:
Xin Lu Tan
Abstract:
In a linear transformation model, there exists an unknown monotone nonlinear transformation function such that the transformed response variable and the predictor variables satisfy a linear regression model. In this paper, we present CENet, a new method for estimating the slope vector and simultaneously performing variable selection in the high-dimensional sparse linear transformation model. CENet is the solution to a convex optimization problem and can be computed efficiently from an algorithm with guaranteed convergence to the global optimum. We show that under a pairwise elliptical distribution assumption on each predictor-transformed-response pair and some regularity conditions, CENet attains the same optimal rate of convergence as the best regression method in the high-dimensional sparse linear regression model. To the best of our limited knowledge, this is the first such result in the literature. We demonstrate the empirical performance of CENet on both simulated and real datasets. We also discuss the connection of CENet with some nonlinear regression/multivariate methods proposed in the literature.
Submitted 24 April, 2016;
originally announced April 2016.
-
High-dimensional robust precision matrix estimation: Cellwise corruption under $ε$-contamination
Authors:
Po-Ling Loh,
Xin Lu Tan
Abstract:
We analyze the statistical consistency of robust estimators for precision matrices in high dimensions. We focus on a contamination mechanism acting cellwise on the data matrix. The estimators we analyze are formed by plugging appropriately chosen robust covariance matrix estimators into the graphical Lasso and CLIME. Such estimators were recently proposed in the robust statistics literature, but only analyzed mathematically from the point of view of the breakdown point. This paper provides complementary high-dimensional error bounds for the precision matrix estimators that reveal the interplay between the dimensionality of the problem and the degree of contamination permitted in the observed distribution. We also show that although the graphical Lasso and CLIME estimators perform equally well from the point of view of statistical consistency, the breakdown property of the graphical Lasso is superior to that of CLIME. We discuss implications of our work for problems involving graphical model estimation when the uncontaminated data follow a multivariate normal distribution, and the goal is to estimate the support of the population-level precision matrix. Our error bounds do not make any assumptions about the contaminating distribution and allow for a nonvanishing fraction of cellwise contamination.
Submitted 23 September, 2015;
originally announced September 2015.
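To make the cellwise contamination setting in the abstract above concrete, the following is a minimal sketch, not the paper's estimator. It assumes bivariate normal clean data, a contamination fraction `eps` applied independently to each cell, and a fixed outlier value of 50; it compares the sample correlation with a rank-based estimate built from Kendall's tau via the identity rho = sin(pi * tau / 2), which holds for bivariate normal data. The function names `kendall_tau` and `kendall_correlation` are hypothetical helpers introduced for illustration, and the choice of Kendall's tau as the robust plug-in is an assumption, since the abstract does not name a specific estimator.

```python
import numpy as np

def kendall_tau(u, v):
    """Kendall's tau (tau-a) computed from all ordered pairs, O(n^2) memory."""
    su = np.sign(u[:, None] - u[None, :])  # pairwise order of u
    sv = np.sign(v[:, None] - v[None, :])  # pairwise order of v
    n = len(u)
    # Diagonal entries are zero, so the sum runs over the n(n-1) ordered pairs.
    return float((su * sv).sum() / (n * (n - 1)))

def kendall_correlation(u, v):
    """Rank-based correlation estimate; exact identity for bivariate normals."""
    return float(np.sin(0.5 * np.pi * kendall_tau(u, v)))

rng = np.random.default_rng(0)
n, rho, eps = 400, 0.6, 0.10  # illustrative values, not taken from the paper

# Clean data: bivariate normal with correlation rho.
clean = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)

# Cellwise contamination: each cell is independently replaced with probability eps.
mask = rng.random((n, 2)) < eps
data = np.where(mask, 50.0, clean)

pearson = float(np.corrcoef(data, rowvar=False)[0, 1])
robust = kendall_correlation(data[:, 0], data[:, 1])

print(f"true rho = {rho}, sample correlation = {pearson:.3f}, Kendall-based = {robust:.3f}")
```

In a full pipeline, a matrix of such pairwise robust estimates would be passed to a graphical Lasso or CLIME solver in place of the sample covariance; that step is omitted here.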
-
Optimal Estimation of A Quadratic Functional and Detection of Simultaneous Signals
Authors:
T. Tony Cai,
Xin Lu Tan
Abstract:
Motivated by applications in genomics, this paper studies the problem of optimal estimation of a quadratic functional of two normal mean vectors, $Q(μ, θ) = \frac{1}{n}\sum_{i=1}^nμ_i^2θ_i^2$, with a particular focus on the case where both mean vectors are sparse. We propose optimal estimators of $Q(μ, θ)$ for different regimes and establish the minimax rates of convergence over a family of parameter spaces. The optimal rates exhibit interesting phase transitions in this family. The simultaneous signal detection problem is also considered under the minimax framework. It is shown that the proposed estimators for $Q(μ, θ)$ naturally lead to optimal testing procedures.
Submitted 7 May, 2015;
originally announced May 2015.
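As a rough illustration of the quantity $Q(μ, θ)$ in the abstract above, the sketch below simulates two sparse mean vectors and evaluates a naive moment-based estimate. It assumes an observation model that the abstract does not spell out: independent $X_i \sim N(μ_i, 1)$ and $Y_i \sim N(θ_i, 1)$ with known unit variance. Under that assumption $E[X_i^2 - 1] = μ_i^2$ and $E[Y_i^2 - 1] = θ_i^2$, so the average of $(X_i^2 - 1)(Y_i^2 - 1)$ is unbiased for $Q$. This is not one of the paper's optimal estimators; the function name `naive_q_estimate` and all numeric settings are hypothetical.

```python
import numpy as np

def naive_q_estimate(x, y):
    """Unbiased moment estimate of Q = mean(mu_i^2 * theta_i^2).

    Assumes x_i ~ N(mu_i, 1) and y_i ~ N(theta_i, 1), all independent.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x**2 - 1.0) * (y**2 - 1.0)))

rng = np.random.default_rng(0)
n = 200_000

# Sparse mean vectors whose supports overlap on 1000 coordinates.
mu = np.zeros(n)
theta = np.zeros(n)
mu[:2000] = 3.0
theta[1000:3000] = 2.0

q_true = float(np.mean(mu**2 * theta**2))

# One noisy observation per coordinate for each vector.
x = mu + rng.standard_normal(n)
y = theta + rng.standard_normal(n)
q_hat = naive_q_estimate(x, y)

print(f"Q = {q_true:.4f}, naive estimate = {q_hat:.4f}")
```

Only coordinates where both means are nonzero contribute to $Q$, which is why the quantity is natural for detecting simultaneous signals; the naive estimate ignores sparsity and can be noisy when $Q$ is small relative to the noise level.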