-
Concentration behavior of the penalized least squares estimator
Authors:
Alan Muro,
Sara van de Geer
Abstract:
Consider the standard nonparametric regression model and take the penalized least squares estimator. In this article, we study the trade-off between closeness to the true function and complexity penalization of the estimator, where complexity is described by a seminorm on a class of functions. First, we present an exponential concentration inequality showing that the trade-off of the penalized least squares estimator concentrates around a nonrandom quantity that depends on the problem under consideration. Then, under some conditions and for a proper choice of the tuning parameter, we obtain bounds for this nonrandom quantity. We illustrate our results with examples that include the smoothing splines estimator.
Submitted 19 October, 2016; v1 submitted 27 November, 2015;
originally announced November 2015.
-
The additive model with different smoothness for the components
Authors:
Sara van de Geer,
Alan Muro
Abstract:
We consider an additive regression model consisting of two components $f^0$ and $g^0$, where the first component $f^0$ is in some sense "smoother" than the second $g^0$. Smoothness is here described in terms of a semi-norm on the class of regression functions. We use a penalized least squares estimator $(\hat f, \hat g)$ of $(f^0, g^0)$ and show that the rate of convergence for $\hat f $ is faster than the rate of convergence for $\hat g$. In fact, both rates are generally as fast as in the case where one of the two components is known. The theory is illustrated by a simulation study. Our proofs rely on recent results from empirical process theory.
Submitted 26 May, 2014;
originally announced May 2014.
-
On higher order isotropy conditions and lower bounds for sparse quadratic forms
Authors:
Sara van de Geer,
Alan Muro
Abstract:
This study aims at contributing to lower bounds for empirical compatibility constants or empirical restricted eigenvalues. This is of importance in compressed sensing and theory for $\ell_1$-regularized estimators. Let $X$ be an $n \times p$ data matrix with rows being independent copies of a $p$-dimensional random variable. Let $\hat \Sigma := X^T X / n$ be the inner product matrix. We show that the quadratic forms $u^T \hat \Sigma u$ are lower bounded by a value converging to one, uniformly over the set of vectors $u$ with $u^T \Sigma_0 u$ equal to one and $\ell_1$-norm at most $M$. Here $\Sigma_0 := {\bf E} \hat \Sigma$ is the theoretical inner product matrix which we assume to exist. The constant $M$ is required to be of small order $\sqrt{n / \log p}$. We assume moreover $m$-th order isotropy for some $m > 2$ and sub-exponential tails or moments up to order $\log p$ for the entries in $X$. As a consequence we obtain convergence of the empirical compatibility constant to its theoretical counterpart, and similarly for the empirical restricted eigenvalue. If the data matrix $X$ is first normalized so that its columns all have equal length we obtain lower bounds assuming only isotropy and no further moment conditions on its entries. The isotropy condition is shown to hold for certain martingale situations.
Submitted 9 November, 2014; v1 submitted 23 May, 2014;
originally announced May 2014.