Search | arXiv e-print repository

arXiv:2105.02083 [pdf, other]

AdaBoost and robust one-bit compressed sensing

Authors: Geoffrey Chinot, Felix Kuchelmeister, Matthias Löffler, Sara van de Geer

Abstract: This paper studies binary classification in robust one-bit compressed sensing with adversarial errors. It is assumed that the model is overparameterized and that the parameter of interest is effectively sparse. AdaBoost is considered, and, through its relation to the max-$\ell_1$-margin-classifier, prediction error bounds are derived. The developed theory is general and allows for heavy-tailed feature distributions, requiring only a weak moment assumption and an anti-concentration condition. Improved convergence rates are shown when the features satisfy a small deviation lower bound. In particular, the results provide an explanation why interpolating adversarial noise can be harmless for classification problems. Simulations illustrate the presented theory.

Submitted 8 December, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: 40 pages, 4 figures, code available at https://github.com/Felix-127/Adaboost-and-robust-one-bit-compressed-sensing, extended results to features that satisfy weak-moment and anti-concentration assumption

MSC Class: 62H30 (Primary); 94A12 (Secondary)

arXiv:2012.00807 [pdf, ps, other]

On the robustness of minimum norm interpolators and regularized empirical risk minimizers

Authors: Geoffrey Chinot, Matthias Löffler, Sara van de Geer

Abstract: This article develops a general theory for minimum norm interpolating estimators and regularized empirical risk minimizers (RERM) in linear models in the presence of additive, potentially adversarial, errors. In particular, no conditions on the errors are imposed. A quantitative bound for the prediction error is given, relating it to the Rademacher complexity of the covariates, the norm of the minimum norm interpolator of the errors and the size of the subdifferential around the true parameter. The general theory is illustrated for Gaussian features and several norms: The $\ell_1$, $\ell_2$, group Lasso and nuclear norms. In case of sparsity or low-rank inducing norms, minimum norm interpolators and RERM yield a prediction error of the order of the average noise level, provided that the overparameterization is at least a logarithmic factor larger than the number of samples and that, in case of RERM, the regularization parameter is small enough. Lower bounds that show near optimality of the results complement the analysis.

Submitted 7 October, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

Comments: 35 pages

MSC Class: 62J05

arXiv:1911.07231 [pdf, other]

Adaptive Rates for Total Variation Image Denoising

Authors: Francesco Ortelli, Sara van de Geer

Abstract: We study the theoretical properties of image denoising via total variation penalized least-squares. We define the total variation in terms of the two-dimensional total discrete derivative of the image and show that it gives rise to denoised images that are piecewise constant on rectangular sets. We prove that, if the true image is piecewise constant on just a few rectangular sets, the denoised image converges to the true image at a parametric rate, up to a log factor. More generally, we show that the denoised image enjoys oracle properties, that is, it is almost as good as if some aspects of the true image were known. In other words, image denoising with total variation regularization leads to an adaptive reconstruction of the true image.

Submitted 26 January, 2021; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: 38 pages, 6 figures

Journal ref: Journal of Machine Learning Research, 21(247), 2020

arXiv:1904.10871 [pdf, ps, other]

Prediction bounds for higher order total variation regularized least squares

Authors: Francesco Ortelli, Sara van de Geer

Abstract: We establish adaptive results for trend filtering: least squares estimation with a penalty on the total variation of $(k-1)^{\rm th}$ order differences. Our approach is based on combining a general oracle inequality for the $\ell_1$-penalized least squares estimator with "interpolating vectors" to upper-bound the "effective sparsity". This allows one to show that the $\ell_1$-penalty on the $k^{\text{th}}$ order differences leads to an estimator that can adapt to the number of jumps in the $(k-1)^{\text{th}}$ order differences of the underlying signal or an approximation thereof. We show the result for $k \in \{1,2,3,4\}$ and indicate how it could be derived for general $k\in \mathbb{N}$.

Submitted 17 July, 2020; v1 submitted 24 April, 2019; originally announced April 2019.

Comments: 28 pages

MSC Class: 62J07

arXiv:1902.11192 [pdf, ps, other]

doi 10.1093/imaiai/iaaa002

Oracle inequalities for square root analysis estimators with application to total variation penalties

Authors: Francesco Ortelli, Sara van de Geer

Abstract: Through the direct study of the analysis estimator we derive oracle inequalities with fast and slow rates by adapting the arguments involving projections by Dalalyan, Hebiri and Lederer (2017). We then extend the theory to the square root analysis estimator. Finally, we focus on (square root) total variation regularized estimators on graphs and obtain constant-friendly rates, which, up to log-terms, match previous results obtained by entropy calculations. We also obtain an oracle inequality for the (square root) total variation regularized estimator over the cycle graph.

Submitted 14 December, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

Journal ref: Information and Inference: A Journal of the IMA, iaaa002, 2020

arXiv:1811.10443 [pdf, other]

Sparse spectral estimation with missing and corrupted measurements

Authors: Andreas Elsener, Sara van de Geer

Abstract: Supervised learning methods with missing data have been extensively studied not just due to the techniques related to low-rank matrix completion. Also in unsupervised learning one often relies on imputation methods. As a matter of fact, missing values induce a bias in various estimators such as the sample covariance matrix. In the present paper, a convex method for sparse subspace estimation is extended to the case of missing and corrupted measurements. This is done by correcting the bias instead of imputing the missing values. The estimator is then used as an initial value for a nonconvex procedure to improve the overall statistical performance. The methodological as well as theoretical frameworks are applied to a wide range of statistical problems. These include sparse Principal Component Analysis with different types of randomly missing data and the estimation of eigenvectors of low-rank matrices with missing values. Finally, the statistical performance is demonstrated on synthetic data.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 32 pages, 4 figures

arXiv:1806.01918 [pdf, other]

A Framework for the construction of upper bounds on the number of affine linear regions of ReLU feed-forward neural networks

Authors: Peter Hinz, Sara van de Geer

Abstract: We present a framework to derive upper bounds on the number of regions that feed-forward neural networks with ReLU activation functions are affine linear on. It is based on an inductive analysis that keeps track of the number of such regions per dimensionality of their images within the layers. More precisely, the information about the number of regions per dimensionality is pushed through the layers starting with one region of the input dimension of the neural network and using a recursion based on an analysis of how many regions per output dimensionality a subsequent layer with a certain width can induce on an input region with a given dimensionality. The final bound on the number of regions depends on the number and widths of the layers of the neural network and on some additional parameters that were used for the recursion. It is stated in terms of the $L1$-norm of the last column of a product of matrices and provides a unifying treatment of several previously known bounds: Depending on the choice of the recursion parameters that determine these matrices, it is possible to obtain the bounds from Montúfar (2014), (2017) and Serra et al. (2017) as special cases. For the latter, which is the strongest of these bounds, the formulation in terms of matrices provides new insight. In particular, by using explicit formulas for a Jordan-like decomposition of the involved matrices, we achieve new tighter results for the asymptotic setting, where the number of layers of the same fixed width tends to infinity.

Submitted 9 March, 2020; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01009 [pdf, ps, other]

doi 10.1214/18-EJS1519

On the total variation regularized estimator over a class of tree graphs

Authors: Francesco Ortelli, Sara van de Geer

Abstract: We generalize to tree graphs obtained by connecting path graphs an oracle result obtained for the Fused Lasso over the path graph. Moreover we show that it is possible to substitute in the oracle inequality the minimum of the distances between jumps by their harmonic mean. In doing so we prove a lower bound on the compatibility constant for the total variation penalty. Our analysis leverages insights obtained for the path graph with one branch to understand the case of more general tree graphs. As a side result, we get insights into the irrepresentable condition for such tree graphs.

Submitted 16 June, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

Comments: 42 pages

Journal ref: Electronic Journal of Statistics, 12, 2018, 4517-4570

arXiv:1801.10567 [pdf, other]

De-biased sparse PCA: Inference and testing for eigenstructure of large covariance matrices

Authors: Jana Janková, Sara van de Geer

Abstract: Sparse principal component analysis (sPCA) has become one of the most widely used techniques for dimensionality reduction in high-dimensional datasets. The main challenge underlying sPCA is to estimate the first vector of loadings of the population covariance matrix, provided that only a certain number of loadings are non-zero. In this paper, we propose confidence intervals for individual loadings and for the largest eigenvalue of the population covariance matrix. Given an independent sample $X^i \in\mathbb R^p, i = 1,...,n,$ generated from an unknown distribution with an unknown covariance matrix $Σ_0$, our aim is to estimate the first vector of loadings and the largest eigenvalue of $Σ_0$ in a setting where $p\gg n$. Next to the high-dimensionality, another challenge lies in the inherent non-convexity of the problem. We base our methodology on a Lasso-penalized M-estimator which, despite non-convexity, may be solved by a polynomial-time algorithm such as coordinate or gradient descent. We show that our estimator achieves the minimax optimal rates in $\ell_1$ and $\ell_2$-norm. We identify the bias in the Lasso-based estimator and propose a de-biased sparse PCA estimator for the vector of loadings and for the largest eigenvalue of the covariance matrix $Σ_0$. Our main results provide theoretical guarantees for asymptotic normality of the de-biased estimator. The major conditions we impose are sparsity in the first eigenvector of small order $\sqrt{n}/\log p$ and sparsity of the same order in the columns of the inverse Hessian matrix of the population risk.

Submitted 31 January, 2018; originally announced January 2018.

Comments: 41 pages

arXiv:1801.08512 [pdf, other]

Inference in high-dimensional graphical models

Authors: Jana Jankova, Sara van de Geer

Abstract: We provide a selected overview of methodology and theory for estimation and inference on the edge weights in high-dimensional directed and undirected Gaussian graphical models. For undirected graphical models, two main explicit constructions are provided: one based on a global method that maximizes the joint likelihood (the graphical Lasso) and one based on a local (nodewise) method that sequentially applies the Lasso to estimate the neighbourhood of each node. The proposed estimators lead to confidence intervals for edge weights and recovery of the edge structure. We evaluate their empirical performance in an extensive simulation study. The theoretical guarantees for the methods are achieved under a sparsity condition relative to the sample size and regularity conditions. For directed acyclic graphs, we apply similar ideas to construct confidence intervals for edge weights, when the directed acyclic graph is identifiable.

Submitted 25 January, 2018; originally announced January 2018.

Comments: 29 pages

arXiv:1706.09231 [pdf, other]

doi 10.1109/TSP.2018.2807399

Asymptotic Confidence Regions for High-dimensional Structured Sparsity

Authors: Benjamin Stucky, Sara van de Geer

Abstract: In the setting of high-dimensional linear regression models, we propose two frameworks for constructing pointwise and group confidence sets for penalized estimators which incorporate prior knowledge about the organization of the non-zero coefficients. This is done by desparsifying the estimator as in van de Geer et al. [18] and van de Geer and Stucky [17], then using an appropriate estimator for the precision matrix $Θ$. In order to estimate the precision matrix a corresponding structured matrix norm penalty has to be introduced. After normalization the result is an asymptotic pivot. The asymptotic behavior is studied and simulations are added to study the differences between the two schemes.

Submitted 28 June, 2017; originally announced June 2017.

Comments: 28 pages, 4 figures, 1 table

arXiv:1610.01353 [pdf, other]

Confidence regions for high-dimensional generalized linear models under sparsity

Authors: Jana Janková, Sara van de Geer

Abstract: We study asymptotically normal estimation and confidence regions for low-dimensional parameters in high-dimensional sparse models. Our approach is based on the $\ell_1$-penalized M-estimator which is used for construction of a bias corrected estimator. We show that the proposed estimator is asymptotically normal, under a sparsity assumption on the high-dimensional parameter, smoothness conditions on the expected loss and an entropy condition. This leads to uniformly valid confidence regions and hypothesis testing for low-dimensional parameters. The present approach is different in that it allows for treatment of loss functions that are not sufficiently differentiable, such as quantile loss, Huber loss or hinge loss functions. We also provide new results for estimation of the inverse Fisher information matrix, which is necessary for the construction of the proposed estimator. We formulate our results for general models under high-level conditions, but investigate these conditions in detail for generalized linear models and provide mild sufficient conditions. As particular examples, we investigate the case of quantile loss and Huber loss in linear regression and demonstrate the performance of the estimators in a simulation study and on real datasets from genome-wide association studies. We further investigate the case of logistic regression and illustrate the performance of the estimator on simulated and real data.

Submitted 5 October, 2016; originally announced October 2016.

Comments: 40 pages

arXiv:1601.00815 [pdf, ps, other]

Semi-parametric efficiency bounds for high-dimensional models

Authors: Jana Jankova, Sara van de Geer

Abstract: Asymptotic lower bounds for estimation play a fundamental role in assessing the quality of statistical procedures. In this paper we propose a framework for obtaining semi-parametric efficiency bounds for sparse high-dimensional models, where the dimension of the parameter is larger than the sample size. We adopt a semi-parametric point of view: we concentrate on one dimensional functions of a high-dimensional parameter. We follow two different approaches to reach the lower bounds: asymptotic Cramér-Rao bounds and Le Cam's type of analysis. Both these approaches allow us to define a class of asymptotically unbiased or "regular" estimators for which a lower bound is derived. Consequently, we show that certain estimators obtained by de-sparsifying (or de-biasing) an $\ell_1$-penalized M-estimator are asymptotically unbiased and achieve the lower bound on the variance: thus in this sense they are asymptotically efficient. The paper discusses in detail the linear regression model and the Gaussian graphical model.

Submitted 12 October, 2017; v1 submitted 5 January, 2016; originally announced January 2016.

Comments: 68 pages

arXiv:1503.06426 [pdf, other]

doi 10.1214/15-EJS1041

High-dimensional inference in misspecified linear models

Authors: Peter Bühlmann, Sara van de Geer

Abstract: We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our results contribute to robustness considerations with respect to model misspecification.

Submitted 22 March, 2015; originally announced March 2015.

Comments: 24 pages, 4 figures

MSC Class: 62J05; 62J07

Journal ref: Electronic Journal of Statistics 2015, Vol. 9, 1449-1473

arXiv:1403.6752 [pdf, other]

doi 10.1214/15-EJS1031

Confidence intervals for high-dimensional inverse covariance estimation

Authors: Jana Jankova, Sara van de Geer

Abstract: We propose methodology for statistical inference for low-dimensional parameters of sparse precision matrices in a high-dimensional setting. Our method leads to a non-sparse estimator of the precision matrix whose entries have a Gaussian limiting distribution. Asymptotic properties of the novel estimator are analyzed for the case of sub-Gaussian observations under a sparsity assumption on the entries of the true precision matrix and regularity conditions. Thresholding the de-sparsified estimator gives guarantees for edge selection in the associated graphical model. Performance of the proposed method is illustrated in a simulation study.

Submitted 11 August, 2015; v1 submitted 26 March, 2014; originally announced March 2014.

Comments: 26 pages

Journal ref: Electronic Journal of Statistics 2015, Vol. 9, No. 1, 1205 - 1229

arXiv:1307.1067 [pdf, other]

The partial linear model in high dimensions

Authors: Patric Müller, Sara van de Geer

Abstract: Partial linear models have been widely used as a flexible method for modelling linear components in conjunction with non-parametric ones. Despite the presence of the non-parametric part, the linear, parametric part can under certain conditions be estimated with parametric rate. In this paper, we consider a high-dimensional linear part. We show that it can be estimated with oracle rates, using the LASSO penalty for the linear part and a smoothness penalty for the nonparametric part.

Submitted 3 July, 2013; originally announced July 2013.

Comments: 48 pages, 16 figures, submitted to Scandinavian Journal of Statistics

arXiv:1209.5908 [pdf, other]

doi 10.1016/j.jspi.2013.05.019

Correlated variables in regression: clustering and sparse estimation

Authors: Peter Bühlmann, Philipp Rütimann, Sara van de Geer, Cun-Hui Zhang

Abstract: We consider estimation in a high-dimensional linear model with strongly correlated variables. We propose to cluster the variables first and do subsequent sparse estimation such as the Lasso for cluster-representatives or the group Lasso based on the structure from the clusters. Regarding the first step, we present a novel and bottom-up agglomerative clustering algorithm based on canonical correlations, and we show that it finds an optimal solution and is statistically consistent. We also present some theoretical arguments that canonical correlation based clustering leads to a better-posed compatibility constant for the design matrix which ensures identifiability and an oracle inequality for the group Lasso. Furthermore, we discuss circumstances where cluster-representatives and using the Lasso as subsequent estimator leads to improved results for prediction and detection of variables. We complement the theoretical analysis with various empirical results.

Submitted 26 September, 2012; originally announced September 2012.

Comments: 40 pages, 6 figures

MSC Class: 62J07; 62H30

Journal ref: Journal of Statistical Planning and Inference 2013, Vol. 143, 1835-1858

arXiv:1206.6721 [pdf, ps, other]

doi 10.1214/12-STS397

Quasi-Likelihood and/or Robust Estimation in High Dimensions

Authors: Sara van de Geer, Patric Müller

Abstract: We consider the theory for the high-dimensional generalized linear model with the Lasso. After a short review on theoretical results in literature, we present an extension of the oracle results to the case of quasi-likelihood loss. We prove bounds for the prediction error and $\ell_1$-error. The results are derived under fourth moment conditions on the error distribution. The case of robust loss is also given. We moreover show that under an irrepresentable condition, the $\ell_1$-penalized quasi-likelihood estimator has no false positives.

Submitted 4 January, 2013; v1 submitted 28 June, 2012; originally announced June 2012.

Comments: Published at http://dx.doi.org/10.1214/12-STS397 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS397

Journal ref: Statistical Science 2012, Vol. 27, No. 4, 469-480

arXiv:1202.6046 [pdf, ps, other]

doi 10.1007/s11749-010-0197-z

L1-Penalization for Mixture Regression Models

Authors: Nicolas Städler, Peter Bühlmann, Sara van de Geer

Abstract: We consider a finite mixture of regressions (FMR) model for high-dimensional inhomogeneous data where the number of covariates may be much larger than sample size. We propose an l1-penalized maximum likelihood estimator in an appropriate parameterization. This kind of estimation belongs to a class of problems where optimization and theory for non-convex functions is needed. This distinguishes itself very clearly from high-dimensional estimation with convex loss- or objective functions, as for example with the Lasso in linear or generalized linear models. Mixture models represent a prime and important example where non-convexity arises. For FMR models, we develop an efficient EM algorithm for numerical optimization with provable convergence properties. Our penalized estimator is numerically better posed (e.g., boundedness of the criterion function) than unpenalized maximum likelihood estimation, and it allows for effective statistical regularization including variable selection. We also present some asymptotic theory and oracle inequalities: due to non-convexity of the negative log-likelihood function, different mathematical arguments are needed than for problems with convex losses. Finally, we apply the new method to both simulated and real data.

Submitted 27 February, 2012; originally announced February 2012.

Comments: This is the author's version of the work (published as a discussion paper in TEST, 2010, Volume 19, 209--285). The final publication is available at http://www.springerlink.com

Journal ref: TEST, 2010, Volume 19, 209--285

arXiv:1107.0189 [pdf, other]

The Lasso, correlated design, and improved oracle inequalities

Authors: Sara van de Geer, Johannes Lederer

Abstract: We study high-dimensional linear models and the $\ell_1$-penalized least squares estimator, also known as the Lasso estimator. In literature, oracle inequalities have been derived under restricted eigenvalue or compatibility conditions. In this paper, we complement this with entropy conditions which allow one to improve the dual norm bound, and demonstrate how this leads to new oracle inequalities. The new oracle inequalities show that a smaller choice for the tuning parameter and a trade-off between $\ell_1$-norms and small compatibility constants are possible. This implies, in particular for correlated design, improved bounds for the prediction error of the Lasso estimator as compared to the methods based on restricted eigenvalue or compatibility conditions only.

Submitted 1 July, 2011; originally announced July 2011.

Comments: 18 pages, 3 figures

MSC Class: 62J05

arXiv:1002.3784 [pdf, ps, other]

doi 10.1111/j.1467-9469.2011.00740.x

Estimation for High-Dimensional Linear Mixed-Effects Models Using $\ell_1$-Penalization

Authors: Jürg Schelldorfer, Peter Bühlmann, Sara van de Geer

Abstract: We propose an $\ell_1$-penalized estimation procedure for high-dimensional linear mixed-effects models. The models are useful whenever there is a grouping structure among high-dimensional observations, i.e. for clustered data. We prove a consistency and an oracle optimality result and we develop an algorithm with provable numerical convergence. Furthermore, we demonstrate the performance of the method on simulated and a real high-dimensional data set.

Submitted 25 November, 2010; v1 submitted 19 February, 2010; originally announced February 2010.

Journal ref: Scandinavian Journal of Statistics 2011, 38: 197-214

arXiv:0910.0722 [pdf, other]

doi 10.1214/09-EJS506

On the conditions used to prove oracle results for the Lasso

Authors: Sara A. van de Geer, Peter Bühlmann

Abstract: Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds for more general situations than what it appears from coherence (Bunea et al, 2007b,c) or restricted isometry (Candes and Tao, 2005) assumptions.

Submitted 5 October, 2009; originally announced October 2009.

Comments: 33 pages, 1 figure

Journal ref: Electronic Journal of Statistics, 3, (2009), 1360-1392

arXiv:0903.2515 [pdf, ps, other]

Adaptive Lasso for High Dimensional Regression and Gaussian Graphical Modeling

Authors: Shuheng Zhou, Sara van de Geer, Peter Bühlmann

Abstract: We show that the two-stage adaptive Lasso procedure (Zou, 2006) is consistent for high-dimensional model selection in linear and Gaussian graphical models. Our conditions for consistency cover more general situations than those accomplished in previous work: we prove that restricted eigenvalue conditions (Bickel et al., 2008) are also sufficient for sparse structure estimation.

Submitted 13 March, 2009; originally announced March 2009.

Comments: 30 pages

arXiv:0903.1468 [pdf, ps, other]

Taking Advantage of Sparsity in Multi-Task Learning

Authors: Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov, Sara van de Geer

Abstract: We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multi-task learning Argyriou et al. [2008], we assume that the regression vectors share the same sparsity pattern. This means that the set of relevant predictor variables is the same across the different equations. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys nice sparsity oracle inequalities and variable selection properties. The results hold under a certain restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in the multi-task learning scenario, in which the number of tasks can grow, we are able to remove completely the effect of the number of predictor variables in the bounds. Finally, we show how our results can be extended to more general noise distributions, of which we only require the variance to be finite.

Submitted 8 March, 2009; originally announced March 2009.

Journal ref: 10 pages, 1 figure, Proc. Computational Learning Theory Conference (COLT 2009)

arXiv:0806.4115 [pdf, ps, other]

doi 10.1214/09-AOS692

High-dimensional additive modeling

Authors: Lukas Meier, Sara van de Geer, Peter Bühlmann

Abstract: We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains.

Submitted 18 November, 2009; v1 submitted 25 June, 2008; originally announced June 2008.

Comments: Published at http://dx.doi.org/10.1214/09-AOS692 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS692

MSC Class: 62G08; 62F12 (Primary); 62J07 (Secondary)

Journal ref: Annals of Statistics 2009, Vol. 37, No. 6B, 3779-3821

Showing 1–25 of 25 results for author: van de Geer, S