Search | arXiv e-print repository

Gradient-free stochastic optimization for additive models

Authors: Arya Akhavan, Alexandre B. Tsybakov

Abstract: We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the Hölder family of functions. The additive model for Hölder classes of functions is well-studied in the literature on nonparametric function estimation, where it is shown that such a model benefits from a substantial improvement of the estimation accuracy compared to the Hölder model without additive structure. We study this established framework in the context of gradient-free optimization. We propose a randomized gradient estimator that, when plugged into a gradient descent algorithm, allows one to achieve minimax optimal optimization error of the order $dT^{-(β-1)/β}$, where $d$ is the dimension of the problem, $T$ is the number of queries and $β\ge 2$ is the Hölder degree of smoothness. We conclude that, in contrast to nonparametric estimation problems, no substantial gain of accuracy can be achieved when using additive models in gradient-free optimization.

Submitted 5 April, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

arXiv:2406.05714 [pdf, ps, other]

A conversion theorem and minimax optimality for continuum contextual bandits

Authors: Arya Akhavan, Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov

Abstract: We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all the underlying functions for the received contexts, leading to the contextual notion of regret, which is stronger than the standard static regret. Assuming that the objective functions are $γ$-Hölder with respect to the contexts, $0<γ\le 1,$ we demonstrate that any algorithm achieving a sub-linear static regret can be extended to achieve a sub-linear contextual regret. We prove a static-to-contextual regret conversion theorem that provides an upper bound for the contextual regret of the output algorithm as a function of the static regret of the input algorithm. We further study the implications of this general result for three fundamental cases of dependency of the objective function on the action variable: (a) Lipschitz bandits, (b) convex bandits, (c) strongly convex and smooth bandits. For Lipschitz bandits and $γ=1,$ combining our results with the lower bound of Slivkins (2014), we prove that the minimax optimal contextual regret for the noise-free adversarial setting is achieved. Then, we prove that in the presence of noise, the contextual regret rate as a function of the number of queries is the same for convex bandits as it is for strongly convex and smooth bandits. Lastly, we present a minimax lower bound, implying two key facts. First, obtaining a sub-linear contextual regret may be impossible over functions that are not continuous with respect to the context. Second, for convex bandits and strongly convex and smooth bandits, the algorithms that we propose achieve, up to a logarithmic factor, the minimax optimal rate of contextual regret as a function of the number of queries.

Submitted 17 April, 2025; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2306.02159 [pdf, ps, other]

Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm

Authors: Arya Akhavan, Evgenii Chzhen, Massimiliano Pontil, Alexandre B. Tsybakov

Abstract: This work studies minimization problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth and possibly satisfies additional properties. We consider two kinds of zero-order projected gradient descent algorithms, which differ in the form of the gradient estimator. The first algorithm uses a gradient estimator based on randomization over the $\ell_2$ sphere due to Bach and Perchet (2016). We present an improved analysis of this algorithm on the class of highly smooth and strongly convex functions studied in the prior work, and we derive rates of convergence for two more general classes of non-convex functions. Namely, we consider highly smooth functions satisfying the Polyak-Łojasiewicz condition and the class of highly smooth functions with no additional property. The second algorithm is based on randomization over the $\ell_1$ sphere, and it extends to the highly smooth setting the algorithm that was recently proposed for Lipschitz convex functions in Akhavan et al. (2022). We show that, in the case of noiseless oracle, this novel algorithm enjoys better bounds on bias and variance than the $\ell_2$ randomization and the commonly used Gaussian randomization algorithms, while in the noisy case both $\ell_1$ and $\ell_2$ algorithms benefit from similar improved theoretical guarantees. The improvements are achieved thanks to new proof techniques based on Poincaré type inequalities for uniform distributions on the $\ell_1$ or $\ell_2$ spheres. The results are established under weak (almost adversarial) assumptions on the noise. Moreover, we provide minimax lower bounds proving optimality or near optimality of the obtained upper bounds in several cases.

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2211.16457 [pdf, ps, other]

Estimating the minimizer and the minimum value of a regression function under passive design

Authors: Arya Akhavan, Davit Gogolashvili, Alexandre B. Tsybakov

Abstract: We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$ from the observations contaminated by random noise. Our estimator $\boldsymbol{z}_n$ of the minimizer $\boldsymbol{x}^*$ is based on a version of the projected gradient descent with the gradient estimated by a regularized local polynomial algorithm. Next, we propose a two-stage procedure for estimation of the minimum value $f^*$ of the regression function $f$. At the first stage, we construct an accurate enough estimator of $\boldsymbol{x}^*$, which can be, for example, $\boldsymbol{z}_n$. At the second stage, we estimate the function value at the point obtained in the first stage using a rate optimal nonparametric procedure. We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$. We establish minimax lower bounds showing that, under a certain choice of parameters, the proposed algorithms achieve the minimax optimal rates of convergence on the class of smooth and strongly convex functions.

Submitted 8 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: 35 pages

MSC Class: 62G05; 90C25

arXiv:2206.13347 [pdf, other]

Benign overfitting and adaptive nonparametric regression

Authors: Julien Chhor, Suzanne Sigalla, Alexandre B. Tsybakov

Abstract: In the nonparametric regression setting, we construct an estimator which is a continuous function interpolating the data points with high probability, while attaining minimax optimal rates under mean squared risk on the scale of Hölder classes adaptively to the unknown smoothness.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2205.13910 [pdf, other]

A gradient estimator via L1-randomization for online zero-order optimization with two point feedback

Authors: Arya Akhavan, Evgenii Chzhen, Massimiliano Pontil, Alexandre B. Tsybakov

Abstract: This work studies online zero-order optimization of convex and Lipschitz functions. We present a novel gradient estimator based on two function evaluations and randomization on the $\ell_1$-sphere. Considering different geometries of feasible sets and Lipschitz assumptions, we analyse the online dual averaging algorithm with our estimator in place of the usual gradient. We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise. We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem. In the case of canceling noise that was previously studied in the literature, our guarantees are either comparable or better than state-of-the-art bounds obtained by Duchi et al. (2015) and Shamir (2017) for non-adaptive algorithms. Our analysis is based on deriving a new weighted Poincaré type inequality for the uniform measure on the $\ell_1$-sphere with explicit constants, which may be of independent interest.

Submitted 20 September, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

arXiv:2006.07862 [pdf, ps, other]

Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits

Authors: Arya Akhavan, Massimiliano Pontil, Alexandre B. Tsybakov

Abstract: We study the problem of zero-order optimization of a strongly convex function. The goal is to find the minimizer of the function by a sequential exploration of its values, under measurement noise. We study the impact of higher order smoothness properties of the function on the optimization error and on the cumulative regret. To solve this problem we consider a randomized approximation of the projected gradient descent algorithm. The gradient is estimated by a randomized procedure involving two function evaluations and a smoothing kernel. We derive upper bounds for this algorithm both in the constrained and unconstrained settings and prove minimax lower bounds for any sequential search method. Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters. Based on this algorithm, we also propose an estimator of the minimum value of the function achieving almost sharp oracle behavior. We compare our results with the state-of-the-art, highlighting a number of key improvements.

Submitted 24 November, 2022; v1 submitted 14 June, 2020; originally announced June 2020.

arXiv:2005.12225 [pdf, other]

An alternative to synthetic control for models with many covariates under sparsity

Authors: Marianne Bléhaut, Xavier D'Haultfoeuille, Jérémy L'Hour, Alexandre B. Tsybakov

Abstract: The synthetic control method is an econometric tool to evaluate causal effects when only one unit is treated. While initially aimed at evaluating the effect of large-scale macroeconomic changes with very few available control units, it has increasingly been used in place of more well-known microeconometric tools in a broad range of applications, but its properties in this context are unknown. This paper introduces an alternative to the synthetic control method, which is developed both in the usual asymptotic framework and in the high-dimensional scenario. We propose an estimator of average treatment effect that is doubly robust, consistent and asymptotically normal. It is also immunized against first-step selection mistakes. We illustrate these properties using Monte Carlo simulations and applications to both standard and potentially high-dimensional settings, and offer a comparison with the synthetic control method.

Submitted 20 June, 2021; v1 submitted 25 May, 2020; originally announced May 2020.

Comments: 39 pages, 3 figures

arXiv:1806.09471 [pdf, other]

Does data interpolation contradict statistical optimality?

Authors: Mikhail Belkin, Alexander Rakhlin, Alexandre B. Tsybakov

Abstract: We show that learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.

Submitted 25 June, 2018; originally announced June 2018.

arXiv:1710.10870 [pdf, other]

Sparse covariance matrix estimation in high-dimensional deconvolution

Authors: Denis Belomestny, Mathias Trabs, Alexandre B. Tsybakov

Abstract: We study the estimation of the covariance matrix $Σ$ of a $p$-dimensional normal random vector based on $n$ independent observations corrupted by additive noise. Only a general nonparametric assumption is imposed on the distribution of the noise without any sparsity constraint on its covariance matrix. In this high-dimensional semiparametric deconvolution problem, we propose spectral thresholding estimators that are adaptive to the sparsity of $Σ$. We establish an oracle inequality for these estimators under model misspecification and derive non-asymptotic minimax convergence rates that are shown to be logarithmic in $n/\log p$. We also discuss the estimation of low-rank matrices based on indirect observations as well as the generalization to elliptical distributions. The finite sample performance of the threshold estimators is illustrated in a numerical example.

Submitted 26 March, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

MSC Class: Primary 62H12; secondary 62F12; 62G05

arXiv:1412.7216 [pdf, ps, other]

An $\{l_1,l_2,l_{\infty}\}$-Regularization Approach to High-Dimensional Errors-in-variables Models

Authors: Alexandre Belloni, Mathieu Rosenbaum, Alexandre B. Tsybakov

Abstract: Several new estimation methods have been recently proposed for the linear regression model with observation error in the design. Different assumptions on the data generating process have motivated different estimators and analyses. In particular, the literature considered (1) observation errors in the design uniformly bounded by some $\bar δ$, and (2) zero mean independent observation errors. Under the first assumption, the rates of convergence of the proposed estimators depend explicitly on $\bar δ$, while the second assumption has been applied when an estimator for the second moment of the observational error is available. This work proposes and studies two new estimators which, compared to other procedures for regression models with errors in the design, exploit an additional $l_{\infty}$-norm regularization. The first estimator is applicable when both (1) and (2) hold but does not require an estimator for the second moment of the observational error. The second estimator is applicable under (2) and requires an estimator for the second moment of the observation error. Importantly, we impose no assumption on the accuracy of this pilot estimator, in contrast to the previously known procedures. As in the recent proposals, we allow the number of covariates to be much larger than the sample size. We establish the rates of convergence of the estimators and compare them with the bounds obtained for related estimators in the literature. These comparisons show interesting insights on the interplay of the assumptions and the achievable rates of convergence.

Submitted 22 December, 2014; originally announced December 2014.

arXiv:1408.0241 [pdf, ps, other]

Linear and Conic Programming Estimators in High-Dimensional Errors-in-variables Models

Authors: Alexandre Belloni, Mathieu Rosenbaum, Alexandre Tsybakov

Abstract: We consider the linear regression model with observation error in the design. In this setting, we allow the number of covariates to be much larger than the sample size. Several new estimation methods have been recently introduced for this model. Indeed, the standard Lasso estimator or Dantzig selector turn out to become unreliable when only noisy regressors are available, which is quite common in practice. We show in this work that under suitable sparsity assumptions, the procedure introduced in Rosenbaum and Tsybakov (2013) is almost optimal in a minimax sense and, despite non-convexities, can be efficiently computed by a single linear programming problem. Furthermore, we provide an estimator attaining the minimax efficiency bound. This estimator is written as a second order cone programming minimisation problem which can be solved numerically in polynomial time.

Submitted 3 July, 2016; v1 submitted 1 August, 2014; originally announced August 2014.

arXiv:1108.5116 [pdf, ps, other]

doi 10.1214/12-STS393

Sparse Estimation by Exponential Weighting

Authors: Philippe Rigollet, Alexandre B. Tsybakov

Abstract: Consider a regression model with fixed design and Gaussian noise where the regression function can potentially be well approximated by a function that admits a sparse representation in a given dictionary. This paper resorts to exponential weights to exploit this underlying sparsity by implementing the principle of sparsity pattern aggregation. This model selection take on sparse estimation allows us to derive sparsity oracle inequalities in several popular frameworks, including ordinary sparsity, fused sparsity and group sparsity. One striking aspect of these theoretical results is that they hold under no condition on the dictionary. Moreover, we describe an efficient implementation of the sparsity pattern aggregation principle that compares favorably to state-of-the-art procedures on some basic numerical examples.

Submitted 7 January, 2013; v1 submitted 25 August, 2011; originally announced August 2011.

Comments: Published in at http://dx.doi.org/10.1214/12-STS393 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS393

Journal ref: Statistical Science 2012, Vol. 27, No. 4, 558-575

arXiv:1011.6256 [pdf, ps, other]

Nuclear norm penalization and optimal rates for noisy low rank matrix completion

Authors: Vladimir Koltchinskii, Alexandre B. Tsybakov, Karim Lounici

Abstract: This paper deals with the trace regression model where $n$ entries or linear combinations of entries of an unknown $m_1\times m_2$ matrix $A_0$ corrupted by noise are observed. We propose a new nuclear norm penalized estimator of $A_0$ and establish a general sharp oracle inequality for this estimator for arbitrary values of $n,m_1,m_2$ under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form and we prove that it satisfies oracle inequalities with faster rates of convergence than in the previous works. They are valid, in particular, in the high-dimensional setting $m_1m_2\gg n$. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix $A_0$, a non-minimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of $A_0$ with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by $A_0$ and the aim is to find the best trace regression model approximating the data.

Submitted 23 March, 2016; v1 submitted 29 November, 2010; originally announced November 2010.

MSC Class: 62J99; 62H12; 60B20; 60G15

arXiv:0906.2885 [pdf, other]

Noisy Independent Factor Analysis Model for Density Estimation and Classification

Authors: Umberto Amato, Anestis Antoniadis, Alexander Samarov, Alexander Tsybakov

Abstract: We consider the problem of multivariate density estimation when the unknown density is assumed to follow a particular form of dimensionality reduction, a noisy independent factor analysis (IFA) model. In this model the data are generated by a number of latent independent components having unknown distributions and are observed in Gaussian noise. We do not assume that either the number of components or the matrix mixing the components is known. We show that the densities of this form can be estimated with a fast rate. Using the mirror averaging aggregation algorithm, we construct a density estimator which achieves a nearly parametric rate $\log^{1/4} n/\sqrt{n}$, independent of the dimensionality of the data, as the sample size $n$ tends to infinity. This estimator is adaptive to the number of components, their distributions and the mixing matrix. We then apply this density estimator to construct nonparametric plug-in classifiers and show that they achieve the best obtainable rate of the excess Bayes risk, to within a logarithmic factor independent of the dimension of the data. Applications of this classifier to simulated data sets and to real data from a remote sensing experiment show promising results.

Submitted 16 June, 2009; originally announced June 2009.

arXiv:0903.1468 [pdf, ps, other]

Taking Advantage of Sparsity in Multi-Task Learning

Authors: Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov, Sara van de Geer

Abstract: We study the problem of estimating multiple linear regression equations for the purpose of both prediction and variable selection. Following recent work on multi-task learning Argyriou et al. [2008], we assume that the regression vectors share the same sparsity pattern. This means that the set of relevant predictor variables is the same across the different equations. This assumption leads us to consider the Group Lasso as a candidate estimation method. We show that this estimator enjoys nice sparsity oracle inequalities and variable selection properties. The results hold under a certain restricted eigenvalue condition and a coherence condition on the design matrix, which naturally extend recent work in Bickel et al. [2007], Lounici [2008]. In particular, in the multi-task learning scenario, in which the number of tasks can grow, we are able to remove completely the effect of the number of predictor variables in the bounds. Finally, we show how our results can be extended to more general noise distributions, of which we only require the variance to be finite.

Submitted 8 March, 2009; originally announced March 2009.

Journal ref: 10 pages, 1 figure, Proc. Computational Learning Theory Conference (COLT 2009)

arXiv:0903.1223 [pdf, ps, other]

doi 10.1016/j.jcss.2011.12.023

Sparse Regression Learning by Aggregation and Langevin Monte-Carlo

Authors: Arnak Dalalyan, Alexandre B. Tsybakov

Abstract: We consider the problem of regression learning for deterministic design and independent random errors. We start by proving a sharp PAC-Bayesian type bound for the exponentially weighted aggregate (EWA) under the expected squared empirical loss. For a broad class of noise distributions the presented bound is valid whenever the temperature parameter $β$ of the EWA is larger than or equal to $4σ^2$, where $σ^2$ is the noise variance. A remarkable feature of this result is that it is valid even for unbounded regression functions and the choice of the temperature parameter depends exclusively on the noise level. Next, we apply this general bound to the problem of aggregating the elements of a finite-dimensional linear space spanned by a dictionary of functions $φ_1,...,φ_M$. We allow $M$ to be much larger than the sample size $n$ but we assume that the true regression function can be well approximated by a sparse linear combination of functions $φ_j$. Under this sparsity scenario, we propose an EWA with a heavy tailed prior and we show that it satisfies a sparsity oracle inequality with leading constant one. Finally, we propose several Langevin Monte-Carlo algorithms to approximately compute such an EWA when the number $M$ of aggregated functions can be large. We discuss in some detail the convergence of these algorithms and present numerical experiments that confirm our theoretical findings.

Submitted 16 February, 2010; v1 submitted 6 March, 2009; originally announced March 2009.

Comments: Short version published in COLT 2009

Journal ref: Journal of Computer and System Sciences 78 (2012) 1423-1443

arXiv:0901.2044 [pdf, ps, other]

doi 10.1214/09-AOS790

SPADES and mixture models

Authors: Florentina Bunea, Alexandre B. Tsybakov, Marten H. Wegkamp, Adrian Barbu

Abstract: This paper studies sparse density estimation via $\ell_1$ penalization (SPADES). We focus on estimation in high-dimensional mixture models and nonparametric adaptive density estimation. We show, respectively, that SPADES can recover, with high probability, the unknown components of a mixture of probability densities and that it yields minimax adaptive density estimates. These results are based on a general sparsity oracle inequality that the SPADES estimates satisfy. We offer a data driven method for the choice of the tuning parameter used in the construction of SPADES. The method uses the generalized bisection method first introduced in \citebb09. The suggested procedure bypasses the need for a grid search and offers substantial computational savings. We complement our theoretical results with a simulation study that employs this method for approximations of one and two-dimensional densities with mixtures. The numerical results strongly support our theoretical findings.

Submitted 21 October, 2010; v1 submitted 14 January, 2009; originally announced January 2009.

Comments: Published in at http://dx.doi.org/10.1214/09-AOS790 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS790

Journal ref: Annals of Statistics 2010, Vol. 38, No. 4, 2525-2558

Showing 1–18 of 18 results for author: Tsybakov, A