A Fundamental Accuracy--Robustness Trade-off in Regression and Classification

Abstract: We derive a fundamental trade-off between standard and adversarial risk in a rather general situation that formalizes the following simple intuition: "If no (nearly) optimal predictor is smooth, adversarial robustness comes at the cost of accuracy." As a concrete example, we evaluate the derived trade-off in regression with polynomial ridge functions under mild regularity conditions. Generalizing our analysis of this example, we formulate a necessary condition under which adversarial robustness can be achieved without significant degradation of the accuracy. This necessary condition is expressed in terms of a quantity that resembles the Poincaré constant of the data distribution.

Submitted 28 June, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

arXiv:2209.10053 [pdf, ps, other]

Instance-dependent uniform tail bounds for empirical processes

Authors: Sohail Bahmani

Abstract: We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial "deflation" step to the standard generic chaining argument. The resulting tail bound is the sum of the complexity of the "deflated function class" in terms of a generalization of Talagrand's $\gamma$ functional, and the deviation of the function instance, both of which are formulated based on the natural seminorm induced by the corresponding Cramér functions. Leveraging another less demanding natural seminorm, we also show similar bounds, though with implicit dependence on the sample size, in the more general case where finite exponential moments cannot be assumed. We also provide approximations of the tail bounds in terms of the more prevalent Orlicz norms or their "incomplete" versions under suitable moment conditions.

Submitted 29 September, 2024; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: 34 pages

arXiv:2110.15283 [pdf, other]

Decentralized Feature-Distributed Optimization for Generalized Linear Models

Authors: Brighton Ancelin, Sohail Bahmani, Justin Romberg

Abstract: We consider the "all-for-one" decentralized learning problem for generalized linear models. The features of each sample are partitioned among several collaborating agents in a connected network, but only one agent observes the response variables. To solve the regularized empirical risk minimization in this distributed setting, we apply the Chambolle--Pock primal--dual algorithm to an equivalent saddle-point formulation of the problem. The primal and dual iterations are either in closed form or reduce to coordinate-wise minimization of scalar convex functions. We establish convergence rates for the empirical risk minimization under two different assumptions on the loss function (Lipschitz and square root Lipschitz), and show how they depend on the characteristics of the design matrix and the Laplacian of the network.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2103.07020 [pdf, ps, other]

Max-Linear Regression by Convex Programming

Authors: Seonho Kim, Sohail Bahmani, Kiryung Lee

Abstract: We consider the multivariate max-linear regression problem where the model parameters $\boldsymbol{\beta}_{1},\dotsc,\boldsymbol{\beta}_{k}\in\mathbb{R}^{p}$ need to be estimated from $n$ independent samples of the (noisy) observations $y = \max_{1\leq j \leq k} \boldsymbol{\beta}_{j}^{\mathsf{T}} \boldsymbol{x} + \mathrm{noise}$. The max-linear model vastly generalizes the conventional linear model, and it can approximate any convex function to an arbitrary accuracy when the number of linear models $k$ is large enough. However, the inherent nonlinearity of the max-linear model renders the estimation of the regression parameters computationally challenging. Particularly, no estimator based on convex programming is known in the literature. We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem. Under the standard Gaussian observation setting, we present a non-asymptotic performance guarantee showing that the convex program recovers the parameters with high probability. When the $k$ linear components are equally likely to achieve the maximum, our result shows that a sufficient number of noise-free observations for exact recovery scales as $k^{4}p$ up to a logarithmic factor. This sample complexity coincides with that by alternating minimization (Ghosh et al., 2021). Moreover, the same sample complexity applies when the observations are corrupted with arbitrary deterministic noise. We provide empirical results that show that our method performs as our theoretical result predicts, and is competitive with the alternating minimization algorithm, particularly in the presence of multiplicative Bernoulli noise. Furthermore, we also show empirically that a recursive application of AR can significantly improve the estimation accuracy.

Submitted 23 February, 2024; v1 submitted 11 March, 2021; originally announced March 2021.

arXiv:2004.02718 [pdf, other]

Low-Rank Matrix Estimation From Rank-One Projections by Unlifted Convex Optimization

Authors: Sohail Bahmani, Kiryung Lee

Abstract: We study an estimator with a convex formulation for recovery of low-rank matrices from rank-one projections. Using initial estimates of the factors of the target $d_1\times d_2$ matrix of rank-$r$, the estimator admits a practical subgradient method operating in a space of dimension $r(d_1+d_2)$. This property makes the estimator significantly more scalable than the convex estimators based on lifting and semidefinite programming. Furthermore, we present a streamlined analysis for exact recovery under the real Gaussian measurement model, as well as the partially derandomized measurement model by using the spherical $t$-design. We show that under both models the estimator succeeds, with high probability, if the number of measurements exceeds $r^2 (d_1+d_2)$ up to some logarithmic factors. This sample complexity improves on the existing results for nonconvex iterative algorithms.

Submitted 10 January, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

arXiv:1908.09915 [pdf, other]

Convex Programming for Estimation in Nonlinear Recurrent Models

Authors: Sohail Bahmani, Justin Romberg

Abstract: We propose a formulation for nonlinear recurrent models that includes simple parametric models of recurrent neural networks as a special case. The proposed formulation leads to a natural estimator in the form of a convex program. We provide a sample complexity for this estimator in the case of stable dynamics, where the nonlinear recursion has a certain contraction property, and under certain regularity conditions on the input distribution. We evaluate the performance of the estimator by simulation on synthetic data. These numerical experiments also suggest the extent to which the imposed theoretical assumptions may be relaxed.

Submitted 26 August, 2019; originally announced August 2019.

Comments: 18 pages

MSC Class: 90C25; 93E35

arXiv:1806.07307 [pdf, other]

Estimation from Non-Linear Observations via Convex Programming with Application to Bilinear Regression

Authors: Sohail Bahmani

Abstract: We propose a computationally efficient estimator, formulated as a convex program, for a broad class of non-linear regression problems that involve difference of convex (DC) non-linearities. The proposed method can be viewed as a significant extension of the "anchored regression" method formulated and analyzed in [10] for regression with convex non-linearities. Our main assumption, in addition to other mild statistical and computational assumptions, is the availability of a certain approximation oracle for the average of the gradients of the observation functions at a ground truth. Under this assumption and using a PAC-Bayesian analysis, we show that the proposed estimator produces an accurate estimate with high probability. As a concrete example, we study the proposed framework in the bilinear regression problem with Gaussian factors and quantify a sufficient sample complexity for exact recovery. Furthermore, we describe a computationally tractable scheme that provably produces the required approximation oracle in the considered bilinear regression problem.

Submitted 28 March, 2019; v1 submitted 19 June, 2018; originally announced June 2018.

Comments: Some elaboration on the algorithm and theoretical results are added. Minor errors and typos corrected

MSC Class: 62F10; 90C25; 62P30

arXiv:1702.05327 [pdf, other]

Solving Equations of Random Convex Functions via Anchored Regression

Authors: Sohail Bahmani, Justin Romberg

Abstract: We consider the question of estimating a solution to a system of equations that involve convex nonlinearities, a problem that is common in machine learning and signal processing. Because of these nonlinearities, conventional estimators based on empirical risk minimization generally involve solving a non-convex optimization program. We propose anchored regression, a new approach based on convex programming that amounts to maximizing a linear functional (perhaps augmented by a regularizer) over a convex set. The proposed convex program is formulated in the natural space of the problem, and avoids the introduction of auxiliary variables, making it computationally favorable. Working in the native space also provides great flexibility as structural priors (e.g., sparsity) can be seamlessly incorporated. For our analysis, we model the equations as being drawn from a fixed set according to a probability law. Our main results provide guarantees on the accuracy of the estimator in terms of the number of equations we are solving, the amount of noise present, a measure of statistical complexity of the random equations, and the geometry of the regularizer at the true solution. We also provide recipes for constructing the anchor vector (that determines the linear functional to maximize) directly from the observed data.

Submitted 13 August, 2018; v1 submitted 17 February, 2017; originally announced February 2017.

arXiv:1610.04210 [pdf, other]

Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation

Authors: Sohail Bahmani, Justin Romberg

Abstract: We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. Therefore, we avoid the prohibitive computational cost associated with "lifting" and semidefinite programming (SDP) in methods such as PhaseLift and compete with recently developed non-convex techniques for phase retrieval. We relax the quadratic equations for phaseless measurements to inequality constraints, each of which represents a symmetric "slab". Through a simple convex program, our proposed estimator finds an extreme point of the intersection of these slabs that is best aligned with a given anchor vector. We characterize geometric conditions that certify success of the proposed estimator. Furthermore, using classic results in statistical learning theory, we show that for random measurements the geometric certificates hold with high probability at an optimal sample complexity. Phase transition of our estimator is evaluated through simulations. Our numerical experiments also suggest that the proposed method can solve phase retrieval problems with coded diffraction measurements as well.

Submitted 16 March, 2017; v1 submitted 13 October, 2016; originally announced October 2016.

Comments: Accepted in AISTATS 2017. Extended the discussion of related work and added a few more references. Clarified some of the statements and notations

arXiv:1506.08159 [pdf, other]

Near-Optimal Estimation of Simultaneously Sparse and Low-Rank Matrices from Nested Linear Measurements

Authors: Sohail Bahmani, Justin Romberg

Abstract: In this paper we consider the problem of estimating simultaneously low-rank and row-wise sparse matrices from nested linear measurements where the linear operator consists of the product of a linear operator $\mathcal{W}$ and a matrix $\mathbf{\varPsi}$. Leveraging the nested structure of the measurement operator, we propose a computationally efficient two-stage algorithm for estimating the simultaneously structured target matrix. Assuming that $\mathcal{W}$ is a restricted isometry for low-rank matrices and $\mathbf{\varPsi}$ is a restricted isometry for row-wise sparse matrices, we establish an accuracy guarantee that holds uniformly for all sufficiently low-rank and row-wise sparse matrices with high probability. Furthermore, using standard tools from information theory, we establish a minimax lower bound for estimation of simultaneously low-rank and row-wise sparse matrices from linear measurements that need not be nested. The accuracy bounds established for the algorithm, which also serve as a minimax upper bound, differ from the derived minimax lower bound merely by a polylogarithmic factor of the dimensions. Therefore, the proposed algorithm is nearly minimax optimal. We also discuss some applications of the proposed observation model and evaluate our algorithm through numerical simulation.

Submitted 21 March, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

arXiv:1209.1557 [pdf, ps, other]

doi 10.1109/TIT.2016.2515078

Learning Model-Based Sparsity via Projected Gradient Descent

Authors: Sohail Bahmani, Petros T. Boufounos, Bhiksha Raj

Abstract: Several convex formulation methods have been proposed previously for statistical estimation with structured sparsity as the prior. These methods often require a carefully tuned regularization parameter, often a cumbersome or heuristic exercise. Furthermore, the estimate that these methods produce might not belong to the desired sparsity model, albeit accurately approximating the true parameter. Therefore, greedy-type algorithms could often be more desirable in estimating structured-sparse parameters. So far, these greedy methods have mostly focused on linear statistical models. In this paper we study the projected gradient descent with non-convex structured-sparse parameter model as the constraint set. Should the cost function have a Stable Model-Restricted Hessian, the algorithm produces an approximation for the desired minimizer. As an example, we elaborate on the application of the main results to estimation in Generalized Linear Models.

Submitted 27 January, 2016; v1 submitted 7 September, 2012; originally announced September 2012.

MSC Class: 62FXX; 65KXX

Journal ref: IEEE Transactions on Information Theory 62(4):2092--2099, 2016

arXiv:1203.5483 [pdf, other]

Greedy Sparsity-Constrained Optimization

Authors: Sohail Bahmani, Bhiksha Raj, Petros Boufounos

Abstract: Sparsity-constrained optimization has wide applicability in machine learning, statistics, and signal processing problems such as feature selection and Compressive Sensing. A vast body of work has studied the sparsity-constrained optimization from theoretical, algorithmic, and application aspects in the context of sparse estimation in linear models where the fidelity of the estimate is measured by the squared error. In contrast, relatively less effort has been made in the study of sparsity-constrained optimization in cases where nonlinear models are involved or the cost function is not quadratic. In this paper we propose a greedy algorithm, Gradient Support Pursuit (GraSP), to approximate sparse minima of cost functions of arbitrary form. Should a cost function have a Stable Restricted Hessian (SRH) or a Stable Restricted Linearization (SRL), both of which are introduced in this paper, our algorithm is guaranteed to produce a sparse vector within a bounded distance from the true sparse optimum. Our approach generalizes known results for quadratic cost functions that arise in sparse linear regression and Compressive Sensing. We also evaluate the performance of GraSP through numerical simulations on synthetic data, where the algorithm is employed for sparse logistic regression with and without $\ell_2$-regularization.

Submitted 6 January, 2013; v1 submitted 25 March, 2012; originally announced March 2012.

MSC Class: 62FXX; 65KXX

Journal ref: Journal of Machine Learning Research, 14(3):807--841, 2013

arXiv:1107.4623 [pdf, ps, other]

doi 10.1016/j.acha.2012.07.004

A Unifying Analysis of Projected Gradient Descent for $\ell_p$-constrained Least Squares

Authors: Sohail Bahmani, Bhiksha Raj

Abstract: In this paper we study the performance of the Projected Gradient Descent (PGD) algorithm for $\ell_{p}$-constrained least squares problems that arise in the framework of Compressed Sensing. Relying on the Restricted Isometry Property, we provide convergence guarantees for this algorithm for the entire range of $0\leq p\leq1$, which include and generalize the existing results for the Iterative Hard Thresholding algorithm and provide a new accuracy guarantee for the Iterative Soft Thresholding algorithm as special cases. Our results suggest that in this group of algorithms, as $p$ increases from zero to one, conditions required to guarantee accuracy become stricter and robustness to noise deteriorates.

Submitted 26 June, 2012; v1 submitted 22 July, 2011; originally announced July 2011.

Comments: 16 pages, 3 figures

Journal ref: Applied and Computational Harmonic Analysis, 34(3):366-378, 2013

Showing 1–13 of 13 results for author: Bahmani, S