-
Sample-Optimal Private Regression in Polynomial Time
Authors:
Prashanti Anderson,
Ainesh Bakshi,
Mahbod Majid,
Stefan Tiegel
Abstract:
We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate eithe…
▽ More
We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters.
Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
△ Less
Submitted 31 March, 2025;
originally announced March 2025.
-
Improved Robust Estimation for Erdős-Rényi Graphs: The Sparse Regime and Optimal Breakdown Point
Authors:
Hongjie Chen,
Jingqiu Ding,
Yiding Hua,
Stefan Tiegel
Abstract:
We study the problem of robustly estimating the edge density of Erdős-Rényi random graphs $G(n, d^\circ/n)$ when an adversary can arbitrarily add or remove edges incident to an $η$-fraction of the nodes. We develop the first polynomial-time algorithm for this problem that estimates $d^\circ$ up to an additive error $O([\sqrt{\log(n) / n} + η\sqrt{\log(1/η)} ] \cdot \sqrt{d^\circ} + η\log(1/η))$. O…
▽ More
We study the problem of robustly estimating the edge density of Erdős-Rényi random graphs $G(n, d^\circ/n)$ when an adversary can arbitrarily add or remove edges incident to an $η$-fraction of the nodes. We develop the first polynomial-time algorithm for this problem that estimates $d^\circ$ up to an additive error $O([\sqrt{\log(n) / n} + η\sqrt{\log(1/η)} ] \cdot \sqrt{d^\circ} + η\log(1/η))$. Our error guarantee matches information-theoretic lower bounds up to factors of $\log(1/η)$. Moreover, our estimator works for all $d^\circ \geq Ω(1)$ and achieves optimal breakdown point $η= 1/2$.
Previous algorithms [AJK+22, CDHS24], including inefficient ones, incur significantly suboptimal errors. Furthermore, even admitting suboptimal error guarantees, only inefficient algorithms achieve optimal breakdown point. Our algorithm is based on the sum-of-squares (SoS) hierarchy. A key ingredient is to construct constant-degree SoS certificates for concentration of the number of edges incident to small sets in $G(n, d^\circ/n)$. Crucially, we show that these certificates also exist in the sparse regime, when $d^\circ = o(\log n)$, a regime in which the performance of previous algorithms was significantly suboptimal.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
SoS Certificates for Sparse Singular Values and Their Applications: Robust Statistics, Subspace Distortion, and More
Authors:
Ilias Diakonikolas,
Samuel B. Hopkins,
Ankit Pensia,
Stefan Tiegel
Abstract:
We study $\textit{sparse singular value certificates}$ for random rectangular matrices. If $M$ is an $n \times d$ matrix with independent Gaussian entries, we give a new family of polynomial-time algorithms which can certify upper bounds on the maximum of $\|M u\|$, where $u$ is a unit vector with at most $ηn$ nonzero entries for a given $η\in (0,1)$. This basic algorithmic primitive lies at the h…
▽ More
We study $\textit{sparse singular value certificates}$ for random rectangular matrices. If $M$ is an $n \times d$ matrix with independent Gaussian entries, we give a new family of polynomial-time algorithms which can certify upper bounds on the maximum of $\|M u\|$, where $u$ is a unit vector with at most $ηn$ nonzero entries for a given $η\in (0,1)$. This basic algorithmic primitive lies at the heart of a wide range of problems across algorithmic statistics and theoretical computer science.
Our algorithms certify a bound which is asymptotically smaller than the naive one, given by the maximum singular value of $M$, for nearly the widest-possible range of $n,d,$ and $η$. Efficiently certifying such a bound for a range of $n,d$ and $η$ which is larger by any polynomial factor than what is achieved by our algorithm would violate lower bounds in the SQ and low-degree polynomials models. Our certification algorithm makes essential use of the Sum-of-Squares hierarchy. To prove the correctness of our algorithm, we develop a new combinatorial connection between the graph matrix approach to analyze random matrices with dependent entries, and the Efron-Stein decomposition of functions of independent random variables.
As applications of our certification algorithm, we obtain new efficient algorithms for a wide range of well-studied algorithmic tasks. In algorithmic robust statistics, we obtain new algorithms for robust mean and covariance estimation with tradeoffs between breakdown point and sample complexity, which are nearly matched by SQ and low-degree polynomial lower bounds (that we establish). We also obtain new polynomial-time guarantees for certification of $\ell_1/\ell_2$ distortion of random subspaces of $\mathbb{R}^n$ (also with nearly matching lower bounds), sparse principal component analysis, and certification of the $2\rightarrow p$ norm of a random matrix.
△ Less
Submitted 30 December, 2024;
originally announced December 2024.
-
Near-Optimal Time-Sparsity Trade-Offs for Solving Noisy Linear Equations
Authors:
Kiril Bangachev,
Guy Bresler,
Stefan Tiegel,
Vinod Vaikuntanathan
Abstract:
We present a polynomial-time reduction from solving noisy linear equations over $\mathbb{Z}/q\mathbb{Z}$ in dimension $Θ(k\log n/\mathsf{poly}(\log k,\log q,\log\log n))$ with a uniformly random coefficient matrix to noisy linear equations over $\mathbb{Z}/q\mathbb{Z}$ in dimension $n$ where each row of the coefficient matrix has uniformly random support of size $k$. This allows us to deduce the h…
▽ More
We present a polynomial-time reduction from solving noisy linear equations over $\mathbb{Z}/q\mathbb{Z}$ in dimension $Θ(k\log n/\mathsf{poly}(\log k,\log q,\log\log n))$ with a uniformly random coefficient matrix to noisy linear equations over $\mathbb{Z}/q\mathbb{Z}$ in dimension $n$ where each row of the coefficient matrix has uniformly random support of size $k$. This allows us to deduce the hardness of sparse problems from their dense counterparts. In particular, we derive hardness results in the following canonical settings. 1) Assuming the $\ell$-dimensional (dense) LWE over a polynomial-size field takes time $2^{Ω(\ell)}$, $k$-sparse LWE in dimension $n$ takes time $n^{Ω({k}/{(\log k \cdot (\log k + \log \log n))})}.$ 2) Assuming the $\ell$-dimensional (dense) LPN over $\mathbb{F}_2$ takes time $2^{Ω(\ell/\log \ell)}$, $k$-sparse LPN in dimension $n$ takes time $n^{Ω(k/(\log k \cdot (\log k + \log \log n)^2))}~.$ These running time lower bounds are nearly tight as both sparse problems can be solved in time $n^{O(k)},$ given sufficiently many samples. We further give a reduction from $k$-sparse LWE to noisy tensor completion. Concretely, composing the two reductions implies that order-$k$ rank-$2^{k-1}$ noisy tensor completion in $\mathbb{R}^{n^{\otimes k}}$ takes time $n^{Ω(k/ \log k \cdot (\log k + \log \log n))}$, assuming the exponential hardness of standard worst-case lattice problems.
△ Less
Submitted 19 November, 2024;
originally announced November 2024.
-
SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications
Authors:
Ilias Diakonikolas,
Samuel B. Hopkins,
Ankit Pensia,
Stefan Tiegel
Abstract:
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS…
▽ More
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb N$, every centered subgaussian distribution $\mathcal D$ on $\mathbb R^d$, and every even $p \in \mathbb N$, the $d$-variate polynomial $(Cp)^{p/2} \cdot \|v\|_{2}^p - \mathbb E_{X \sim \mathcal D} \langle v,X\rangle^p$ is a sum of square polynomials. This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of high-dimensional statistical tasks. As a direct corollary, we obtain computationally efficient algorithms with near-optimal guarantees for the following tasks, when given samples from an arbitrary subgaussian distribution: robust mean estimation, list-decodable mean estimation, clustering mean-separated mixture models, robust covariance-aware mean estimation, robust covariance estimation, and robust linear regression. Our proof makes essential use of Talagrand's generic chaining/majorizing measures theorem.
△ Less
Submitted 28 October, 2024;
originally announced October 2024.
-
Robust Mixture Learning when Outliers Overwhelm Small Groups
Authors:
Daniil Dmitriev,
Rares-Darius Buhai,
Stefan Tiegel,
Alexander Wolters,
Gleb Novikov,
Amartya Sanyal,
David Steurer,
Fanny Yang
Abstract:
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters - a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial…
▽ More
We study the problem of estimating the means of well-separated mixtures when an adversary may add arbitrary outliers. While strong guarantees are available when the outlier fraction is significantly smaller than the minimum mixing weight, much less is known when outliers may crowd out low-weight clusters - a setting we refer to as list-decodable mixture learning (LD-ML). In this case, adversarial outliers can simulate additional spurious mixture components. Hence, if all means of the mixture must be recovered up to a small error in the output list, the list size needs to be larger than the number of (true) components. We propose an algorithm that obtains order-optimal error guarantees for each mixture mean with a minimal list-size overhead, significantly improving upon list-decodable mean estimation, the only existing method that is applicable for LD-ML. Although improvements are observed even when the mixture is non-separated, our algorithm achieves particularly strong guarantees when the mixture is separated: it can leverage the mixture structure to partially cluster the samples before carefully iterating a base learner for list-decodable mean estimation at different scales.
△ Less
Submitted 22 July, 2024;
originally announced July 2024.
-
Testably Learning Polynomial Threshold Functions
Authors:
Lucas Slot,
Stefan Tiegel,
Manuel Wiedmer
Abstract:
Rubinfeld & Vasilyan recently introduced the framework of testable learning as an extension of the classical agnostic model. It relaxes distributional assumptions which are difficult to verify by conditions that can be checked efficiently by a tester. The tester has to accept whenever the data truly satisfies the original assumptions, and the learner has to succeed whenever the tester accepts. We…
▽ More
Rubinfeld & Vasilyan recently introduced the framework of testable learning as an extension of the classical agnostic model. It relaxes distributional assumptions which are difficult to verify by conditions that can be checked efficiently by a tester. The tester has to accept whenever the data truly satisfies the original assumptions, and the learner has to succeed whenever the tester accepts. We focus on the setting where the tester has to accept standard Gaussian data. There, it is known that basic concept classes such as halfspaces can be learned testably with the same time complexity as in the (distribution-specific) agnostic model. In this work, we ask whether there is a price to pay for testably learning more complex concept classes. In particular, we consider polynomial threshold functions (PTFs), which naturally generalize halfspaces. We show that PTFs of arbitrary constant degree can be testably learned up to excess error $\varepsilon > 0$ in time $n^{\mathrm{poly}(1/\varepsilon)}$. This qualitatively matches the best known guarantees in the agnostic model. Our results build on a connection between testable learning and fooling. In particular, we show that distributions that approximately match at least $\mathrm{poly}(1/\varepsilon)$ moments of the standard Gaussian fool constant-degree PTFs (up to error $\varepsilon$). As a secondary result, we prove that a direct approach to show testable learning (without fooling), which was successfully used for halfspaces, cannot work for PTFs.
△ Less
Submitted 6 November, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Improved Hardness Results for Learning Intersections of Halfspaces
Authors:
Stefan Tiegel
Abstract:
We show strong (and surprisingly simple) lower bounds for weakly learning intersections of halfspaces in the improper setting. Strikingly little is known about this problem. For instance, it is not even known if there is a polynomial-time algorithm for learning the intersection of only two halfspaces. On the other hand, lower bounds based on well-established assumptions (such as approximating wors…
▽ More
We show strong (and surprisingly simple) lower bounds for weakly learning intersections of halfspaces in the improper setting. Strikingly little is known about this problem. For instance, it is not even known if there is a polynomial-time algorithm for learning the intersection of only two halfspaces. On the other hand, lower bounds based on well-established assumptions (such as approximating worst-case lattice problems or variants of Feige's 3SAT hypothesis) are only known (or are implied by existing results) for the intersection of super-logarithmically many halfspaces [KS09,KS06,DSS16]. With intersections of fewer halfspaces being only ruled out under less standard assumptions [DV21] (such as the existence of local pseudo-random generators with large stretch). We significantly narrow this gap by showing that even learning $ω(\log \log N)$ halfspaces in dimension $N$ takes super-polynomial time under standard assumptions on worst-case lattice problems (namely that SVP and SIVP are hard to approximate within polynomial factors). Further, we give unconditional hardness results in the statistical query framework. Specifically, we show that for any $k$ (even constant), learning $k$ halfspaces in dimension $N$ requires accuracy $N^{-Ω(k)}$, or exponentially many queries -- in particular ruling out SQ algorithms with polynomial accuracy for $ω(1)$ halfspaces. To the best of our knowledge this is the first unconditional hardness result for learning a super-constant number of halfspaces.
Our lower bounds are obtained in a unified way via a novel connection we make between intersections of halfspaces and the so-called parallel pancakes distribution [DKS17,BLPR19,BRST21] that has been at the heart of many lower bound constructions in (robust) high-dimensional statistics in the past few years.
△ Less
Submitted 25 February, 2024;
originally announced February 2024.
-
Computational-Statistical Gaps for Improper Learning in Sparse Linear Regression
Authors:
Rares-Darius Buhai,
Jingqiu Ding,
Stefan Tiegel
Abstract:
We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for the regression vector that achieves non-trivial prediction error on the $n$ samples…
▽ More
We study computational-statistical gaps for improper learning in sparse linear regression. More specifically, given $n$ samples from a $k$-sparse linear model in dimension $d$, we ask what is the minimum sample complexity to efficiently (in time polynomial in $d$, $k$, and $n$) find a potentially dense estimate for the regression vector that achieves non-trivial prediction error on the $n$ samples. Information-theoretically this can be achieved using $Θ(k \log (d/k))$ samples. Yet, despite its prominence in the literature, there is no polynomial-time algorithm known to achieve the same guarantees using less than $Θ(d)$ samples without additional restrictions on the model. Similarly, existing hardness results are either restricted to the proper setting, in which the estimate must be sparse as well, or only apply to specific algorithms.
We give evidence that efficient algorithms for this task require at least (roughly) $Ω(k^2)$ samples. In particular, we show that an improper learning algorithm for sparse linear regression can be used to solve sparse PCA problems (with a negative spike) in their Wishart form, in regimes in which efficient algorithms are widely believed to require at least $Ω(k^2)$ samples. We complement our reduction with low-degree and statistical query lower bounds for the sparse PCA problems from which we reduce.
Our hardness results apply to the (correlated) random design setting in which the covariates are drawn i.i.d. from a mean-zero Gaussian distribution with unknown covariance.
△ Less
Submitted 25 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Robust Mean Estimation Without Moments for Symmetric Distributions
Authors:
Gleb Novikov,
David Steurer,
Stefan Tiegel
Abstract:
We study the problem of robustly estimating the mean or location parameter without moment assumptions. We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently. The distributions we study include products of arbitrary symmetric one-dimensional distributions, such as product Cauchy distributions, as well as elliptical distribut…
▽ More
We study the problem of robustly estimating the mean or location parameter without moment assumptions. We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently. The distributions we study include products of arbitrary symmetric one-dimensional distributions, such as product Cauchy distributions, as well as elliptical distributions.
For product distributions and elliptical distributions with known scatter (covariance) matrix, we show that given an $\varepsilon$-corrupted sample, we can with probability at least $1-δ$ estimate its location up to error $O(\varepsilon \sqrt{\log(1/\varepsilon)})$ using $\tfrac{d\log(d) + \log(1/δ)}{\varepsilon^2 \log(1/\varepsilon)}$ samples. This result matches the best-known guarantees for the Gaussian distribution and known SQ lower bounds (up to the $\log(d)$ factor). For elliptical distributions with unknown scatter (covariance) matrix, we propose a sequence of efficient algorithms that approaches this optimal error. Specifically, for every $k \in \mathbb{N}$, we design an estimator using time and samples $\tilde{O}({d^k})$ achieving error $O(\varepsilon^{1-\frac{1}{2k}})$. This matches the error and running time guarantees when assuming certifiably bounded moments of order up to $k$. For unknown covariance, such error bounds of $o(\sqrt{\varepsilon})$ are not even known for (general) sub-Gaussian distributions.
Our algorithms are based on a generalization of the well-known filtering technique. We show how this machinery can be combined with Huber-loss-based techniques to work with projections of the noise that behave more nicely than the initial noise. Moreover, we show how SoS proofs can be used to obtain algorithmic guarantees even for distributions without a first moment. We believe that this approach may find other applications in future works.
△ Less
Submitted 8 November, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Private estimation algorithms for stochastic block models and mixture models
Authors:
Hongjie Chen,
Vincent Cohen-Addad,
Tommaso d'Orsi,
Alessandro Epasto,
Jacob Imola,
David Steurer,
Stefan Tiegel
Abstract:
We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. To illustrate our techniques, we consider two problems: recovery of stochastic block models and learning mixtures of spherical Gaussians. For the former, we present the first efficient $(ε, δ)$-…
▽ More
We introduce general tools for designing efficient private estimation algorithms, in the high-dimensional settings, whose statistical guarantees almost match those of the best known non-private algorithms. To illustrate our techniques, we consider two problems: recovery of stochastic block models and learning mixtures of spherical Gaussians. For the former, we present the first efficient $(ε, δ)$-differentially private algorithm for both weak recovery and exact recovery. Previously known algorithms achieving comparable guarantees required quasi-polynomial time. For the latter, we design an $(ε, δ)$-differentially private algorithm that recovers the centers of the $k$-mixture when the minimum separation is at least $ O(k^{1/t}\sqrt{t})$. For all choices of $t$, this algorithm requires sample complexity $n\geq k^{O(1)}d^{O(t)}$ and time complexity $(nd)^{O(t)}$. Prior work required minimum separation at least $O(\sqrt{k})$ as well as an explicit upper bound on the Euclidean norm of the centers.
△ Less
Submitted 15 November, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
-
Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems
Authors:
Stefan Tiegel
Abstract:
We show hardness of improperly learning halfspaces in the agnostic model, both in the distribution-independent as well as the distribution-specific setting, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achi…
▽ More
We show hardness of improperly learning halfspaces in the agnostic model, both in the distribution-independent as well as the distribution-specific setting, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassfication error better than $\frac 1 2 - γ$ even if the optimal misclassification error is as small is as small as $δ$. Here, $γ$ can be smaller than the inverse of any polynomial in the dimension and $δ$ as small as $exp(-Ω(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, for any $β> 0$ learning halfspaces up to error $OPT_{LTF} + ε$ takes time at least $d^{\tildeΩ(1/ε^{2-β})}$ under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $OPT_{{PTF}_\ell} + ε$ takes time at least $d^{\tildeΩ(\ell^{2-β}/ε^{2-β})}$. $OPT_{LTF}$ and $OPT_{{PTF}_\ell}$ denote the best error achievable by any halfspace or polynomial threshold function, respectively.
Our lower bounds qualitively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [Daniely16, DKPZ21] were based on average-case complexity assumptions or restricted to the statistical query model. Our work gives the first hardness results basing these fundamental learning problems on worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.
△ Less
Submitted 20 February, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
Fast algorithm for overcomplete order-3 tensor decomposition
Authors:
Jingqiu Ding,
Tommaso d'Orsi,
Chih-Hung Liu,
Stefan Tiegel,
David Steurer
Abstract:
We develop the first fast spectral algorithm to decompose a random third-order tensor over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\text{polylog}(d))$. Our algorithm only involves simple linear algebra operations and can recover all components in time $O(d^{6.05})$ under the current matrix multiplication time.
Prior to this work, comparable guarantees could only be achieved via sum-of-squares [M…
▽ More
We develop the first fast spectral algorithm to decompose a random third-order tensor over $\mathbb{R}^d$ of rank up to $O(d^{3/2}/\text{polylog}(d))$. Our algorithm only involves simple linear algebra operations and can recover all components in time $O(d^{6.05})$ under the current matrix multiplication time.
Prior to this work, comparable guarantees could only be achieved via sum-of-squares [Ma, Shi, Steurer 2016]. In contrast, fast algorithms [Hopkins, Schramm, Shi, Steurer 2016] could only decompose tensors of rank at most $O(d^{4/3}/\text{polylog}(d))$.
Our algorithmic result rests on two key ingredients. A clean lifting of the third-order tensor to a sixth-order tensor, which can be expressed in the language of tensor networks. A careful decomposition of the tensor network into a sequence of rectangular matrix multiplications, which allows us to have a fast implementation of the algorithm.
△ Less
Submitted 28 June, 2022; v1 submitted 13 February, 2022;
originally announced February 2022.
-
Optimal SQ Lower Bounds for Learning Halfspaces with Massart Noise
Authors:
Rajai Nasser,
Stefan Tiegel
Abstract:
We give tight statistical query (SQ) lower bounds for learnining halfspaces in the presence of Massart noise. In particular, suppose that all labels are corrupted with probability at most $η$. We show that for arbitrary $η\in [0,1/2]$ every SQ algorithm achieving misclassification error better than $η$ requires queries of superpolynomial accuracy or at least a superpolynomial number of queries. Fu…
▽ More
We give tight statistical query (SQ) lower bounds for learnining halfspaces in the presence of Massart noise. In particular, suppose that all labels are corrupted with probability at most $η$. We show that for arbitrary $η\in [0,1/2]$ every SQ algorithm achieving misclassification error better than $η$ requires queries of superpolynomial accuracy or at least a superpolynomial number of queries. Further, this continues to hold even if the information-theoretically optimal error $\mathrm{OPT}$ is as small as $\exp\left(-\log^c(d)\right)$, where $d$ is the dimension and $0 < c < 1$ is an arbitrary absolute constant, and an overwhelming fraction of examples are noiseless. Our lower bound matches known polynomial time algorithms, which are also implementable in the SQ framework. Previously, such lower bounds only ruled out algorithms achieving error $\mathrm{OPT} + ε$ or error better than $Ω(η)$ or, if $η$ is close to $1/2$, error $η- o_η(1)$, where the term $o_η(1)$ is constant in $d$ but going to 0 for $η$ approaching $1/2$.
As a consequence, we also show that achieving misclassification error better than $1/2$ in the $(A,α)$-Tsybakov model is SQ-hard for $A$ constant and $α$ bounded away from 1.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
Consistent Estimation for PCA and Sparse Regression with Oblivious Outliers
Authors:
Tommaso d'Orsi,
Chih-Hung Liu,
Rajai Nasser,
Gleb Novikov,
David Steurer,
Stefan Tiegel
Abstract:
We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $α$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, w…
▽ More
We develop machinery to design efficiently computable and consistent estimators, achieving estimation error approaching zero as the number of observations grows, when facing an oblivious adversary that may corrupt responses in all but an $α$ fraction of the samples. As concrete examples, we investigate two problems: sparse regression and principal component analysis (PCA). For sparse regression, we achieve consistency for optimal sample size $n\gtrsim (k\log d)/α^2$ and optimal error rate $O(\sqrt{(k\log d)/(n\cdot α^2)})$ where $n$ is the number of observations, $d$ is the number of dimensions and $k$ is the sparsity of the parameter vector, allowing the fraction of inliers to be inverse-polynomial in the number of samples. Prior to this work, no estimator was known to be consistent when the fraction of inliers $α$ is $o(1/\log \log n)$, even for (non-spherical) Gaussian design matrices. Results holding under weak design assumptions and in the presence of such general noise have only been shown in dense setting (i.e., general linear regression) very recently by d'Orsi et al. [dNS21]. In the context of PCA, we attain optimal error guarantees under broad spikiness assumptions on the parameter matrix (usually used in matrix completion). Previous works could obtain non-trivial guarantees only under the assumptions that the measurement noise corresponding to the inliers is polynomially small in $n$ (e.g., Gaussian with variance $1/n^2$).
To devise our estimators, we equip the Huber loss with non-smooth regularizers such as the $\ell_1$ norm or the nuclear norm, and extend d'Orsi et al.'s approach [dNS21] in a novel way to analyze the loss function. Our machinery appears to be easily applicable to a wide range of estimation problems.
△ Less
Submitted 4 November, 2021;
originally announced November 2021.
-
SoS Degree Reduction with Applications to Clustering and Robust Moment Estimation
Authors:
David Steurer,
Stefan Tiegel
Abstract:
We develop a general framework to significantly reduce the degree of sum-of-squares proofs by introducing new variables. To illustrate the power of this framework, we use it to speed up previous algorithms based on sum-of-squares for two important estimation problems, clustering and robust moment estimation. The resulting algorithms offer the same statistical guarantees as the previous best algori…
▽ More
We develop a general framework to significantly reduce the degree of sum-of-squares proofs by introducing new variables. To illustrate the power of this framework, we use it to speed up previous algorithms based on sum-of-squares for two important estimation problems, clustering and robust moment estimation. The resulting algorithms offer the same statistical guarantees as the previous best algorithms but have significantly faster running times. Roughly speaking, given a sample of $n$ points in dimension $d$, our algorithms can exploit order-$\ell$ moments in time $d^{O(\ell)}\cdot n^{O(1)}$, whereas a naive implementation requires time $(d\cdot n)^{O(\ell)}$. Since for the aforementioned applications, the typical sample size is $d^{Θ(\ell)}$, our framework improves running times from $d^{O(\ell^2)}$ to $d^{O(\ell)}$.
△ Less
Submitted 5 January, 2021;
originally announced January 2021.
-
A Framework for Searching in Graphs in the Presence of Errors
Authors:
Dariusz Dereniowski,
Stefan Tiegel,
Przemysław Uznański,
Daniel Wolleb-Graf
Abstract:
We consider the problem of searching for an unknown target vertex $t$ in a (possibly edge-weighted) graph. Each \emph{vertex-query} points to a vertex $v$ and the response either admits $v$ is the target or provides any neighbor $s\not=v$ that lies on a shortest path from $v$ to $t$. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et…
▽ More
We consider the problem of searching for an unknown target vertex $t$ in a (possibly edge-weighted) graph. Each \emph{vertex-query} points to a vertex $v$ and the response either admits $v$ is the target or provides any neighbor $s\not=v$ that lies on a shortest path from $v$ to $t$. This model has been introduced for trees by Onak and Parys [FOCS 2006] and for general graphs by Emamjomeh-Zadeh et al. [STOC 2016]. In the latter, the authors provide algorithms for the error-less case and for the independent noise model (where each query independently receives an erroneous answer with known probability $p<1/2$ and a correct one with probability $1-p$).
We study this problem in both adversarial errors and independent noise models. First, we show an algorithm that needs $\frac{\log_2 n}{1 - H(r)}$ queries against \emph{adversarial} errors, where adversary is bounded with its rate of errors by a known constant $r<1/2$. Our algorithm is in fact a simplification of previous work, and our refinement lies in invoking amortization argument. We then show that our algorithm coupled with Chernoff bound argument leads to an algorithm for independent noise that is simpler and with a query complexity that is both simpler and asymptotically better to one of Emamjomeh-Zadeh et al. [STOC 2016].
Our approach has a wide range of applications. First, it improves and simplifies Robust Interactive Learning framework proposed by Emamjomeh-Zadeh et al. [NIPS 2017]. Secondly, performing analogous analysis for \emph{edge-queries} (where query to edge $e$ returns its endpoint that is closer to target) we actually recover (as a special case) noisy binary search algorithm that is asymptotically optimal, matching the complexity of Feige et al. [SIAM J. Comput. 1994]. Thirdly, we improve and simplify upon existing algorithm for searching of \emph{unbounded} domains due to Aslam and Dhagat [STOC 1991].
△ Less
Submitted 5 March, 2020; v1 submitted 5 April, 2018;
originally announced April 2018.