-
Sum-Of-Squares To Approximate Knapsack
Authors:
Pravesh K. Kothari,
Sherry Sarkar
Abstract:
These notes give a self-contained exposition of Karlin, Mathieu and Nguyen's tight estimate of the integrality gap of the sum-of-squares semidefinite program for solving the knapsack problem. They are based on a sequence of three lectures in a CMU course on Advanced Approximation Algorithms in Fall'21 that used the KMN result to introduce the Sum-of-Squares method for algorithm design. The treatment in these notes uses the pseudo-distribution view of solutions to the sum-of-squares SDPs and relies only on a few basic, reusable results about pseudo-distributions.
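The central quantity here is the integrality gap: the ratio between a relaxation's optimum and the true 0/1 optimum. As a hedged illustration of that notion only (it is not taken from the notes, which study the much stronger sum-of-squares relaxation), the Python sketch below computes the gap of the plain LP relaxation of knapsack on a classic two-item instance. The helper names `lp_value`, `integer_value` and `gap_instance` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def lp_value(values, sizes, capacity):
    """Optimum of the basic LP relaxation, where each item may be taken fractionally.

    For this relaxation, filling the knapsack greedily by value/size ratio is optimal.
    """
    order = sorted(range(len(values)), key=lambda i: Fraction(values[i]) / sizes[i], reverse=True)
    remaining, total = Fraction(capacity), Fraction(0)
    for i in order:
        if remaining <= 0:
            break
        take = min(Fraction(1), remaining / sizes[i])  # fraction of item i that still fits
        total += take * values[i]
        remaining -= take * sizes[i]
    return total


def integer_value(values, sizes, capacity):
    """Exact 0/1 optimum by brute force over all subsets (tiny inputs only)."""
    best = Fraction(0)
    for r in range(len(values) + 1):
        for subset in combinations(range(len(values)), r):
            if sum(sizes[i] for i in subset) <= capacity:
                best = max(best, Fraction(sum(values[i] for i in subset)))
    return best


def gap_instance(delta):
    """Two unit-value items of size 1 + delta with capacity 2: only one item fits."""
    size = 1 + Fraction(delta)
    return [1, 1], [size, size], 2
```

On `gap_instance(delta)` the LP takes one item fully and a $(1-\delta)/(1+\delta)$ fraction of the other, so the ratio is $2/(1+\delta)$, which tends to 2 as $\delta \to 0$. Exact `Fraction` arithmetic is used so the ratio can be compared without rounding error.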
Submitted 18 February, 2025;
originally announced February 2025.
-
Improved Lower Bounds for all Odd-Query Locally Decodable Codes
Authors:
Arpon Basu,
Jun-Ting Hsieh,
Pravesh K. Kothari,
Andrew D. Lin
Abstract:
We prove that for every odd $q\geq 3$, any $q$-query binary, possibly non-linear locally decodable code ($q$-LDC) $E:\{\pm1\}^k \rightarrow \{\pm1\}^n$ must satisfy $k \leq \tilde{O}(n^{1-2/q})$. For even $q$, this bound was established in a sequence of prior works. For $q=3$, the above bound was achieved in a recent work of Alrabiah, Guruswami, Kothari and Manohar using an argument that crucially exploits known exponential lower bounds for $2$-LDCs. Their strategy hits an inherent bottleneck for $q \geq 5$.
Our key insight is identifying a general sufficient condition on the hypergraph of local decoding sets called $t$-approximate strong regularity. This condition demands that 1) the number of hyperedges containing any given subset of vertices of size $t$ (i.e., its co-degree) be within a constant multiplicative factor of a common but arbitrary value $d_t$, and 2) all other co-degrees be upper-bounded relative to $d_t$. This condition significantly generalizes related proposals in prior works that demand absolute upper bounds on all co-degrees.
We give an argument based on spectral bounds on Kikuchi matrices that lower-bounds the blocklength of any LDC whose local decoding sets satisfy $t$-approximate strong regularity for any $t \leq q$. Crucially, unlike prior works, our argument works despite having no non-trivial absolute upper bound on the co-degrees of any set of vertices. To apply our argument to arbitrary $q$-LDCs, we give a new, greedy, approximate strong regularity decomposition that shows that arbitrary, dense enough hypergraphs can be partitioned (up to a small error) into approximately strongly regular pieces satisfying the required relative bounds on the co-degrees.
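To make the co-degree notion concrete, here is a small Python sketch (not from the paper) that counts, for each $t$-subset of vertices, how many hyperedges contain it, and checks condition 1) with a slack constant `slack` (a hypothetical parameter name). It assumes that only $t$-subsets lying inside at least one hyperedge are constrained, and it omits condition 2), whose exact form the abstract does not state.

```python
from collections import Counter
from itertools import combinations


def codegrees(hyperedges, t):
    """Map each t-subset (as a sorted tuple) to the number of hyperedges containing it.

    Only t-subsets contained in at least one hyperedge appear in the result.
    """
    counts = Counter()
    for edge in hyperedges:
        for subset in combinations(sorted(edge), t):
            counts[subset] += 1
    return counts


def is_approx_t_regular(hyperedges, t, slack):
    """Check condition 1) only: all nonzero t-co-degrees agree up to a factor `slack`."""
    counts = codegrees(hyperedges, t)
    if not counts:
        return False
    return max(counts.values()) <= slack * min(counts.values())


# The Fano plane: every pair of points lies on exactly one line.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
# A "star": vertex 0 lies in every hyperedge, all other vertices in exactly one.
star = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (0, 7, 8)]
```

The Fano plane is exactly regular at $t = 1$ and $t = 2$, while the star has vertex co-degrees 4 and 1, so it passes only with `slack` at least 4.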
Submitted 21 November, 2024;
originally announced November 2024.
-
Overcomplete Tensor Decomposition via Koszul-Young Flattenings
Authors:
Pravesh K. Kothari,
Ankur Moitra,
Alexander S. Wein
Abstract:
Motivated by connections between algebraic complexity lower bounds and tensor decompositions, we investigate Koszul-Young flattenings, which are the main ingredient in recent lower bounds for matrix multiplication. Based on this tool we give a new algorithm for decomposing an $n_1 \times n_2 \times n_3$ tensor as the sum of a minimal number of rank-1 terms, and certifying uniqueness of this decomposition. For $n_1 \le n_2 \le n_3$ with $n_1 \to \infty$ and $n_3/n_2 = O(1)$, our algorithm is guaranteed to succeed when the tensor rank is bounded by $r \le (1-ε)(n_2 + n_3)$ for an arbitrary $ε> 0$, provided the tensor components are generically chosen. For any fixed $ε$, the runtime is polynomial in $n_3$. When $n_2 = n_3 = n$, our condition on the rank gives a factor-of-2 improvement over the classical simultaneous diagonalization algorithm, which requires $r \le n$, and also improves on the recent algorithm of Koiran (2024) which requires $r \le 4n/3$. It also improves on the PhD thesis of Persu (2018) which solves rank detection for $r \leq 3n/2$.
We complement our upper bounds by showing limitations, in particular that no flattening of the style we consider can surpass rank $n_2 + n_3$. Furthermore, for $n \times n \times n$ tensors, we show that an even more general class of degree-$d$ polynomial flattenings cannot surpass rank $Cn$ for a constant $C = C(d)$. This suggests that for tensor decompositions, the case of generic components may be fundamentally harder than that of random components, where efficient decomposition is possible even in highly overcomplete settings.
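As background for why richer flattenings are needed (a standard fact, not the paper's Koszul-Young construction): the rank of any matrix flattening of a tensor lower-bounds its tensor rank, but the plain mode-1 flattening is an $n_1 \times (n_2 n_3)$ matrix, so it can never certify rank above $n_1$. The Python sketch below checks this with exact rational arithmetic; all function names are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank over the rationals via Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][c] != 0:
                f = m[i][c] / m[rank][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def tensor_from_factors(A, B, C):
    """T = sum_l a_l (x) b_l (x) c_l, given equal-length lists of factor vectors A, B, C."""
    n1, n2, n3 = len(A[0]), len(B[0]), len(C[0])
    return [[[sum(a[i] * b[j] * c[k] for a, b, c in zip(A, B, C)) for k in range(n3)]
             for j in range(n2)] for i in range(n1)]


def mode1_flattening(T):
    """Reshape an n1 x n2 x n3 tensor into an n1 x (n2 * n3) matrix."""
    return [[x for row in T_i for x in row] for T_i in T]
```

For a diagonal tensor with $r \le n_1$ terms the flattening rank equals $r$, but adding further rank-1 terms cannot push it past $n_1$, which is the barrier that the $r \le n$ simultaneous diagonalization regime runs into.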
Submitted 21 November, 2024;
originally announced November 2024.
-
Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures
Authors:
Prashanti Anderson,
Mitali Bafna,
Rares-Darius Buhai,
Pravesh K. Kothari,
David Steurer
Abstract:
We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04] (in addition to several other applications).
As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and samples for clustering such mixtures.
Our results may come as a surprise in the context of the $d^{Ω(k)}$ statistical query lower bound [DKS17] for clustering non-spherical Gaussian mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.
Submitted 19 November, 2024;
originally announced November 2024.
-
Rounding Large Independent Sets on Expanders
Authors:
Mitali Bafna,
Jun-Ting Hsieh,
Pravesh K. Kothari
Abstract:
We develop a new approach for approximating large independent sets when the input graph is a one-sided spectral expander, that is, the uniform random walk matrix of the graph has its second eigenvalue bounded away from 1. Consequently, we obtain a polynomial time algorithm to find linear-sized independent sets in one-sided expanders that are almost $3$-colorable or are promised to contain an independent set of size $(1/2-ε)n$. Our second result above can be refined to require only a weaker vertex expansion property with an efficient certificate. In a surprising contrast to our algorithmic result, we observe that the analogous task of finding a linear-sized independent set in almost $4$-colorable one-sided expanders (even when the second eigenvalue is $o_n(1)$) is NP-hard, assuming the Unique Games Conjecture.
All prior algorithms that beat the worst-case guarantees for this problem rely on bottom eigenspace enumeration techniques (following the classical spectral methods of Alon and Kahale) and require two-sided expansion, meaning a bounded number of negative eigenvalues of magnitude $Ω(1)$. Such techniques naturally extend to almost $k$-colorable graphs for any constant $k$, in contrast to analogous guarantees on one-sided expanders, which are Unique Games-hard to achieve for $k \geq 4$.
Our rounding builds on the method of simulating multiple samples from a pseudo-distribution introduced by Bafna et al. for rounding Unique Games instances. The key to our analysis is a new clustering property of large independent sets in expanding graphs (every large independent set has a larger-than-expected intersection with some member of a small list) and its formalization in the low-degree sum-of-squares proof system.
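To make "second eigenvalue bounded away from 1" concrete, here is a hedged Python sketch (not from the paper) that estimates $\lambda_2$ of the random-walk matrix of a regular graph by power iteration. Regularity is assumed so the walk matrix is symmetric with the all-ones vector as its top eigenvector; the function name `second_eigenvalue` is hypothetical.

```python
import random


def second_eigenvalue(adj, iters=500, seed=0):
    """Estimate lambda_2 of W = A / d for a connected d-regular graph given as neighbor lists.

    Assumes regularity, so W is symmetric with top eigenvector all-ones (eigenvalue 1).
    Power iteration runs on the lazy walk (I + W) / 2, whose eigenvalues lie in [0, 1];
    restricted to vectors orthogonal to all-ones, its top eigenvalue is (1 + lambda_2) / 2.
    """
    n, d = len(adj), len(adj[0])
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    estimate = 0.0
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the all-ones direction
        norm = sum(xi * xi for xi in x) ** 0.5
        x = [xi / norm for xi in x]
        y = [0.5 * x[v] + 0.5 * sum(x[u] for u in adj[v]) / d for v in range(n)]
        estimate = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient of the lazy walk
        x = y
    return 2.0 * estimate - 1.0
```

Working with the lazy walk is a design choice: it maps the spectrum into $[0, 1]$, so the iteration targets the largest signed eigenvalue $\lambda_2$ rather than the largest in magnitude. For example, the 6-cycle is bipartite (it has eigenvalue $-1$, so it is not a two-sided expander), yet its $\lambda_2 = \cos(2\pi/6) = 0.5$; a one-sided bound constrains only this quantity.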
Submitted 5 November, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Semirandom Planted Clique and the Restricted Isometry Property
Authors:
Jarosław Błasiok,
Rares-Darius Buhai,
Pravesh K. Kothari,
David Steurer
Abstract:
We give a simple, greedy $O(n^{ω+0.5})=O(n^{2.872})$-time algorithm to list-decode planted cliques in a semirandom model introduced in [CSV17] (following [FK01]) that succeeds whenever the size of the planted clique is $k\geq O(\sqrt{n} \log^2 n)$. In the model, the edges touching the vertices in the planted $k$-clique are drawn independently with probability $p=1/2$ while the edges not touching the planted clique are chosen by an adversary in response to the random choices. Our result shows that the computational threshold in the semirandom setting is within an $O(\log^2 n)$ factor of the information-theoretic one [Ste17], thus resolving an open question of Steinhardt. This threshold also essentially matches the conjectured computational threshold for the well-studied special case of fully random planted clique.
All previous algorithms [CSV17, MMT20, BKS23] in this model are based on rather sophisticated rounding algorithms for entropy-constrained semidefinite programming relaxations and their sum-of-squares strengthenings, and the best known guarantee is an $n^{O(1/ε)}$-time algorithm to list-decode planted cliques of size $k \geq \tilde{O}(n^{1/2+ε})$. In particular, the guarantee trivializes to quasi-polynomial time if the planted clique is of size $O(\sqrt{n} \operatorname{polylog} n)$. Our algorithm achieves an almost optimal guarantee with a surprisingly simple greedy algorithm.
The prior state-of-the-art algorithmic result above is based on a reduction to certifying bounds on the size of unbalanced bicliques in random graphs, a task closely related to certifying the restricted isometry property (RIP) of certain random matrices and known to be hard in the low-degree polynomial model. Our key idea is a new approach that relies on the truth of RIP for a new class of matrices built from the input graph, but not on efficient certificates for it.
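The semirandom model described above can be sketched as an instance generator. This is a hedged Python illustration, not code from the paper: the adversary is modeled as a hypothetical callback that sees the random edges and may add any edges not touching the planted clique, and `empty_adversary` and `dense_adversary` are two example choices.

```python
import itertools
import random


def semirandom_planted_clique(n, k, adversary, seed=0):
    """Sample a graph from the semirandom planted clique model sketched in the abstract.

    - edges inside the planted k-clique S are always present;
    - each edge with exactly one endpoint in S is present independently with p = 1/2;
    - edges with no endpoint in S are chosen by `adversary`, a hypothetical callback
      that sees the random edges and returns pairs of vertices outside S.
    """
    rng = random.Random(seed)
    clique = set(rng.sample(range(n), k))
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        if u in clique and v in clique:
            edges.add((u, v))
        elif (u in clique or v in clique) and rng.random() < 0.5:
            edges.add((u, v))
    outside = [v for v in range(n) if v not in clique]
    for u, v in adversary(frozenset(edges), clique, outside):
        assert u not in clique and v not in clique, "adversary may only touch edges outside S"
        edges.add((min(u, v), max(u, v)))
    return clique, edges


def empty_adversary(edges, clique, outside):
    return []


def dense_adversary(edges, clique, outside):
    # One possible adversary: add every edge outside S, creating a much larger clique there.
    return list(itertools.combinations(outside, 2))
```

The `dense_adversary` example shows why the task is phrased as list-decoding: the adversary is free to create other large cliques, so an algorithm can at best output a short list that contains the planted one.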
Submitted 9 October, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Exponential Lower Bounds for Smooth 3-LCCs and Sharp Bounds for Designs
Authors:
Pravesh K. Kothari,
Peter Manohar
Abstract:
We give improved lower bounds for binary $3$-query locally correctable codes (3-LCCs) $C \colon \{0,1\}^k \rightarrow \{0,1\}^n$. Specifically, we prove:
(1) If $C$ is a linear design 3-LCC, then $n \geq 2^{(1 - o(1))\sqrt{k} }$. A design 3-LCC has the additional property that the correcting sets for every codeword bit form a perfect matching and every pair of codeword bits is queried an equal number of times across all matchings. Our bound is tight up to a factor $\sqrt{8}$ in the exponent of $2$, as the best construction of binary $3$-LCCs (obtained by taking Reed-Muller codes on $\mathbb{F}_4$ and applying a natural projection map) is a design $3$-LCC with $n \leq 2^{\sqrt{8 k}}$. Up to a $\sqrt{8}$ factor, this resolves the Hamada conjecture on the maximum $\mathbb{F}_2$-codimension of a $4$-design.
(2) If $C$ is a smooth, non-linear, adaptive $3$-LCC with perfect completeness, then $n \geq 2^{Ω(k^{1/5})}$.
(3) If $C$ is a smooth, non-linear, adaptive $3$-LCC with completeness $1 - \varepsilon$, then $n \geq \tilde{Ω}(k^{\frac{1}{2\varepsilon}})$. In particular, when $\varepsilon$ is a small constant, this implies a lower bound for general non-linear LCCs that beats the prior best $n \geq \tilde{Ω}(k^3)$ lower bound of [AGKM23] by a polynomial factor.
Our design LCC lower bound is obtained via a fine-grained analysis of the Kikuchi matrix method applied to a variant of the matrix used in [KM23]. Our lower bounds for non-linear codes are obtained by designing a from-scratch reduction from non-linear $3$-LCCs to a system of "chain XOR equations": polynomial equations with a structure similar to the long-chain derivations that arise in the lower bounds for linear $3$-LCCs [KM23].
Submitted 28 October, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Small Even Covers, Locally Decodable Codes and Restricted Subgraphs of Edge-Colored Kikuchi Graphs
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari,
Sidhanth Mohanty,
David Munhá Correia,
Benny Sudakov
Abstract:
Given a $k$-uniform hypergraph $H$ on $n$ vertices, an even cover in $H$ is a collection of hyperedges that touch each vertex an even number of times. Even covers are a generalization of cycles in graphs and are equivalent to linearly dependent subsets of a system of linear equations modulo $2$. As a result, they arise naturally in the context of well-studied questions in coding theory and refuting unsatisfiable $k$-SAT formulas. Analogous to the irregular Moore bound of Alon, Hoory, and Linial (2002), in 2008, Feige conjectured an extremal trade-off between the number of hyperedges and the length of the smallest even cover in a $k$-uniform hypergraph. This conjecture was recently settled up to a multiplicative logarithmic factor in the number of hyperedges (Guruswami, Kothari, and Manohar 2022 and Hsieh, Kothari, and Mohanty 2023). These works introduced a new technique that relates hypergraph even covers to cycles in the associated Kikuchi graphs. Their analysis of these Kikuchi graphs, especially for odd $k$, is rather involved and relies on matrix concentration inequalities.
In this work, we give a simple and purely combinatorial argument that recovers the best-known bound for Feige's conjecture for even $k$. We also introduce a novel variant of a Kikuchi graph which together with this argument improves the logarithmic factor in the best-known bounds for odd $k$. As an application of our ideas, we also give a purely combinatorial proof of the improved lower bounds (Alrabiah, Guruswami, Kothari and Manohar, 2023) on 3-query binary linear locally decodable codes.
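The equivalence between even covers and linear dependence mod 2 is easy to see in code: encode each hyperedge as a bitmask of its vertices, and a nonempty collection is an even cover exactly when the XOR of its masks is zero. The Python sketch below (function names are hypothetical, and the brute-force search is only for tiny examples) illustrates this. Note also that for odd $k$ an even cover must contain an even number of hyperedges, since its $k \cdot |\text{cover}|$ vertex incidences must sum to an even number.

```python
from itertools import combinations


def to_mask(edge):
    """Encode a hyperedge (iterable of vertex ids) as a bitmask over vertices."""
    mask = 0
    for v in edge:
        mask |= 1 << v
    return mask


def is_even_cover(edges):
    """A nonempty collection is an even cover iff every vertex is hit an even number
    of times, i.e. the XOR of the indicator vectors is zero (linear dependence mod 2)."""
    acc = 0
    for e in edges:
        acc ^= to_mask(e)
    return len(edges) > 0 and acc == 0


def smallest_even_cover(hyperedges):
    """Brute-force search for a minimum-size even cover (exponential; tiny inputs only)."""
    for size in range(1, len(hyperedges) + 1):
        for subset in combinations(hyperedges, size):
            if is_even_cover(subset):
                return list(subset)
    return None
```

For instance, in the 3-uniform hypergraph `[(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5), (0, 1, 6)]` the first four hyperedges cover every vertex of `0..5` exactly twice, and no smaller even cover exists.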
Submitted 25 November, 2024; v1 submitted 21 January, 2024;
originally announced January 2024.
-
An Exponential Lower Bound for Linear 3-Query Locally Correctable Codes
Authors:
Pravesh K. Kothari,
Peter Manohar
Abstract:
We prove that the blocklength $n$ of a linear $3$-query locally correctable code (LCC) $\mathcal{L} \colon {\mathbb F}^k \to {\mathbb F}^n$ with distance $δ$ must be at least $n \geq 2^{Ω\left(\left(\frac{δ^2 k}{(|{\mathbb F}|-1)^2}\right)^{1/8}\right)}$. In particular, the blocklength of a linear $3$-query LCC with constant distance over any small field grows exponentially with $k$. This improves on the best prior lower bound of $n \geq \tilde{Ω}(k^3)$ [AGKM23], which holds even for the weaker setting of $3$-query locally decodable codes (LDCs), and comes close to matching the best-known construction of $3$-query LCCs based on binary Reed-Muller codes, which achieve $n \leq 2^{O(k^{1/2})}$. Because there is a $3$-query LDC with a strictly subexponential blocklength [Yek08, Efr09], as a corollary we obtain the first strong separation between $q$-query LCCs and LDCs for any constant $q \geq 3$.
Our proof is based on a new upgrade of the method of spectral refutations via Kikuchi matrices developed in recent works [GKM22, HKM23, AGKM23] that reduces establishing (non-)existence of combinatorial objects to proving unsatisfiability of associated XOR instances. Our key conceptual idea is to apply this method with XOR instances obtained via long-chain derivations, a structured variant of low-width resolution for XOR formulas from proof complexity [Gri01, Sch08].
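For readers new to Kikuchi matrices, here is a minimal Python sketch of the basic level-$\ell$ construction for an even-arity $k$-XOR instance (assuming $k$ even and $\ell \ge k/2$); the odd-arity and long-chain variants used in these papers are more involved and are not shown. Rows and columns are indexed by $\ell$-subsets $S, T$, and the entry is $b_C$ when $S \triangle T = C$. With $y_S = \prod_{i \in S} x_i$, one has $y^\top A y = D \sum_C b_C x^C$ where $D = \binom{k}{k/2}\binom{n-k}{\ell - k/2}$, and since $\|y\|^2 = \binom{n}{\ell}$, every assignment satisfies $\sum_C b_C x^C \le \binom{n}{\ell}\lambda_{\max}(A)/D$; a small spectral bound therefore refutes the instance. The function names are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def kikuchi_matrix(n, constraints, ell):
    """Level-ell Kikuchi matrix of an even-arity XOR instance.

    constraints: dict mapping a frozenset C (|C| = k, k even) to a sign b_C in {+1, -1}.
    Rows/columns are ell-subsets S, T of {0..n-1}; entry (S, T) is b_C when S ^ T == C.
    """
    index = [frozenset(s) for s in combinations(range(n), ell)]
    A = [[constraints.get(S ^ T, 0) for T in index] for S in index]
    return index, A


def check_identity(n, constraints, ell, x):
    """Return both sides of y^T A y == D * sum_C b_C x^C for a +/-1 assignment x,
    where y_S = prod_{i in S} x_i and D counts the ordered pairs (S, T) with S ^ T == C."""
    k = len(next(iter(constraints)))
    index, A = kikuchi_matrix(n, constraints, ell)
    y = [prod(x[i] for i in S) for S in index]
    quad = sum(y[a] * A[a][b] * y[b] for a in range(len(index)) for b in range(len(index)))
    D = comb(k, k // 2) * comb(n - k, ell - k // 2)
    objective = sum(b * prod(x[i] for i in C) for C, b in constraints.items())
    return quad, D * objective
```

The identity holds because $y_S y_T = x^{S \triangle T}$ for $\pm 1$ values, and each constraint $C$ is hit by exactly $D$ ordered pairs: choose which $k/2$ elements of $C$ go into $S$, then a shared part of size $\ell - k/2$ outside $C$.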
Submitted 1 November, 2023;
originally announced November 2023.
-
New SDP Roundings and Certifiable Approximation for Cubic Optimization
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari,
Lucas Pesenti,
Luca Trevisan
Abstract:
We give new rounding schemes for SDP relaxations for the problems of maximizing cubic polynomials over the unit sphere and the $n$-dimensional hypercube. In both cases, the resulting algorithms yield an $O(\sqrt{n/k})$ multiplicative approximation in $2^{O(k)} \text{poly}(n)$ time. In particular, we obtain an $O(\sqrt{n/\log n})$ approximation in polynomial time. For the unit sphere, this improves on the rounding algorithms of Bhattiprolu et al. [BGG+17] that need quasi-polynomial time to obtain a similar approximation guarantee. Over the $n$-dimensional hypercube, our results match the guarantee of a search algorithm of Khot and Naor [KN08] that obtains a similar approximation ratio via techniques from convex geometry. Unlike their method, our algorithm obtains an upper bound on the integrality gap of SDP relaxations for the problem and as a result, also yields a certificate on the optimum value of the input instance. Our results naturally generalize to homogeneous polynomials of higher degree and imply improved algorithms for approximating satisfiable instances of Max-3SAT.
Our main motivation is the stark lack of rounding techniques for SDP relaxations of higher-degree polynomial optimization, in sharp contrast to the rich theory of SDP roundings for the quadratic case. Our rounding algorithms introduce two new ideas: 1) a new polynomial reweighting based method to round sum-of-squares relaxations of higher degree polynomial maximization problems, and 2) a general technique to compress such relaxations down to substantially smaller SDPs by relying on an explicit construction of certain hitting sets. We hope that our work will inspire improved rounding algorithms for polynomial optimization and related problems.
Submitted 30 September, 2023;
originally announced October 2023.
-
Efficient Algorithms for Semirandom Planted CSPs at the Refutation Threshold
Authors:
Venkatesan Guruswami,
Jun-Ting Hsieh,
Pravesh K. Kothari,
Peter Manohar
Abstract:
We present an efficient algorithm to solve semirandom planted instances of any Boolean constraint satisfaction problem (CSP). The semirandom model is a hybrid between worst-case and average-case input models, where the input is generated by (1) choosing an arbitrary planted assignment $x^*$, (2) choosing an arbitrary clause structure, and (3) choosing literal negations for each clause from an arbitrary distribution "shifted by $x^*$" so that $x^*$ satisfies each constraint. For an $n$-variable semirandom planted instance of a $k$-arity CSP, our algorithm runs in polynomial time and outputs an assignment that satisfies all but a $o(1)$-fraction of constraints, provided that the instance has at least $\tilde{O}(n^{k/2})$ constraints. This matches, up to $\operatorname{polylog}(n)$ factors, the clause threshold for algorithms that solve fully random planted CSPs [FPV15], as well as algorithms that refute random and semirandom CSPs [AOW15, AGK21]. Our result shows that despite having worst-case clause structure, the randomness in the literal patterns makes semirandom planted CSPs significantly easier than worst-case CSPs, where analogous results require $O(n^k)$ constraints [AKK95, FLP16].
Perhaps surprisingly, our algorithm follows a significantly different conceptual framework when compared to the recent resolution of semirandom CSP refutation. This turns out to be inherent and, at a technical level, can be attributed to the need for relative spectral approximation of certain random matrices (reminiscent of classical spectral sparsification), which ensures that an SDP can certify the uniqueness of the planted assignment. In contrast, in the refutation setting, it suffices to obtain a weaker guarantee of absolute upper bounds on the spectral norm of related matrices.
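Steps (1) to (3) of the semirandom planted model can be sketched as a generator for the $k$-SAT special case. This is a hedged Python illustration, not the paper's code: the planted assignment and clause structure are passed in (so they can be arbitrary), and, as an assumption, negation patterns are drawn uniformly among the $2^k - 1$ patterns that $x^*$ satisfies, which is just one of the "shifted by $x^*$" distributions the model allows. Function names are hypothetical.

```python
import random


def semirandom_planted_sat(x_star, clause_structure, seed=0):
    """Attach literal negations to an arbitrary clause structure so that x_star satisfies
    every clause.

    Assumption: for each clause, negations are sampled uniformly among the patterns that
    x_star satisfies (rejection sampling from uniform patterns).
    A literal is (var, negated); it is true under x when x[var] != negated.
    """
    rng = random.Random(seed)
    clauses = []
    for scope in clause_structure:
        while True:
            negs = [rng.random() < 0.5 for _ in scope]
            if any(x_star[v] != neg for v, neg in zip(scope, negs)):
                break  # x_star satisfies this clause, so keep the pattern
        clauses.append(list(zip(scope, negs)))
    return clauses


def satisfies(x, clauses):
    """True if assignment x (a list of bools) satisfies every OR-clause."""
    return all(any(x[v] != neg for v, neg in clause) for clause in clauses)
```

The clause structure here is whatever list of variable tuples the caller supplies, mirroring step (2), while the only randomness is in the literal signs, mirroring step (3).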
Submitted 28 September, 2023;
originally announced September 2023.
-
A Near-Cubic Lower Bound for 3-Query Locally Decodable Codes from Semirandom CSP Refutation
Authors:
Omar Alrabiah,
Venkatesan Guruswami,
Pravesh K. Kothari,
Peter Manohar
Abstract:
A code $C \colon \{0,1\}^k \to \{0,1\}^n$ is a $q$-locally decodable code ($q$-LDC) if one can recover any chosen bit $b_i$ of the message $b \in \{0,1\}^k$ with good confidence by randomly querying the encoding $x := C(b)$ on at most $q$ coordinates. Existing constructions of $2$-LDCs achieve $n = \exp(O(k))$, and lower bounds show that this is in fact tight. However, when $q = 3$, far less is known: the best constructions achieve $n = \exp(k^{o(1)})$, while the best known results only show a quadratic lower bound $n \geq \tilde{Ω}(k^2)$ on the blocklength.
In this paper, we prove a near-cubic lower bound of $n \geq \tilde{Ω}(k^3)$ on the blocklength of $3$-query LDCs. This improves on the best prior lower bound by a polynomial factor in $k$. Our proof relies on a new connection between LDCs and refuting constraint satisfaction problems with limited randomness. Our quantitative improvement builds on the new techniques for refuting semirandom instances of CSPs developed in [GKM22, HKM23] and, in particular, relies on bounding the spectral norm of appropriate Kikuchi matrices.
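The $2$-LDCs with $n = \exp(O(k))$ mentioned above are exemplified by the Hadamard code (our choice of example; the abstract does not name a construction). It stores $\langle a, b \rangle \bmod 2$ for every $a \in \{0,1\}^k$, so $n = 2^k$, and decodes $b_i$ with two queries via $\langle a, b\rangle + \langle a + e_i, b\rangle = b_i \pmod 2$. A minimal Python sketch with hypothetical function names:

```python
import random


def hadamard_encode(b):
    """Hadamard code: one bit <a, b> mod 2 for every a in {0,1}^k, so n = 2^k."""
    k = len(b)
    msg = sum(bit << i for i, bit in enumerate(b))
    return [bin(a & msg).count("1") % 2 for a in range(2 ** k)]


def decode_bit(word, i, k, rng):
    """2-query local decoder: <a, b> + <a + e_i, b> = b_i (mod 2) for a uniform a."""
    a = rng.randrange(2 ** k)
    return word[a] ^ word[a ^ (1 << i)]


def decode_bit_majority(word, i, k, rng, trials=31):
    """Repeat the 2-query decoder and take a majority vote to boost confidence."""
    votes = sum(decode_bit(word, i, k, rng) for _ in range(trials))
    return int(votes > trials // 2)
```

Each query is individually uniform, so if a $\delta$ fraction of the codeword is corrupted, a single decoding attempt errs with probability at most $2\delta$; the majority vote is one simple way to boost this.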
Submitted 29 August, 2023;
originally announced August 2023.
-
Ellipsoid Fitting Up to a Constant
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari,
Aaron Potechin,
Jeff Xu
Abstract:
In [Sau11,SPW13], Saunderson, Parrilo and Willsky asked the following elegant geometric question: what is the largest $m= m(d)$ such that there is an ellipsoid in $\mathbb{R}^d$ that passes through $v_1, v_2, \ldots, v_m$ with high probability when the $v_i$s are chosen independently from the standard Gaussian distribution $N(0,I_{d})$? The existence of such an ellipsoid is equivalent to the existence of a positive semidefinite matrix $X$ such that $v_i^{\top}X v_i =1$ for every $1 \leq i \leq m$, a natural example of a random semidefinite program. SPW conjectured that $m= (1-o(1)) d^2/4$ with high probability. Very recently, Potechin, Turner, Venkat and Wein, and separately Kane and Diakonikolas, proved that $m \geq d^2/\log^{O(1)}(d)$ via certain explicit constructions.
In this work, we give a substantially tighter analysis of their construction to prove that $m \geq d^2/C$ for an absolute constant $C>0$. This resolves one direction of the SPW conjecture up to a constant. Our analysis proceeds via the method of Graphical Matrix Decomposition that has recently been used to analyze correlated random matrices arising in various areas [BHK+19]. Our key new technical tool is a refined method to prove singular value upper bounds on certain correlated random matrices that are tight up to absolute dimension-independent constants. In contrast, all previous methods that analyze such matrices lose logarithmic factors in the dimension.
Submitted 12 July, 2023;
originally announced July 2023.
-
Is Planted Coloring Easier than Planted Clique?
Authors:
Pravesh K. Kothari,
Santosh S. Vempala,
Alexander S. Wein,
Jeff Xu
Abstract:
We study the computational complexity of two related problems: recovering a planted $q$-coloring in $G(n,1/2)$, and finding efficiently verifiable witnesses of non-$q$-colorability (a.k.a. refutations) in $G(n,1/2)$. Our main results show hardness for both these problems in a restricted-but-powerful class of algorithms based on computing low-degree polynomials in the inputs.
The problem of recovering a planted $q$-coloring is equivalent to recovering $q$ disjoint planted cliques that cover all the vertices -- a potentially easier variant of the well-studied planted clique problem. Our first result shows that this variant is as hard as the original planted clique problem in the low-degree polynomial model of computation: each clique needs to have size $k \gg \sqrt{n}$ for efficient recovery to be possible. For the related variant where the cliques cover a $(1-ε)$-fraction of the vertices, we also show hardness by reduction from planted clique.
Our second result shows that refuting $q$-colorability of $G(n,1/2)$ is hard in the low-degree polynomial model when $q \gg n^{2/3}$ but easy when $q \lesssim n^{1/2}$, and we leave closing this gap for future work. Our proof is more subtle than similar results for planted clique and involves constructing a non-standard distribution over $q$-colorable graphs. We note that while related to several prior works, this is the first work that explicitly formulates refutation problems in the low-degree polynomial model.
The proofs of our main results involve showing low-degree hardness of hypothesis testing between an appropriately constructed pair of distributions. For refutation, we show completeness of this approach: in the low-degree model, the refutation task is precisely as hard as the hardest associated testing problem, i.e., proving hardness of refutation amounts to finding a "hard" distribution.
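The planted $q$-coloring distribution can be sketched as follows. This is a hedged Python illustration, not the paper's exact model: as an assumption, each vertex gets a uniformly random color (the paper may fix class sizes differently), and every edge inside a color class is deleted from $G(n,1/2)$. In the complement graph the color classes become $q$ disjoint cliques covering all vertices, which is the equivalence with planted clique described above. The name `planted_coloring` is hypothetical.

```python
import itertools
import random


def planted_coloring(n, q, seed=0):
    """Sample G(n, 1/2) with a planted q-coloring.

    Assumption: colors are i.i.d. uniform over {0, ..., q-1}. Each pair of differently
    colored vertices is an edge with probability 1/2; same-colored pairs are never edges,
    so the color classes are independent sets (cliques in the complement graph).
    """
    rng = random.Random(seed)
    color = [rng.randrange(q) for _ in range(n)]
    edges = {(u, v) for u, v in itertools.combinations(range(n), 2)
             if color[u] != color[v] and rng.random() < 0.5}
    return color, edges
```

Recovery asks to find `color` (up to renaming) from `edges` alone; refutation asks, for a sample of plain $G(n,1/2)$, for an efficiently checkable proof that no proper $q$-coloring exists.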
Submitted 1 March, 2023;
originally announced March 2023.
-
Beyond Moments: Robustly Learning Affine Transformations with Asymptotically Optimal Error
Authors:
He Jia,
Pravesh K. Kothari,
Santosh S. Vempala
Abstract:
We present a polynomial-time algorithm for robustly learning an unknown affine transformation of the standard hypercube from samples, an important and well-studied setting for independent component analysis (ICA). Specifically, given an $ε$-corrupted sample from a distribution $D$ obtained by applying an unknown affine transformation $x \mapsto Ax+s$ to the uniform distribution on a $d$-dimensional hypercube $[-1,1]^d$, our algorithm constructs $\hat{A}, \hat{s}$ such that the total variation distance of the distribution $\hat{D}$ (obtained by applying $x \mapsto \hat{A}x+\hat{s}$ to the same uniform distribution) from $D$ is $O(ε)$ using $\operatorname{poly}(d)$ time and samples. Total variation distance is the information-theoretically strongest possible notion of distance in our setting and our recovery guarantees in this distance are optimal up to the absolute constant factor multiplying $ε$. In particular, if the columns of $A$ are normalized to be unit length, our total variation distance guarantee implies a bound on the sum of the $\ell_2$ distances between the column vectors of $A$ and $\hat{A}$, $\sum_{i =1}^d \|a_i-\hat{a}_i\|_2 = O(ε)$. In contrast, the strongest known prior results only yield an $ε^{O(1)}$ (relative) bound on the distance between individual $a_i$'s and their estimates and translate into an $O(dε)$ bound on the total variation distance. Our key innovation is a new approach to ICA (even to outlier-free ICA) that circumvents the difficulties in the classical method of moments and instead relies on a new geometric certificate of correctness of an affine transformation. Our algorithm is based on a new method that iteratively improves an estimate of the unknown affine transformation whenever the requirements of the certificate are not met.
Submitted 23 February, 2023;
originally announced February 2023.
-
Privately Estimating a Gaussian: Efficient, Robust and Optimal
Authors:
Daniel Alabi,
Pravesh K. Kothari,
Pranay Tankala,
Prayaag Venkat,
Fred Zhang
Abstract:
In this work, we give efficient algorithms for privately estimating a Gaussian distribution in both pure and approximate differential privacy (DP) models with optimal dependence on the dimension in the sample complexity. In the pure DP setting, we give an efficient algorithm that estimates an unknown $d$-dimensional Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2 \log κ)$ samples while tolerating a constant fraction of adversarial outliers. Here, $κ$ is the condition number of the target covariance matrix. The sample bound matches the best non-private estimators in the dependence on the dimension (up to a polylogarithmic factor). We prove a new lower bound on differentially private covariance estimation to show that the dependence on the condition number $κ$ in the above sample bound is also tight. Prior to our work, only identifiability results (yielding inefficient super-polynomial-time algorithms) were known for the problem. In the approximate DP setting, we give an efficient algorithm to estimate an unknown Gaussian distribution up to an arbitrarily tiny total variation error using $\widetilde{O}(d^2)$ samples while tolerating a constant fraction of adversarial outliers. Prior to our work, all efficient approximate DP algorithms incurred a super-quadratic sample cost or were not outlier-robust. For the special case of mean estimation, our algorithm achieves the optimal sample complexity of $\widetilde O(d)$, improving on a $\widetilde O(d^{1.5})$ bound from prior work. Our pure DP algorithm relies on a recursive private preconditioning subroutine that utilizes the recent work on private mean estimation [Hopkins et al., 2022]. Our approximate DP algorithms are based on a substantial upgrade of the method of stabilizing convex relaxations introduced in [Kothari et al., 2022].
Submitted 1 June, 2023; v1 submitted 15 December, 2022;
originally announced December 2022.
-
Algorithms approaching the threshold for semi-random planted clique
Authors:
Rares-Darius Buhai,
Pravesh K. Kothari,
David Steurer
Abstract:
We design new polynomial-time algorithms for recovering planted cliques in the semi-random graph model introduced by Feige and Kilian 2001. The previous best algorithms for this model succeed if the planted clique has size at least $n^{2/3}$ in a graph with $n$ vertices (Mehta, McKenzie, Trevisan 2019 and Charikar, Steinhardt, Valiant 2017). Our algorithms work for planted-clique sizes approaching $n^{1/2}$, the information-theoretic threshold in the semi-random model (Steinhardt 2017) and a conjectured computational threshold even in the easier fully-random model. This result comes close to resolving open questions by Feige 2019 and Steinhardt 2017.
Our algorithms are based on higher constant-degree sum-of-squares relaxations and rely on a new conceptual connection that translates certificates of upper bounds on biclique numbers in unbalanced bipartite Erdős--Rényi random graphs into algorithms for semi-random planted clique. The use of higher constant-degree sum-of-squares is essential in our setting: we prove a lower bound on the basic SDP for certifying bicliques that shows that the basic SDP cannot succeed for planted cliques of size $k =o(n^{2/3})$. We also provide some evidence that the information-computation trade-off of our current algorithms may be inherent by proving an average-case lower bound for unbalanced bicliques in the low-degree-polynomials model.
Submitted 6 June, 2023; v1 submitted 11 December, 2022;
originally announced December 2022.
-
A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity
Authors:
Aravind Gollakota,
Adam R. Klivans,
Pravesh K. Kothari
Abstract:
A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of \emph{testable learning}, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians.
In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree \emph{sandwiching polynomials}, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions.
Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.
Submitted 23 November, 2022;
originally announced November 2022.
-
Polynomial-Time Power-Sum Decomposition of Polynomials
Authors:
Mitali Bafna,
Jun-Ting Hsieh,
Pravesh K. Kothari,
Jeff Xu
Abstract:
We give efficient algorithms for finding the power-sum decomposition of an input polynomial $P(x)= \sum_{i\leq m} p_i(x)^d$ with components $p_i$. The case of linear $p_i$s is equivalent to the well-studied tensor decomposition problem, while the quadratic case occurs naturally in studying identifiability of non-spherical Gaussian mixtures from low-order moments.
Unlike tensor decomposition, neither the unique identifiability nor algorithms for this problem are well understood. For the simplest setting of quadratic $p_i$s and $d=3$, prior work of Ge, Huang and Kakade yields an algorithm only when $m \leq \tilde{O}(\sqrt{n})$. On the other hand, the more general recent result of Garg, Kayal and Saha builds an algebraic approach to handle any $m=n^{O(1)}$ components, but only when $d$ is large enough (while yielding no bounds for $d=3$ or even $d=100$), and it only handles inverse-exponential noise.
Our results obtain a substantial quantitative improvement on both the prior works above even in the base case of $d=3$ and quadratic $p_i$s. Specifically, our algorithm succeeds in decomposing a sum of $m \sim \tilde{O}(n)$ generic quadratic $p_i$s for $d=3$ and, more generally, the $d$th power-sum of $m \sim n^{2d/15}$ generic degree-$K$ polynomials for any $K \geq 2$. Our algorithm relies only on basic numerical linear algebraic primitives, is exact (i.e., obtains arbitrarily tiny error up to numerical precision), and handles inverse-polynomial noise when the $p_i$s have random Gaussian coefficients.
Our main tool is a new method for extracting the linear span of the $p_i$s by studying the linear subspace of low-order partial derivatives of the input $P$. To establish polynomial stability of our algorithm in the average case, we prove inverse-polynomial bounds on the smallest singular value of certain correlated random matrices with low-degree polynomial entries that arise in our analyses.
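To make the stated equivalence between linear components and tensor decomposition concrete, a minimal sketch (pure Python, hypothetical helper names, $d = 3$ on the tensor side) checks that $P(x) = \sum_i \langle a_i, x\rangle^3$ equals the contraction $T(x,x,x)$ of the symmetric tensor $T = \sum_i a_i^{\otimes 3}$. It only illustrates the problem setup, not the decomposition algorithm.

```python
from itertools import product


def power_sum(linear_forms, x, d):
    """P(x) = sum_i <a_i, x>^d for linear components p_i(x) = <a_i, x>."""
    return sum(sum(ai * xi for ai, xi in zip(a, x)) ** d for a in linear_forms)


def order3_tensor(linear_forms):
    """T = sum_i a_i (x) a_i (x) a_i, stored as a dict keyed by (j, k, l)."""
    n = len(linear_forms[0])
    return {
        (j, k, l): sum(a[j] * a[k] * a[l] for a in linear_forms)
        for j, k, l in product(range(n), repeat=3)
    }


def contract(T, x):
    """T(x, x, x) = sum_{j,k,l} T[j,k,l] * x_j * x_k * x_l."""
    return sum(v * x[j] * x[k] * x[l] for (j, k, l), v in T.items())
```

For instance, with $a_1 = (1,2,0)$, $a_2 = (0,1,-1)$ and $x = (1,-1,2)$, both sides equal $(-1)^3 + (-3)^3 = -28$; recovering the $a_i$ from $T$ is exactly a tensor decomposition problem.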
Submitted 29 July, 2022;
originally announced August 2022.
-
A simple and sharper proof of the hypergraph Moore bound
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari,
Sidhanth Mohanty
Abstract:
The hypergraph Moore bound is an elegant statement that characterizes the extremal trade-off between the girth (the number of hyperedges in the smallest cycle or even cover, i.e., a subhypergraph with all degrees even) and the size (the number of hyperedges) of a hypergraph. For graphs (i.e., $2$-uniform hypergraphs), a bound tight up to the leading constant was proven in a classical work of Alon, Hoory and Linial [AHL02]. For hypergraphs of uniformity $k>2$, an appropriate generalization was conjectured by Feige [Fei08]. The conjecture was settled up to an additional $\log^{4k+1} n$ factor in the size in a recent work of Guruswami, Kothari and Manohar [GKM21]. Their argument relies on a connection between the existence of short even covers and the spectrum of a certain randomly signed Kikuchi matrix. Their analysis, especially for the case of odd $k$, is quite involved.
In this work, we present a substantially simpler and shorter proof of the hypergraph Moore bound. Our key idea is the use of a new reweighted Kikuchi matrix and an edge deletion step that allows us to drop several involved steps in [GKM21]'s analysis, such as combinatorial bucketing of rows of the Kikuchi matrix and the use of the Schudy-Sviridenko polynomial concentration. Our simpler proof also obtains tighter parameters: in particular, the argument gives a new proof of the classical Moore bound of [AHL02] with no loss (the proof in [GKM21] loses a $\log^3 n$ factor), and loses only a single logarithmic factor for all $k$-uniform hypergraphs with $k>2$.
As in [GKM21], our ideas naturally extend to yield a simpler proof of the full trade-off for strongly refuting smoothed instances of constraint satisfaction problems with similarly improved parameters.
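The even-cover notion of girth used above can be made concrete with a small, exponential-time sketch (hypothetical helper names; hyperedges given as tuples of vertices): `is_even_cover` checks that every vertex has even degree in the chosen hyperedges, and `even_cover_girth` finds the size of the smallest even cover by brute force. It only illustrates the definitions and has nothing to do with the Kikuchi-matrix argument.

```python
from collections import Counter
from itertools import combinations


def is_even_cover(hyperedges):
    """True if the non-empty collection of hyperedges is an even cover,
    i.e. every vertex lies in an even number of the chosen hyperedges."""
    if not hyperedges:
        return False
    degree = Counter(v for e in hyperedges for v in e)
    return all(c % 2 == 0 for c in degree.values())


def even_cover_girth(hyperedges):
    """Brute-force size of the smallest even cover (None if none exists).
    Exponential time: only meant for tiny illustrative hypergraphs."""
    for size in range(1, len(hyperedges) + 1):
        for subset in combinations(hyperedges, size):
            if is_even_cover(subset):
                return size
    return None
```

For $2$-uniform hypergraphs (graphs), the smallest even cover is the shortest cycle, so `even_cover_girth` reduces to the usual girth; for example, a triangle with a pendant edge has girth $3$.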
Submitted 21 July, 2022;
originally announced July 2022.
-
List-Decodable Covariance Estimation
Authors:
Misha Ivkov,
Pravesh K. Kothari
Abstract:
We give the first polynomial-time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes as input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In $n^{\mathsf{poly}(1/α)}$ time, it outputs a constant-size list of $k = k(α)= (1/α)^{\mathsf{poly}(1/α)}$ candidate parameters that, with high probability, contains a $(\hatμ,\hatΣ)$ such that the total variation distance $TV(\mathcal{N}(μ_*,Σ_*),\mathcal{N}(\hatμ,\hatΣ))<1-O_α(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation for parameters with dimension-independent error. Our algorithm works more generally for $(1-α)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials.
Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery due to Karmarkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020) and Bakshi and Kothari (2020). These results need superpolynomial time to obtain any error that is subconstant in the underlying dimension. Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery, which, in particular, obtains $2^{-\mathsf{poly}(d)}$ error in polynomial time. Our result also implies an improved algorithm for clustering non-spherical mixtures.
Submitted 22 June, 2022;
originally announced June 2022.
-
Approximating Max-Cut on Bounded Degree Graphs: Tighter Analysis of the FKL Algorithm
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari
Abstract:
In this note, we describe an $α_{GW} + \tildeΩ(1/d^2)$-factor approximation algorithm for Max-Cut on weighted graphs of degree $\leq d$. Here, $α_{GW}\approx 0.878$ is the worst-case approximation ratio of the Goemans-Williamson rounding for Max-Cut. This improves on previous results for unweighted graphs by Feige, Karpinski, and Langberg, and by Florén. Our guarantee is obtained by a tighter analysis of the solution obtained by applying a natural local improvement procedure to the Goemans-Williamson rounding of the basic SDP strengthened with triangle inequalities.
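The Goemans-Williamson rounding step referred to above can be sketched as follows, assuming the SDP vectors $v_i$ are already given as lists of floats. This covers only the random-hyperplane rounding; solving the SDP with triangle inequalities and the local improvement procedure analyzed in the note are omitted, and the helper names are hypothetical.

```python
import random


def hyperplane_round(vectors, seed=0):
    """Random-hyperplane rounding: draw a standard Gaussian direction r and
    assign vertex i the sign of <v_i, r> (ties broken toward +1)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [1 if sum(a * b for a, b in zip(v, r)) >= 0 else -1 for v in vectors]


def cut_weight(signs, weighted_edges):
    """Total weight of edges (i, j, w) whose endpoints get different signs."""
    return sum(w for i, j, w in weighted_edges if signs[i] != signs[j])
```

An edge $(i,j)$ is cut with probability $\theta_{ij}/\pi$, where $\theta_{ij}$ is the angle between $v_i$ and $v_j$; comparing this with the SDP contribution $(1-\langle v_i, v_j\rangle)/2$ is what yields the ratio $α_{GW}$. Antipodal vectors, for instance, are always separated.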
Submitted 18 June, 2022;
originally announced June 2022.
-
Bypassing the XOR Trick: Stronger Certificates for Hypergraph Clique Number
Authors:
Venkatesan Guruswami,
Pravesh K. Kothari,
Peter Manohar
Abstract:
Let $\mathcal{H}(k,n,p)$ be the distribution on $k$-uniform hypergraphs where every subset of $[n]$ of size $k$ is included as a hyperedge with probability $p$ independently. In this work, we design and analyze a simple spectral algorithm that certifies a bound on the size of the largest clique, $ω(H)$, in hypergraphs $H \sim \mathcal{H}(k,n,p)$. For example, for any constant $p$, with high probability over the choice of the hypergraph, our spectral algorithm certifies a bound of $\tilde{O}(\sqrt{n})$ on the clique number in polynomial time. This matches, up to $\textrm{polylog}(n)$ factors, the best known certificate for the clique number in random graphs, which is the special case of $k = 2$.
Prior to our work, the best known refutation algorithms [CGL04, AOW15] rely on a reduction to the problem of refuting random $k$-XOR via Feige's XOR trick [Fei02], and yield a polynomially worse bound of $\tilde{O}(n^{3/4})$ on the clique number when $p = O(1)$. Our algorithm bypasses the XOR trick and relies instead on a natural generalization of the Lovász theta semidefinite programming relaxation for cliques in hypergraphs.
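The random model $\mathcal{H}(k,n,p)$ and the clique condition can be sketched in a few lines (hypothetical helper names; brute force over all $k$-subsets, so only for small $n$): a set $S$ is a clique when every $k$-subset of $S$ is a hyperedge. The sketch does not implement the spectral certificate.

```python
import random
from itertools import combinations


def sample_hypergraph(k, n, p, seed=0):
    """Sample H ~ H(k, n, p): each k-subset of {0, ..., n-1} is included
    as a hyperedge independently with probability p."""
    rng = random.Random(seed)
    return {frozenset(e) for e in combinations(range(n), k) if rng.random() < p}


def is_clique(H, S, k):
    """S is a clique of the k-uniform hypergraph H if every k-subset of S
    is a hyperedge (vacuously true when |S| < k)."""
    return all(frozenset(e) in H for e in combinations(sorted(S), k))
```

With $p = 1$ every vertex subset is a clique and with $p = 0$ no set of size $\geq k$ is; a certificate must prove an upper bound on $ω(H)$ that holds for the sampled $H$, which this brute-force check can only do for tiny $n$.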
Submitted 13 May, 2022;
originally announced May 2022.
-
Private Robust Estimation by Stabilizing Convex Relaxations
Authors:
Pravesh K. Kothari,
Pasin Manurangsi,
Ameya Velingker
Abstract:
We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree 2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for mean, multiplicative spectral and relative Frobenius distance guarantees for covariance and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions.
Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators.
Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions.
Submitted 7 December, 2021;
originally announced December 2021.
-
Polynomial-Time Sum-of-Squares Can Robustly Estimate Mean and Covariance of Gaussians Optimally
Authors:
Pravesh K. Kothari,
Peter Manohar,
Brian Hu Zhang
Abstract:
In this work, we revisit the problem of estimating the mean and covariance of an unknown $d$-dimensional Gaussian distribution in the presence of an $\varepsilon$-fraction of adversarial outliers. The pioneering work of [DKK+16] gave a polynomial time algorithm for this task with optimal $\tilde{O}(\varepsilon)$ error using $n = \textrm{poly}(d, 1/\varepsilon)$ samples.
On the other hand, [KS17b] introduced a general framework for robust moment estimation via a canonical sum-of-squares relaxation that succeeds for the more general class of certifiably subgaussian and certifiably hypercontractive [BK20] distributions. When specialized to Gaussians, this algorithm obtains the same $\tilde{O}(\varepsilon)$ error guarantee as [DKK+16] but incurs a super-polynomial sample complexity ($n = d^{O(\log(1/\varepsilon))}$) and running time ($n^{O(\log(1/\varepsilon))}$). This cost appears inherent to their analysis, as it relies only on sum-of-squares certificates of upper bounds on directional moments, while the analysis in [DKK+16] relies on lower bounds on directional moments inferred from algebraic relationships between moments of Gaussian distributions.
We give a new, simple analysis of the same canonical sum-of-squares relaxation used in [KS17b, BK20] and show that for Gaussian distributions, their algorithm achieves the same error, sample complexity and running time guarantees as the specialized algorithm in [DKK+16]. Our key innovation is a new argument that allows using moment lower bounds without having sum-of-squares certificates for them. We believe that our proof technique will be useful in developing further robust estimation algorithms.
Submitted 22 October, 2021;
originally announced October 2021.
-
Algorithmic Thresholds for Refuting Random Polynomial Systems
Authors:
Jun-Ting Hsieh,
Pravesh K. Kothari
Abstract:
Consider a system of $m$ polynomial equations $\{p_i(x) = b_i\}_{i \leq m}$ of degree $D\geq 2$ in an $n$-dimensional variable $x \in \mathbb{R}^n$ such that every coefficient of each $p_i$ and each $b_i$ is chosen at random and independently from some continuous distribution. We study the basic question of determining the smallest $m$ (the algorithmic threshold) for which efficient algorithms can find refutations (i.e., certificates of unsatisfiability) for such systems. This setting generalizes problems such as refuting random SAT instances, low-rank matrix sensing and certifying pseudo-randomness of Goldreich's candidate generators and their generalizations.
We show that for every $d \in \mathbb{N}$, the $(n+m)^{O(d)}$-time canonical sum-of-squares (SoS) relaxation refutes such a system with high probability whenever $m \geq O(n) \cdot (\frac{n}{d})^{D-1}$. We prove a lower bound in the restricted low-degree polynomial model of computation which suggests that this trade-off between SoS degree and the number of equations is nearly tight for all $d$. We also confirm the predictions of this lower bound in a limited setting by showing a lower bound on the canonical degree-$4$ sum-of-squares relaxation for refuting random quadratic polynomials. Together, our results provide evidence for an algorithmic threshold for the problem at $m \gtrsim \widetilde{O}(n) \cdot n^{(1-δ)(D-1)}$ for $2^{n^δ}$-time algorithms for all $δ$.
Submitted 16 October, 2021;
originally announced October 2021.
-
Algorithms and Certificates for Boolean CSP Refutation: "Smoothed is no harder than Random"
Authors:
Venkatesan Guruswami,
Pravesh K. Kothari,
Peter Manohar
Abstract:
We present an algorithm for strongly refuting smoothed instances of all Boolean CSPs. The smoothed model is a hybrid between worst-case and average-case input models, where the input is an arbitrary instance of the CSP with only the negation patterns of the literals re-randomized with some small probability. For an $n$-variable smoothed instance of a $k$-arity CSP, our algorithm runs in $n^{O(\ell)}$ time and succeeds with high probability in bounding the optimum fraction of satisfiable constraints away from $1$, provided that the number of constraints is at least $\tilde{O}(n) (\frac{n}{\ell})^{\frac{k}{2} - 1}$. This matches, up to polylogarithmic factors in $n$, the trade-off between running time and the number of constraints of the state-of-the-art algorithms for refuting fully random instances of CSPs [RRS17].
We also make a surprising new connection between our algorithm and even covers in hypergraphs, which we use to positively resolve Feige's 2008 conjecture, an extremal combinatorics conjecture on the existence of even covers in sufficiently dense hypergraphs that generalizes the well-known Moore bound for the girth of graphs. As a corollary, we show that polynomial-size refutation witnesses exist for arbitrary smoothed CSP instances with number of constraints a polynomial factor below the "spectral threshold" of $n^{k/2}$, extending the celebrated result for random 3-SAT of Feige, Kim and Ofek [FKO06].
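The smoothing step of the input model can be sketched as below. This is a hypothetical illustration under stated assumptions: a constraint is a list of `(variable, negated)` literals, each literal's negation bit is independently replaced by a fresh uniform bit with probability `p`, and the predicate is taken to be OR (so the instance is $k$-SAT) purely for the example; the paper's exact smoothing convention and predicates may differ.

```python
import random


def smooth_instance(constraints, p, seed=0):
    """Keep each constraint's variables, but with probability p replace each
    literal's negation bit by a uniformly random bit (assumed smoothing rule)."""
    rng = random.Random(seed)
    smoothed = []
    for literals in constraints:
        new_lits = []
        for var, negated in literals:
            if rng.random() < p:
                negated = rng.random() < 0.5  # re-randomized negation bit
            new_lits.append((var, negated))
        smoothed.append(new_lits)
    return smoothed


def fraction_satisfied(constraints, assignment):
    """Fraction of OR-constraints satisfied by a Boolean assignment; a literal
    (var, negated) is true when assignment[var] != negated."""
    sat = sum(any(assignment[v] != neg for v, neg in c) for c in constraints)
    return sat / len(constraints)
```

Strong refutation means certifying that `fraction_satisfied` stays bounded away from $1$ for every assignment, not just for the ones tried; the adversary fixes the variable pattern, and only the negation bits carry randomness.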
Submitted 3 September, 2023; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Memory-Sample Lower Bounds for Learning Parity with Noise
Authors:
Sumegha Garg,
Pravesh K. Kothari,
Pengda Liu,
Ran Raz
Abstract:
In this work, we show that for the well-studied problem of learning parity under noise, where a learner tries to learn $x=(x_1,\ldots,x_n) \in \{0,1\}^n$ from a stream of random linear equations over $\mathrm{F}_2$ that are correct with probability $\frac{1}{2}+\varepsilon$ and flipped with probability $\frac{1}{2}-\varepsilon$, any learning algorithm requires either a memory of size $Ω(n^2/\varepsilon)$ or an exponential number of samples.
In fact, we study memory-sample lower bounds for a large class of learning problems, as characterized by [GRT'18], when the samples are noisy. A matrix $M: A \times X \rightarrow \{-1,1\}$ corresponds to the following learning problem with error parameter $\varepsilon$: an unknown element $x \in X$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples, $(a_1, b_1), (a_2, b_2), \ldots$, where for every $i$, $a_i \in A$ is chosen uniformly at random and $b_i = M(a_i,x)$ with probability $1/2+\varepsilon$ and $b_i = -M(a_i,x)$ with probability $1/2-\varepsilon$ ($0<\varepsilon< \frac{1}{2}$). Assume that $k,\ell, r$ are such that any submatrix of $M$ with at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has a bias of at most $2^{-r}$. We show that any learning algorithm for the learning problem corresponding to $M$ with error parameter $\varepsilon$ requires either a memory of size at least $Ω\left(\frac{k \cdot \ell}{\varepsilon} \right)$, or at least $2^{Ω(r)}$ samples. In particular, this shows that for a large class of learning problems, the same as those in [GRT'18], any learning algorithm requires either a memory of size at least $Ω\left(\frac{(\log |X|) \cdot (\log |A|)}{\varepsilon}\right)$ or an exponential number of noisy samples.
Our proof is based on adapting the arguments in [Raz'17,GRT'18] to the noisy case.
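The sample stream in the learning-parity-with-noise problem above can be sketched as a Python generator, which mirrors the streaming model in which the learner sees one sample at a time. The function name `lpn_stream` is hypothetical; it only generates samples and says nothing about the memory-bounded learner or the lower-bound argument.

```python
import random


def lpn_stream(x, eps, num_samples, seed=0):
    """Yield (a, b) with a uniform in {0,1}^n and b = <a, x> mod 2, where b is
    flipped with probability 1/2 - eps (correct with probability 1/2 + eps)."""
    rng = random.Random(seed)
    n = len(x)
    for _ in range(num_samples):
        a = [rng.randrange(2) for _ in range(n)]
        b = sum(ai * xi for ai, xi in zip(a, x)) % 2
        if rng.random() < 0.5 - eps:
            b ^= 1  # noisy label
        yield a, b
```

A generator fits the model naturally: a learner that consumes it without storing past samples is charged only for whatever state it keeps, which is the memory the lower bound measures.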
Submitted 5 July, 2021;
originally announced July 2021.
-
A Stress-Free Sum-of-Squares Lower Bound for Coloring
Authors:
Pravesh K. Kothari,
Peter Manohar
Abstract:
We prove that with high probability over the choice of a random graph $G$ from the Erdős-Rényi distribution $G(n,1/2)$, a natural $n^{O(\varepsilon^2 \log n)}$-time, degree $O(\varepsilon^2 \log n)$ sum-of-squares semidefinite program cannot refute the existence of a valid $k$-coloring of $G$ for $k = n^{1/2 +\varepsilon}$. Our result implies that the refutation guarantee of the basic semidefinite program (a close variant of the Lovász theta function) cannot be appreciably improved by a natural $o(\log n)$-degree sum-of-squares strengthening, and this is tight up to an $n^{o(1)}$ slack in $k$. To the best of our knowledge, this is the first lower bound for coloring $G(n,1/2)$ for even a single round strengthening of the basic SDP in any SDP hierarchy.
Our proof relies on a new variant of instance-preserving non-pointwise complete reduction within SoS from coloring a graph to finding large independent sets in it. Our proof is (perhaps surprisingly) short, simple and does not require complicated spectral norm bounds on random matrices with dependent entries that have been otherwise necessary in the proofs of many similar results [BHK+16, HKP+17, KB19, GJJ+20, MRX20].
Our result formally holds for a constraint system where vertices are allowed to belong to multiple color classes; we leave the extension to the formally stronger formulation of coloring, where vertices must belong to unique color classes, as an outstanding open problem.
Submitted 16 May, 2021;
originally announced May 2021.
-
Robustly Learning Mixtures of $k$ Arbitrary Gaussians
Authors:
Ainesh Bakshi,
Ilias Diakonikolas,
He Jia,
Daniel M. Kane,
Pravesh K. Kothari,
Santosh S. Vempala
Abstract:
We give a polynomial-time algorithm for the problem of robustly estimating a mixture of $k$ arbitrary Gaussians in $\mathbb{R}^d$, for any fixed $k$, in the presence of a constant fraction of arbitrary corruptions. This resolves the main open problem in several previous works on algorithmic robust statistics, which addressed the special cases of robustly estimating (a) a single Gaussian, (b) a mixture of TV-distance separated Gaussians, and (c) a uniform mixture of two Gaussians. Our main tools are an efficient \emph{partial clustering} algorithm that relies on the sum-of-squares method, and a novel \emph{tensor decomposition} algorithm that allows errors in both Frobenius norm and low-rank terms.
Submitted 7 June, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Sparse PCA: Algorithms, Adversarial Perturbations and Certificates
Authors:
Tommaso d'Orsi,
Pravesh K. Kothari,
Gleb Novikov,
David Steurer
Abstract:
We study efficient algorithms for Sparse PCA in standard statistical models (spiked covariance in its Wishart form). Our goal is to achieve optimal recovery guarantees while being resilient to small perturbations. Despite a long history of prior works, including explicit studies of perturbation resilience, the best known algorithmic guarantees for Sparse PCA are fragile and break down under small adversarial perturbations.
We observe a basic connection between perturbation resilience and \emph{certifying algorithms} that are based on certificates of upper bounds on sparse eigenvalues of random matrices. In contrast to other techniques, such certifying algorithms, including the brute-force maximum likelihood estimator, are automatically robust against small adversarial perturbations.
We use this connection to obtain the first polynomial-time algorithms for this problem that are resilient against additive adversarial perturbations by obtaining new efficient certificates for upper bounds on sparse eigenvalues of random matrices. Our algorithms are based either on basic semidefinite programming or on its low-degree sum-of-squares strengthening depending on the parameter regimes. Their guarantees either match or approach the best known guarantees of \emph{fragile} algorithms in terms of sparsity of the unknown vector, number of samples and the ambient dimension.
To complement our algorithmic results, we prove rigorous lower bounds matching the gap between fragile and robust polynomial-time algorithms in a natural computational model based on low-degree polynomials (closely related to the pseudo-calibration technique for sum-of-squares lower bounds) that is known to capture the best known guarantees for related statistical estimation problems. The combination of these results provides formal evidence of an inherent price to pay to achieve robustness.
Submitted 12 November, 2020;
originally announced November 2020.
-
Strongly refuting all semi-random Boolean CSPs
Authors:
Jackson Abascal,
Venkatesan Guruswami,
Pravesh K. Kothari
Abstract:
We give an efficient algorithm to strongly refute \emph{semi-random} instances of all Boolean constraint satisfaction problems. The number of constraints required by our algorithm matches (up to polylogarithmic factors) the best-known bounds for efficient refutation of fully random instances. Our main technical contribution is an algorithm to strongly refute semi-random instances of the Boolean $k$-XOR problem on $n$ variables that have $\widetilde{O}(n^{k/2})$ constraints. (In a semi-random $k$-XOR instance, the equations can be arbitrary and only the right-hand sides are random.)
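To make the semi-random model concrete, here is a minimal sketch, assuming $\pm 1$ variables and equations of the form $\prod_{i \in S} x_i = b$. The helper names (`semi_random_kxor`, `satisfied_fraction`, `max_satisfied_fraction`) are hypothetical and not from the paper. The scopes are supplied by the caller (they may be adversarial) and only the right-hand sides are random. Strong refutation means certifying that the optimum is at most $1/2 + ε$; the brute-force optimum below is exponential in $n$ and is shown only for intuition on tiny instances, not as the paper's spectral/SDP method.

```python
import random
from itertools import product

def semi_random_kxor(scopes, seed=0):
    """Attach uniformly random right-hand sides b in {+1, -1} to the given scopes.

    The scopes (k-tuples of variable indices) may be arbitrary; only the
    right-hand sides are random, as in the semi-random model.
    """
    rng = random.Random(seed)
    return [(tuple(s), rng.choice((1, -1))) for s in scopes]

def satisfied_fraction(instance, x):
    """Fraction of equations prod_{i in S} x_i = b satisfied by x in {+1,-1}^n."""
    good = 0
    for scope, b in instance:
        prod = 1
        for i in scope:
            prod *= x[i]
        good += prod == b
    return good / len(instance)

def max_satisfied_fraction(instance, n):
    """Exact optimum by brute force over all 2^n assignments (tiny n only)."""
    return max(satisfied_fraction(instance, x) for x in product((1, -1), repeat=n))
```

For odd $k$, an assignment $x$ and its negation $-x$ together satisfy every equation exactly once, so the optimum is always at least $1/2$; a strong refutation certifies that it is not much larger.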
One of our key insights is to identify a simple combinatorial property of random XOR instances that makes spectral refutation work. Our approach involves taking an instance that does not satisfy this property (i.e., is \emph{not} pseudorandom) and reducing it to a partitioned collection of $2$-XOR instances. We analyze these subinstances using a carefully chosen quadratic form as a proxy, which in turn is bounded via a combination of spectral methods and semidefinite programming. The analysis of our spectral bounds relies only on an off-the-shelf matrix Bernstein inequality. Even for the purely random case, this leads to a shorter proof compared to the ones in the literature that rely on problem-specific trace-moment computations.
Submitted 16 September, 2020;
originally announced September 2020.
-
Time-Space Tradeoffs for Distinguishing Distributions and Applications to Security of Goldreich's PRG
Authors:
Sumegha Garg,
Pravesh K. Kothari,
Ran Raz
Abstract:
In this work, we establish lower bounds against memory-bounded algorithms for distinguishing between natural pairs of related distributions from samples that arrive in a streaming setting.
In our first result, we show that any algorithm that distinguishes between the uniform distribution on $\{0,1\}^n$ and the uniform distribution on an $n/2$-dimensional linear subspace of $\{0,1\}^n$ with non-negligible advantage needs $2^{Ω(n)}$ samples or $Ω(n^2)$ memory.
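For intuition about why $Ω(n^2)$ memory is the relevant threshold, the following sketch shows the folklore high-memory distinguisher. It is not taken from the paper. It stores a reduced GF(2) basis of the incoming samples (up to $n$ vectors of $n$ bits, i.e. $O(n^2)$ memory) and answers "subspace" iff the rank stays at most $n/2$. Vectors are encoded as Python ints, and the function names are hypothetical.

```python
import random

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as ints (bit i = coordinate i)."""
    basis = {}  # leading-bit position -> stored vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v  # new independent direction
                break
            v ^= basis[top]  # eliminate the leading bit and keep reducing
    return len(basis)

def sample_subspace(basis, rng):
    """Uniform element of span(basis), assuming the basis is independent."""
    v = 0
    for b in basis:
        if rng.random() < 0.5:
            v ^= b
    return v

def high_memory_distinguisher(samples, n):
    """Answer 'subspace' iff the samples span at most n/2 dimensions."""
    return "subspace" if gf2_rank(samples) <= n // 2 else "uniform"
```

With about $2n$ samples, subspace samples never exceed rank $n/2$, while uniform samples reach rank above $n/2$ except with negligible probability. The result above says that any algorithm with substantially less than $n^2$ memory needs exponentially many samples instead.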
Our second result applies to distinguishing outputs of Goldreich's local pseudorandom generator from the uniform distribution on the output domain. Specifically, Goldreich's pseudorandom generator $G$ fixes a predicate $P:\{0,1\}^k \rightarrow \{0,1\}$ and a collection of subsets $S_1, S_2, \ldots, S_m \subseteq [n]$ of size $k$. For any seed $x \in \{0,1\}^n$, it outputs $P(x_{S_1}), P(x_{S_2}), \ldots, P(x_{S_m})$ where $x_{S_i}$ is the projection of $x$ to the coordinates in $S_i$. We prove that whenever $P$ is $t$-resilient (all non-zero Fourier coefficients of $(-1)^P$ are of degree $t$ or higher), then no algorithm, with $<n^ε$ memory, can distinguish the output of $G$ from the uniform distribution on $\{0,1\}^m$ with a large inverse polynomial advantage, for stretch $m \le \left(\frac{n}{t}\right)^{\frac{(1-ε)}{36}\cdot t}$ (barring some restrictions on $k$). The lower bound holds in the streaming model where at each time step $i$, $S_i\subseteq [n]$ is a randomly chosen (ordered) subset of size $k$ and the distinguisher sees either $P(x_{S_i})$ or a uniformly random bit along with $S_i$.
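The generator and the resilience condition defined above can be spelled out directly. This is a minimal sketch over $\{0,1\}$-valued seeds. The Fourier coefficient of $(-1)^P$ at a set $S$ is taken to be $\mathbb{E}_z[(-1)^{P(z)} (-1)^{\sum_{i \in S} z_i}]$ over uniform $z \in \{0,1\}^k$, and "resilience" is the smallest degree of a nonzero coefficient, following the parenthetical definition in the abstract. All function names are illustrative.

```python
from itertools import combinations, product

def goldreich_prg(x, subsets, P):
    """Output P(x_{S_1}), ..., P(x_{S_m}) for a seed x in {0,1}^n."""
    return [P(tuple(x[i] for i in S)) for S in subsets]

def fourier_coefficient(P, k, S):
    """E_z[(-1)^{P(z)} * (-1)^{sum_{i in S} z_i}] over uniform z in {0,1}^k."""
    total = sum((-1) ** (P(z) + sum(z[i] for i in S))
                for z in product((0, 1), repeat=k))
    return total / 2 ** k

def resilience(P, k):
    """Smallest degree of a nonzero Fourier coefficient of (-1)^P.

    P is t-resilient (in the abstract's sense) iff resilience(P, k) >= t.
    """
    return min(r for r in range(k + 1)
               for S in combinations(range(k), r)
               if fourier_coefficient(P, k, S) != 0)
```

For example, the 3-bit XOR predicate has a single nonzero coefficient, at the full set, so it is 3-resilient. The 3-bit AND predicate is not balanced, so its degree-0 coefficient is nonzero and its resilience is 0.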
Our proof builds on the recently developed machinery for proving time-space trade-offs (Raz 2016 and follow-ups) for search/learning problems.
Submitted 17 February, 2020;
originally announced February 2020.
-
List-Decodable Subspace Recovery: Dimension Independent Error in Polynomial Time
Authors:
Ainesh Bakshi,
Pravesh K. Kothari
Abstract:
In list-decodable subspace recovery, the input is a collection of $n$ points, $αn$ of which (for some $α\ll 1/2$) are drawn i.i.d. from a distribution $\mathcal{D}$ with an isotropic rank-$r$ covariance $Π_*$ (the \emph{inliers}), while the rest are arbitrary, potentially adversarial outliers. The goal is to recover an $O(1/α)$-size list of candidate covariances that contains a $\hatΠ$ close to $Π_*$. Two recent independent works (Raghavendra-Yau, Bakshi-Kothari 2020) gave the first efficient algorithms for this problem. These results, however, obtain an error that grows with the dimension (linearly in [RY] and logarithmically in [BK]), at the cost of quasi-polynomial running time, and rely on \emph{certifiable anti-concentration}, a relatively strict condition satisfied essentially only by the Gaussian distribution.
In this work, we improve on these results on all three fronts: \emph{dimension-independent} error, faster fixed-polynomial running time, and less restrictive distributional assumptions. Specifically, we give a $poly(1/α) d^{O(1)}$ time algorithm that outputs a list containing a $\hatΠ$ satisfying $\|\hatΠ -Π_*\|_F \leq O(1/α)$. Our result only needs $\mathcal{D}$ to have \emph{certifiably hypercontractive} degree-2 polynomials. As a result, in addition to Gaussians, our algorithm applies to the uniform distribution on the hypercube and $q$-ary cubes, and to arbitrary product distributions with subgaussian marginals. Prior work (Raghavendra and Yau, 2020) had identified such distributions as potential hard examples, since they do not exhibit strong enough anti-concentration. When $\mathcal{D}$ satisfies certifiable anti-concentration, we obtain a stronger error guarantee of $\|\hatΠ-Π_*\|_F \leq η$ for any arbitrary $η> 0$ in $d^{O(poly(1/α) + \log (1/η))}$ time.
Submitted 7 January, 2021; v1 submitted 12 February, 2020;
originally announced February 2020.
-
List-Decodable Linear Regression
Authors:
Sushrut Karmalkar,
Adam R. Klivans,
Pravesh K. Kothari
Abstract:
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples.
We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples.
For any $α< 1$, our algorithm takes as input a sample $\{(x_i,y_i)\}_{i \leq n}$ of $n$ linear equations where $αn$ of the equations satisfy $y_i = \langle x_i,\ell^*\rangle +ζ$ for some small noise $ζ$ and $(1-α)n$ of the equations are {\em arbitrarily} chosen. It outputs a list $L$ of size $O(1/α)$, a fixed constant, that contains an $\ell$ that is close to $\ell^*$.
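Why must the output be a list rather than a single vector? A standard illustration, not taken from the paper, is a hypothetical adversary that splits the sample into $1/α$ groups of $αn$ points each. Each group fits its own linear function exactly. From the perspective of any one group, the other $(1-α)n$ equations are just "arbitrary", so no algorithm can tell which group holds $\ell^*$. The sketch below, with an illustrative function name and noiseless inliers ($ζ = 0$), builds such a sample and checks that every planted model explains exactly $αn$ equations.

```python
import random

def planted_list_sample(per_model, num_models, d, seed=0):
    """Hypothetical adversarial sample with alpha = 1/num_models.

    Returns the planted models and the data; each model exactly fits
    `per_model` of the num_models * per_model equations.
    """
    rng = random.Random(seed)
    models = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_models)]
    data = []
    for ell in models:
        for _ in range(per_model):
            x = [rng.gauss(0, 1) for _ in range(d)]
            y = sum(a * b for a, b in zip(x, ell))  # noiseless inlier equation
            data.append((x, y))
    return models, data

def num_fitted(ell, data, tol=1e-9):
    """Number of equations y_i = <x_i, ell> satisfied up to tolerance."""
    return sum(abs(y - sum(a * b for a, b in zip(x, ell))) < tol for x, y in data)
```

Because all $1/α$ planted models are equally consistent with the data, any correct algorithm must output a list of size at least $1/α$.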
Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/α)^{O(1/α^8)}$ time algorithm to find an $O(1/α)$-size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm that achieves a similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary.
Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm based on the sum-of-squares method.
In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.
Submitted 30 May, 2019; v1 submitted 14 May, 2019;
originally announced May 2019.
-
On the Expressive Power of Kernel Methods and the Efficiency of Kernel Learning by Association Schemes
Authors:
Pravesh K. Kothari,
Roi Livni
Abstract:
We study the expressive power of kernel methods and the algorithmic feasibility of multiple kernel learning for a special rich class of kernels.
Specifically, we define \emph{Euclidean kernels}, a diverse class that includes most, if not all, families of kernels studied in the literature, such as polynomial kernels and radial basis functions. We then describe the geometric and spectral structure of this family of kernels over the hypercube (and to some extent for any compact domain). Our structural results allow us to prove meaningful limitations on the expressive power of the class as well as derive several efficient algorithms for learning kernels over different domains.
Submitted 13 February, 2019;
originally announced February 2019.
-
Sum-of-Squares meets Nash: Optimal Lower Bounds for Finding any Equilibrium
Authors:
Pravesh K. Kothari,
Ruta Mehta
Abstract:
Several works have shown unconditional hardness (via integrality gaps) of computing equilibria using strong hierarchies of convex relaxations. Such results however only apply to the problem of computing equilibria that optimize a certain objective function and not to the (arguably more fundamental) task of finding \emph{any} equilibrium.
We present an algorithmic model based on the sum-of-squares (SoS) hierarchy that allows escaping this inherent limitation of integrality gaps. In this model, algorithms access the input game only through a relaxed solution to the natural SoS relaxation for computing equilibria. They can then adaptively construct a list of candidate solutions and invoke a verification oracle to check if any candidate on the list is a solution. This model captures most well-studied approximation algorithms such as those for Max-Cut, Sparsest Cut, and Unique-Games.
The state-of-the-art algorithms for computing exact and approximate equilibria in two-player, $n$-strategy games are captured in this model and require that at least one of (i) the size (roughly, the running time) of the SoS relaxation or (ii) the size of the list of candidates be at least $2^{Ω(n)}$ and $n^{Ω(\log n)}$, respectively. Our main result shows a lower bound that matches these upper bounds up to constant factors in the exponent.
This can be interpreted as an unconditional confirmation, in our restricted algorithmic framework, of Rubinstein's recent conditional hardness \cite{Rub} for computing approximate equilibria.
Our proof strategy involves constructing a family of games that all share a common sum-of-squares solution but every (approximate) equilibrium of one game is far from every (approximate) equilibrium of any other game in the family.
Submitted 25 June, 2018;
originally announced June 2018.
-
Small-Set Expansion in Shortcode Graph and the 2-to-2 Conjecture
Authors:
Boaz Barak,
Pravesh K. Kothari,
David Steurer
Abstract:
Dinur, Khot, Kindler, Minzer and Safra (2016) recently showed that the (imperfect completeness variant of) Khot's 2-to-2 games conjecture follows from a combinatorial hypothesis about the soundness of a certain "Grassmannian agreement tester". In this work, we show that the hypothesis of Dinur et al. follows from a conjecture we call the "Inverse Shortcode Hypothesis", which characterizes the non-expanding sets of the degree-two shortcode graph. We also show that the latter conjecture is equivalent to a characterization of the non-expanding sets in the Grassmann graph, as hypothesized by a follow-up paper of Dinur et al. (2017).
Following our work, Khot, Minzer and Safra (2018) proved the "Inverse Shortcode Hypothesis". Combining their proof with our result and the reduction of Dinur et al. (2016) completes the proof of the 2-to-2 conjecture with imperfect completeness. Moreover, we believe that the shortcode graph provides a useful view of both the hypothesis and the reduction, and might be useful in extending it further.
Submitted 23 April, 2018;
originally announced April 2018.
-
Efficient Algorithms for Outlier-Robust Regression
Authors:
Adam Klivans,
Pravesh K. Kothari,
Raghu Meka
Abstract:
We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels.
Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is \emph{certifiably hypercontractive}. This natural property is satisfied by many well-studied distributions, such as Gaussians, strongly log-concave distributions, and the uniform distribution on the hypercube, among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting.
These results are the first of their kind and were not known to be even information-theoretically possible prior to our work.
Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple non-convex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.
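The non-convex problem described above can be solved exactly by brute force on tiny inputs, which helps show what the SoS relaxation is replacing. This sketch assumes one-dimensional inputs and a line through the origin ($y \approx w x$) for simplicity. It enumerates every subset of a given size and keeps the one with the smallest least-squares loss, so its cost is exponential in $n$. The function names are illustrative and are not the paper's algorithm.

```python
from itertools import combinations

def ls_slope(points):
    """Least-squares slope w minimizing sum (y - w*x)^2 (no intercept)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def robust_fit_brute_force(points, keep):
    """Over all subsets of size `keep`, return (slope, loss) of the best fit.

    Exponential in len(points): this is the non-convex search that a
    sum-of-squares relaxation would replace with a convex program.
    """
    best = None
    for subset in combinations(points, keep):
        w = ls_slope(subset)
        loss = sum((y - w * x) ** 2 for x, y in subset)
        if best is None or loss < best[1]:
            best = (w, loss)
    return best
```

On the points (1, 2), (2, 4), (3, 6), (4, 8) plus one corrupted point (5, -30), ordinary least squares on all five points gives slope $-90/55 \approx -1.64$. The subset search with `keep=4` recovers slope 2 with zero loss.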
Submitted 4 June, 2020; v1 submitted 8 March, 2018;
originally announced March 2018.
-
An Analysis of the t-SNE Algorithm for Data Visualization
Authors:
Sanjeev Arora,
Wei Hu,
Pravesh K. Kothari
Abstract:
A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction does not produce data visualizations. The t-SNE heuristic of van der Maaten and Hinton, which is based on non-convex optimization, has become the de facto standard for visualization in a wide range of applications.
This work gives a formal framework for the problem of data visualization: finding a 2-dimensional embedding of clusterable data that correctly separates individual clusters to make them visually identifiable. We then give a rigorous analysis of the performance of t-SNE under a natural, deterministic condition on the "ground-truth" clusters (similar to conditions assumed in earlier analyses of clustering) in the underlying data. These are the first provable guarantees on t-SNE for constructing good data visualizations.
We show that our deterministic condition is satisfied by fairly general probabilistic generative models for clusterable data, such as mixtures of well-separated log-concave distributions. Finally, we give theoretical evidence that t-SNE provably succeeds in partially recovering cluster structure even when the above deterministic condition is not met.
Submitted 6 June, 2018; v1 submitted 5 March, 2018;
originally announced March 2018.
-
Surprise in Elections
Authors:
Palash Dey,
Pravesh K. Kothari,
Swaprava Nath
Abstract:
Elections involving a very large voter population often lead to outcomes that surprise many. This is particularly important for elections whose results affect the economy of a sizable population. A better prediction of the true outcome helps reduce the surprise and keeps the voters prepared. This paper starts from the basic observation that individuals in the underlying population build estimates of the distribution of preferences of the whole population based on their local neighborhoods. The outcome of the election leads to a surprise if these local estimates contradict the outcome of the election for some fixed voting rule. To get a quantitative understanding, we propose a simple mathematical model of the setting where the individuals in the population and their connections (through geographical proximity, social networks, etc.) are described by a random graph with connection probabilities that are biased based on the preferences of the individuals. Each individual also has some estimate of the bias in their connections.
We show that the election outcome leads to a surprise if the discrepancy between the estimated bias and the true bias in the local connections exceeds a certain threshold, and confirm the phenomenon that surprising outcomes are associated only with {\em closely contested elections}. We compare standard voting rules based on their performance on surprise and show that they have different behavior for different parts of the population. This also hints that no single voting rule is less surprising for {\em all} parts of a population. Finally, we experiment with the UK-EU referendum (a.k.a.\ Brexit) dataset, which attests to some of our theoretical predictions.
Submitted 30 January, 2018;
originally announced January 2018.
-
Outlier-robust moment-estimation via sum-of-squares
Authors:
Pravesh K. Kothari,
David Steurer
Abstract:
We develop efficient algorithms for estimating low-degree moments of unknown distributions in the presence of adversarial outliers. The guarantees of our algorithms in many cases improve significantly over the best previous ones, obtained in recent works of Diakonikolas et al., Lai et al., and Charikar et al. We also show that the guarantees of our algorithms match information-theoretic lower bounds for the class of distributions we consider. These improved guarantees allow us to give improved algorithms for independent component analysis and learning mixtures of Gaussians in the presence of outliers.
Our algorithms are based on a standard sum-of-squares relaxation of the following conceptually-simple optimization problem: Among all distributions whose moments are bounded in the same way as for the unknown distribution, find the one that is closest in statistical distance to the empirical distribution of the adversarially-corrupted sample.
Submitted 23 December, 2017; v1 submitted 30 November, 2017;
originally announced November 2017.
-
Better Agnostic Clustering Via Relaxed Tensor Norms
Authors:
Pravesh K. Kothari,
Jacob Steinhardt
Abstract:
We develop a new family of convex relaxations for $k$-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms.
We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the \emph{Poincare inequality}. The Poincare inequality is a central inequality in probability theory, and a large class of distributions satisfy it, including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions.
As an immediate corollary, for any $γ> 0$, we obtain an efficient algorithm for learning the means of a mixture of $k$ arbitrary Poincare distributions in $\mathbb{R}^d$ in time $d^{O(1/γ)}$ so long as the means have separation $Ω(k^γ)$. This in particular yields an algorithm for learning Gaussian mixtures with separation $Ω(k^γ)$, thus partially resolving an open problem of Regev and Vijayaraghavan \citet{regev2017learning}.
Our algorithm works even in the outlier-robust setting where an $ε$ fraction of arbitrary outliers are added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We, therefore, obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.
Submitted 20 November, 2017;
originally announced November 2017.
-
The power of sum-of-squares for detecting hidden structures
Authors:
Samuel B. Hopkins,
Pravesh K. Kothari,
Aaron Potechin,
Prasad Raghavendra,
Tselil Schramm,
David Steurer
Abstract:
We study planted problems---finding hidden structures in random noisy inputs---through the lens of the sum-of-squares semidefinite programming hierarchy (SoS). This family of powerful semidefinite programs has recently yielded many new algorithms for planted problems, often achieving the best known polynomial-time guarantees in terms of accuracy of recovered solutions and robustness to noise. One theme in recent work is the design of spectral algorithms which match the guarantees of SoS algorithms for planted problems. Classical spectral algorithms are often unable to accomplish this: the twist in these new spectral algorithms is the use of spectral structure of matrices whose entries are low-degree polynomials of the input variables. We prove that for a wide class of planted problems, including refuting random constraint satisfaction problems, tensor and sparse PCA, densest-k-subgraph, community detection in stochastic block models, planted clique, and others, eigenvalues of degree-d matrix polynomials are as powerful as SoS semidefinite programs of roughly degree d. For such problems it is therefore always possible to match the guarantees of SoS without solving a large semidefinite program. Using related ideas on SoS algorithms and low-degree matrix polynomials (and inspired by recent work on SoS and the planted clique problem by Barak et al.), we prove new nearly-tight SoS lower bounds for the tensor and sparse principal component analysis problems. Our lower bounds for sparse principal component analysis are the first to suggest that going beyond existing algorithms for this problem may require sub-exponential time.
Submitted 13 October, 2017;
originally announced October 2017.
-
Agnostic Learning by Refuting
Authors:
Pravesh K. Kothari,
Roi Livni
Abstract:
The sample complexity of learning a Boolean-valued function class is precisely characterized by its Rademacher complexity. This has little bearing, however, on the sample complexity of \emph{efficient} agnostic learning.
We introduce \emph{refutation complexity}, a natural computational analog of the Rademacher complexity of a Boolean concept class, and show that it exactly characterizes the sample complexity of \emph{efficient} agnostic learning. Informally, the refutation complexity of a class $\mathcal{C}$ is the minimum number of example-label pairs required to efficiently distinguish between the case that the labels correlate with the evaluation of some member of $\mathcal{C}$ (\emph{structure}) and the case where the labels are i.i.d. Rademacher random variables (\emph{noise}). The easy direction of this relationship was implicitly used in the recent framework for improper PAC learning lower bounds of Daniely and co-authors via connections to the hardness of refuting random constraint satisfaction problems. Our work can be seen as making the relationship between agnostic learning and refutation implicit in their work into an explicit equivalence. In a recent, independent work, Salil Vadhan discovered a similar relationship between refutation and PAC-learning in the realizable (i.e., noiseless) case.
Submitted 30 November, 2017; v1 submitted 12 September, 2017;
originally announced September 2017.
-
Sum of squares lower bounds for refuting any CSP
Authors:
Pravesh K. Kothari,
Ryuhei Mori,
Ryan O'Donnell,
David Witmer
Abstract:
Let $P:\{0,1\}^k \to \{0,1\}$ be a nontrivial $k$-ary predicate. Consider a random instance of the constraint satisfaction problem $\mathrm{CSP}(P)$ on $n$ variables with $Δn$ constraints, each being $P$ applied to $k$ randomly chosen literals. Provided the constraint density satisfies $Δ\gg 1$, such an instance is unsatisfiable with high probability. The \emph{refutation} problem is to efficiently find a proof of unsatisfiability.
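A random instance of $\mathrm{CSP}(P)$ as described above can be generated in a few lines. This is a minimal sketch that assumes each constraint uses $k$ distinct variables chosen uniformly at random, each negated independently with probability $1/2$, and that the number of constraints is $Δn$ rounded to an integer. These conventions are assumptions on our part, and the helper names are hypothetical.

```python
import random

def random_csp_instance(n, k, density, seed=0):
    """round(density * n) constraints, each P applied to k random literals.

    A constraint is (scope, negations): the literal on variable scope[j]
    is x[scope[j]] XOR negations[j].
    """
    rng = random.Random(seed)
    m = round(density * n)
    instance = []
    for _ in range(m):
        scope = rng.sample(range(n), k)  # k distinct variables (an assumption)
        negations = [rng.randint(0, 1) for _ in range(k)]
        instance.append((scope, negations))
    return instance

def num_satisfied(instance, P, x):
    """Number of constraints whose literal tuple satisfies P, for x in {0,1}^n."""
    return sum(P(tuple(x[i] ^ s for i, s in zip(scope, neg)))
               for scope, neg in instance)
```

With $P$ equal to 3-bit OR this is random 3-SAT. Refutation asks for a certificate that no $x$ makes `num_satisfied` equal to the number of constraints.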
We show that whenever the predicate $P$ supports a $t$-\emph{wise uniform} probability distribution on its satisfying assignments, the sum of squares (SOS) algorithm of degree $d = Θ(\frac{n}{Δ^{2/(t-1)} \log Δ})$ (which runs in time $n^{O(d)}$) \emph{cannot} refute a random instance of $\mathrm{CSP}(P)$. In particular, the polynomial-time SOS algorithm requires $\widetildeΩ(n^{(t+1)/2})$ constraints to refute random instances of CSP$(P)$ when $P$ supports a $t$-wise uniform distribution on its satisfying assignments. Together with recent work of Lee et al. [LRS15], our result also implies that \emph{any} polynomial-size semidefinite programming relaxation for refutation requires at least $\widetildeΩ(n^{(t+1)/2})$ constraints.
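The $t$-wise uniformity condition can be checked mechanically in one special case. This sketch, with a hypothetical function name, tests whether the *uniform* distribution on $P^{-1}(1)$ has uniform marginals on every set of $t$ coordinates. The test is sufficient but not necessary, because $P$ may instead support some non-uniform $t$-wise uniform distribution.

```python
from itertools import combinations, product

def uniform_on_sat_is_t_wise_uniform(P, k, t):
    """True iff the uniform distribution on P^{-1}(1) is t-wise uniform.

    Sufficient, not necessary, for P to support a t-wise uniform distribution.
    """
    sat = [z for z in product((0, 1), repeat=k) if P(z)]
    for T in combinations(range(k), t):
        counts = {}
        for z in sat:
            key = tuple(z[i] for i in T)
            counts[key] = counts.get(key, 0) + 1
        # every one of the 2^t patterns must occur, all equally often
        if len(counts) != 2 ** t or len(set(counts.values())) != 1:
            return False
    return True
```

For example, the even-parity 3-XOR predicate passes for $t = 2$ but fails for $t = 3$. The 3-bit OR predicate fails already for $t = 1$ with the uniform distribution on its 7 satisfying assignments, yet it supports the uniform distribution on odd-parity strings, which is pairwise uniform. Plugging $t = 2$ into the bound above gives the familiar $\widetildeΩ(n^{3/2})$ threshold for 3-SAT.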
Our results (which also extend with no change to CSPs over larger alphabets) subsume all previously known lower bounds for semialgebraic refutation of random CSPs. For every constraint predicate $P$, they give a three-way hardness tradeoff between the density of constraints, the SOS degree (hence running time), and the strength of the refutation. By recent algorithmic results of Allen et al. [AOW15] and Raghavendra et al. [RRS16], this full three-way tradeoff is \emph{tight}, up to lower-order factors.
Submitted 16 January, 2017;
originally announced January 2017.
-
Approximating Rectangles by Juntas and Weakly-Exponential Lower Bounds for LP Relaxations of CSPs
Authors:
Pravesh K. Kothari,
Raghu Meka,
Prasad Raghavendra
Abstract:
We show that for constraint satisfaction problems (CSPs), sub-exponential size linear programming relaxations are as powerful as $n^{\Omega(1)}$ rounds of the Sherali-Adams linear programming hierarchy. As a corollary, we obtain sub-exponential size lower bounds for linear programming relaxations that beat random guessing for many CSPs such as MAX-CUT and MAX-3SAT. This is a nearly-exponential improvement over previous results: previously, it was only known that linear programs of size $n^{o(\log n)}$ cannot beat random guessing for any CSP (Chan-Lee-Raghavendra-Steurer 2013).
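The size comparison behind "nearly-exponential improvement" can be made explicit by taking logarithms. The first line uses only the prior bound quoted above and the standard fact that $r$ rounds of Sherali-Adams on $n$ variables yield an LP of size $n^{O(r)}$. The second line assumes, as the title's "weakly-exponential" suggests but the abstract does not state, that the new lower bounds have the form $2^{n^{c}}$ for some constant $c > 0$.

```latex
\log_2\!\left(n^{o(\log n)}\right) = o(\log n)\cdot \log_2 n = o(\log^2 n),
\qquad
\log_2\!\left(n^{O(r)}\right) = O(r \log n) \quad (r \text{ rounds of Sherali-Adams}),
\\[4pt]
\log_2\!\left(2^{n^{c}}\right) = n^{c} \qquad (\text{assumed form of the new bound, } c > 0 \text{ constant}).
```

So the logarithm of the ruled-out LP size grows from polylogarithmic in $n$ to polynomial in $n$.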
Our bounds are obtained by exploiting and extending the recent progress in communication complexity for "lifting" query lower bounds to communication problems. The main ingredient in our results is a new structural result on "high-entropy rectangles" that may be of independent interest in communication complexity.
Submitted 30 December, 2017; v1 submitted 9 October, 2016;
originally announced October 2016.
-
A Nearly Tight Sum-of-Squares Lower Bound for the Planted Clique Problem
Authors:
Boaz Barak,
Samuel B. Hopkins,
Jonathan Kelner,
Pravesh K. Kothari,
Ankur Moitra,
Aaron Potechin
Abstract:
We prove that with high probability over the choice of a random graph $G$ from the Erdős-Rényi distribution $G(n,1/2)$, the $n^{O(d)}$-time degree-$d$ Sum-of-Squares semidefinite programming relaxation for the clique problem gives a value of at least $n^{1/2-c(d/\log n)^{1/2}}$ for some constant $c>0$. This yields a nearly tight $n^{1/2 - o(1)}$ bound on the value of this program for any degree $d = o(\log n)$. Moreover, we introduce a new framework, which we call \emph{pseudo-calibration}, for constructing Sum-of-Squares lower bounds. This framework is inspired by taking a computational analog of Bayesian probability theory. It yields a general recipe for constructing good pseudo-distributions (i.e., dual certificates for the Sum-of-Squares semidefinite program) and sheds further light on the ways in which this hierarchy differs from others.
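The step from the degree-$d$ bound to the $n^{1/2-o(1)}$ statement is a short calculation, worked out below with $\log$ read as the natural logarithm (another base only changes constants). For contrast, the largest clique in $G(n,1/2)$ has size $(2+o(1))\log_2 n$ with high probability, so the program's value exceeds the true clique number by a factor of $n^{1/2-o(1)}$.

```latex
\begin{aligned}
d = o(\log n) &\implies c\left(\frac{d}{\log n}\right)^{1/2} = o(1)
  \implies n^{1/2 - c(d/\log n)^{1/2}} = n^{1/2 - o(1)}, \\
n^{-c(d/\log n)^{1/2}} &= e^{-c\sqrt{d/\log n}\,\cdot\,\log n} = e^{-c\sqrt{d \log n}},
  \quad\text{so the value is } \frac{\sqrt{n}}{2^{\Theta(\sqrt{d \log n})}}.
\end{aligned}
```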
Submitted 12 April, 2016; v1 submitted 11 April, 2016;
originally announced April 2016.
-
SoS and Planted Clique: Tight Analysis of MPW Moments at all Degrees and an Optimal Lower Bound at Degree Four
Authors:
Samuel B. Hopkins,
Pravesh K. Kothari,
Aaron Potechin
Abstract:
The problem of finding large cliques in random graphs and its "planted" variant, where one wants to recover a clique of size $\omega \gg \log n$ added to an Erdős-Rényi graph $G \sim G(n,\frac{1}{2})$, have been intensely studied. Nevertheless, existing polynomial-time algorithms can only recover planted cliques of size $\omega = \Omega(\sqrt{n})$. By contrast, information-theoretically, one can recover planted cliques so long as $\omega \gg \log n$. In this work, we continue the investigation of algorithms from the sum of squares hierarchy for solving the planted clique problem begun by Meka, Potechin, and Wigderson (MPW, 2015) and Deshpande and Montanari (DM, 2015). Our main results improve upon both these previous works by showing:
1. Degree four SoS does not recover the planted clique unless $\omega \gg \sqrt{n}\,\mathrm{poly}\log n$, improving upon the bound $\omega \gg n^{1/3}$ due to DM. A similar result was obtained independently by Raghavendra and Schramm (2015).
2. For $2 < d = o(\sqrt{\log n})$, degree-$2d$ SoS does not recover the planted clique unless $\omega \gg n^{1/(d+1)}/(2^d\,\mathrm{poly}\log n)$, improving upon the bound due to MPW.
Our proof of the second result is based on a fine spectral analysis of the certificate used in the prior works MPW, DM, and Feige and Krauthgamer (2003), obtained by decomposing it along an appropriately chosen basis. Along the way, we develop combinatorial tools to analyze the spectrum of random matrices with dependent entries and to understand the symmetries in the eigenspaces of set-symmetric matrices, inspired by work of Grigoriev (2001).
An argument of Kelner shows that the first result cannot be proved using the same certificate. Rather, our proof involves constructing and analyzing a new certificate that yields the nearly tight lower bound by "correcting" the certificate of previous works.
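The planted clique distribution discussed above can be sampled directly. This is a minimal sketch with hypothetical helper names; the top-degree guess is a naive baseline added for illustration, not an SoS algorithm and not from the abstract, and the parameters $n = 400$, $\omega = 200$ are chosen only so that this baseline succeeds reliably.

```python
import random
from itertools import combinations

def planted_clique(n, omega, rng):
    # Sample G(n, 1/2) as adjacency sets, then add every edge inside a random omega-subset.
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            adj[u].add(v)
            adj[v].add(u)
    clique = set(rng.sample(range(n), omega))
    for u, v in combinations(sorted(clique), 2):
        adj[u].add(v)
        adj[v].add(u)
    return adj, clique

def is_clique(adj, vertices):
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def top_degree_guess(adj, omega):
    # Naive baseline (not SoS): guess the omega vertices of highest degree.
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:omega])

adj, clique = planted_clique(n=400, omega=200, rng=random.Random(2))
print(is_clique(adj, clique))  # True
print(top_degree_guess(adj, 200) == clique)
```

A planted vertex has expected degree larger by $(\omega-1)/2$ than an unplanted one, while degrees fluctuate by about $\sqrt{n}/2$, so degree ranking needs $\omega$ on the order of $\sqrt{n \log n}$; the question studied above is whether low-degree SoS can do better than the $\sqrt{n}$ barrier.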
Submitted 18 July, 2015;
originally announced July 2015.