-
Some easy optimization problems have the overlap-gap property
Authors:
Shuangping Li,
Tselil Schramm
Abstract:
We show that the shortest $s$-$t$ path problem has the overlap-gap property in (i) sparse $\mathbf{G}(n,p)$ graphs and (ii) complete graphs with i.i.d. Exponential edge weights. Furthermore, we demonstrate that in sparse $\mathbf{G}(n,p)$ graphs, shortest path is solved by $O(\log n)$-degree polynomial estimators, and a uniform approximate shortest path can be sampled in polynomial time. This constitutes the first example in which the overlap-gap property is not predictive of algorithmic intractability for a (non-algebraic) average-case optimization problem.
Submitted 4 November, 2024; v1 submitted 4 November, 2024;
originally announced November 2024.
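As a rough illustration of the setting in this abstract, the sketch below samples a sparse $\mathbf{G}(n,p)$ graph with $p = c/n$ and finds one shortest $s$-$t$ path by breadth-first search. The function names, the parameter `c`, and the choice of unweighted edges are assumptions made for illustration; this is not the authors' low-degree estimator or sampling procedure.

```python
import random
from collections import deque

def sample_gnp(n, c, seed=0):
    """Adjacency lists of G(n, p) with p = c / n (c is an assumed constant)."""
    rng = random.Random(seed)
    p = c / n
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bfs_shortest_path(adj, s, t):
    """Return one shortest s-t path as a list of vertices, or None if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            # Walk parent pointers back to s, then reverse.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None
```

In the sparse regime the graph is typically disconnected, so `bfs_shortest_path` may return `None` for a given pair of endpoints.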
-
Discrepancy Algorithms for the Binary Perceptron
Authors:
Shuangping Li,
Tselil Schramm,
Kangjie Zhou
Abstract:
The binary perceptron problem asks us to find a sign vector in the intersection of independently chosen random halfspaces with intercept $-κ$. We analyze the performance of the canonical discrepancy minimization algorithms of Lovett-Meka and Rothvoss/Eldan-Singh for the asymmetric binary perceptron problem. We obtain new algorithmic results in the $κ= 0$ case and in the large-$|κ|$ case. In the $κ\to-\infty$ case, we additionally characterize the storage capacity and complement our algorithmic results with an almost-matching overlap-gap lower bound.
Submitted 23 May, 2025; v1 submitted 18 July, 2024;
originally announced August 2024.
-
Semidefinite programs simulate approximate message passing robustly
Authors:
Misha Ivkov,
Tselil Schramm
Abstract:
Approximate message passing (AMP) is a family of iterative algorithms that generalize matrix power iteration. AMP algorithms are known to optimally solve many average-case optimization problems. In this paper, we show that a large class of AMP algorithms can be simulated in polynomial time by \emph{local statistics hierarchy} semidefinite programs (SDPs), even when an unknown principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is adversarially corrupted. Ours are the first robust guarantees for many of these problems. Further, our results offer an interesting counterpoint to strong lower bounds against less constrained SDP relaxations for average-case max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and other problems.
Submitted 15 November, 2023;
originally announced November 2023.
-
Spectral clustering in the Gaussian mixture block model
Authors:
Shuangping Li,
Tselil Schramm
Abstract:
Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i,u_j \rangle \ge τ$ for a pre-specified threshold $τ$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component).
In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d\to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
Submitted 10 April, 2024; v1 submitted 29 April, 2023;
originally announced May 2023.
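The generative process described in this abstract can be sketched as follows for the 2-component spherical case. The specific parameterization is an assumption made for illustration (component means $\pm\mu e_1$, identity covariance, equal mixing weights), as are the function and parameter names; the paper's normalization may differ.

```python
import random

def sample_gmbm(n, d, mu, tau, seed=0):
    """Sample (labels, features, edges) from a 2-component Gaussian mixture block model.

    Assumed parameterization: means +mu*e_1 and -mu*e_1, identity covariance.
    """
    rng = random.Random(seed)
    labels = [rng.choice((-1, 1)) for _ in range(n)]
    feats = []
    for s in labels:
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        u[0] += s * mu  # shift along the first coordinate by the component mean
        feats.append(u)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Edge (i, j) is present iff <u_i, u_j> >= tau.
            if sum(a * b for a, b in zip(feats[i], feats[j])) >= tau:
                edges.add((i, j))
    return labels, feats, edges
```

Clustering then means recovering `labels` from `edges` alone, and embedding means recovering `feats`, in each case only up to the symmetries of the model.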
-
Local and global expansion in random geometric graphs
Authors:
Siqi Liu,
Sidhanth Mohanty,
Tselil Schramm,
Elizabeth Yang
Abstract:
Consider a random geometric 2-dimensional simplicial complex $X$ sampled as follows: first, sample $n$ vectors $\boldsymbol{u_1},\ldots,\boldsymbol{u_n}$ uniformly at random on $\mathbb{S}^{d-1}$; then, for each triple $i,j,k \in [n]$, add $\{i,j,k\}$ and all of its subsets to $X$ if and only if $\langle{\boldsymbol{u_i},\boldsymbol{u_j}}\rangle \ge τ, \langle{\boldsymbol{u_i},\boldsymbol{u_k}}\rangle \ge τ$, and $\langle \boldsymbol{u_j}, \boldsymbol{u_k}\rangle \ge τ$. We prove that for every $\varepsilon > 0$, there exists a choice of $d = Θ(\log n)$ and $τ= τ(\varepsilon,d)$ so that with high probability, $X$ is a high-dimensional expander of average degree $n^\varepsilon$ in which each $1$-link has spectral gap bounded away from $\frac{1}{2}$.
To our knowledge, this is the first demonstration of a natural distribution over $2$-dimensional expanders of arbitrarily small polynomial average degree and spectral link expansion better than $\frac{1}{2}$. All previously known constructions are algebraic. This distribution also furnishes an example of simplicial complexes for which the trickle-down theorem is nearly tight.
En route, we prove general bounds on the spectral expansion of random induced subgraphs of arbitrary vertex transitive graphs, which may be of independent interest. For example, one consequence is an almost-sharp bound on the second eigenvalue of random $n$-vertex geometric graphs on $\mathbb{S}^{d-1}$, which was previously unknown for most $n,d$ pairs.
Submitted 30 September, 2022;
originally announced October 2022.
-
The Franz-Parisi Criterion and Computational Trade-offs in High Dimensional Statistics
Authors:
Afonso S. Bandeira,
Ahmed El Alaoui,
Samuel B. Hopkins,
Tselil Schramm,
Alexander S. Wein,
Ilias Zadik
Abstract:
Many high-dimensional statistical inference problems are believed to possess inherent computational hardness. Various frameworks have been proposed to give rigorous evidence for such hardness, including lower bounds against restricted models of computation (such as low-degree functions), as well as methods rooted in statistical physics that are based on free energy landscapes. This paper aims to make a rigorous connection between the seemingly different low-degree and free-energy based approaches. We define a free-energy based criterion for hardness and formally connect it to the well-established notion of low-degree hardness for a broad class of statistical problems, namely all Gaussian additive models and certain models with a sparse planted signal. By leveraging these rigorous connections we are able to: establish that for Gaussian additive models the "algebraic" notion of low-degree hardness implies failure of "geometric" local MCMC algorithms, and provide new low-degree lower bounds for sparse linear regression which seem difficult to prove directly. These results provide both conceptual insights into the connections between different notions of hardness, as well as concrete technical tools such as new methods for proving low-degree lower bounds.
Submitted 13 October, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
A Robust Spectral Algorithm for Overcomplete Tensor Decomposition
Authors:
Samuel B. Hopkins,
Tselil Schramm,
Jonathan Shi
Abstract:
We give a spectral algorithm for decomposing overcomplete order-4 tensors, so long as their components satisfy an algebraic non-degeneracy condition that holds for nearly all (all but an algebraic set of measure $0$) tensors over $(\mathbb{R}^d)^{\otimes 4}$ with rank $n \le d^2$. Our algorithm is robust to adversarial perturbations of bounded spectral norm.
Our algorithm is inspired by one which uses the sum-of-squares semidefinite programming hierarchy (Ma, Shi, and Steurer STOC'16, arXiv:1610.01980), and we achieve comparable robustness and overcompleteness guarantees under similar algebraic assumptions. However, our algorithm avoids semidefinite programming and may be implemented as a series of basic linear-algebraic operations. We consequently obtain a much faster running time than semidefinite programming methods: our algorithm runs in time $\tilde O(n^2d^3) \le \tilde O(d^7)$, which is subquadratic in the input size $d^4$ (where we have suppressed factors related to the condition number of the input tensor).
Submitted 5 March, 2022;
originally announced March 2022.
-
Testing thresholds for high-dimensional sparse random geometric graphs
Authors:
Siqi Liu,
Sidhanth Mohanty,
Tselil Schramm,
Elizabeth Yang
Abstract:
In the random geometric graph model $\mathsf{Geo}_d(n,p)$, we identify each of our $n$ vertices with an independently and uniformly sampled vector from the $d$-dimensional unit sphere, and we connect pairs of vertices whose vectors are ``sufficiently close'', such that the marginal probability of an edge is $p$.
We investigate the problem of testing for this latent geometry, or in other words, distinguishing an Erdős-Rényi graph $\mathsf{G}(n, p)$ from a random geometric graph $\mathsf{Geo}_d(n, p)$. It is not too difficult to show that if $d\to \infty$ while $n$ is held fixed, the two distributions become indistinguishable; we wish to understand how fast $d$ must grow as a function of $n$ for indistinguishability to occur.
When $p = \frac{α}{n}$ for constant $α$, we prove that if $d \ge \mathrm{polylog} n$, the total variation distance between the two distributions is close to $0$; this improves upon the best previous bound of Brennan, Bresler, and Nagaraj (2020), which required $d \gg n^{3/2}$. Moreover, our result is nearly tight, resolving a conjecture of Bubeck, Ding, Eldan, and Rácz (2016) up to logarithmic factors. We also obtain improved upper bounds on the statistical indistinguishability thresholds in $d$ for the full range of $p$ satisfying $\frac{1}{n}\le p\le \frac{1}{2}$, improving upon the previous bounds by polynomial factors.
Our analysis uses the Belief Propagation algorithm to characterize the distributions of (subsets of) the random vectors {\em conditioned on producing a particular graph}. In this sense, our analysis is connected to the ``cavity method'' from statistical physics. To analyze this process, we rely on novel sharp estimates for the area of the intersection of a random sphere cap with an arbitrary subset of the sphere, which we prove using optimal transport maps and entropy-transport inequalities on the unit sphere.
Submitted 22 November, 2021;
originally announced November 2021.
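A minimal sketch of sampling from $\mathsf{Geo}_d(n,p)$ as described in this abstract is given below. It makes two assumptions for illustration: vectors are drawn uniformly from the unit sphere in $\mathbb{R}^d$ by normalizing Gaussians, and the threshold `tau` is estimated by Monte Carlo (as an empirical quantile) rather than computed exactly. All function names are hypothetical.

```python
import math
import random

def unit_vector(d, rng):
    """Uniform random point on the unit sphere in R^d (normalized Gaussian)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def estimate_threshold(d, p, rng, trials=5000):
    """Estimate tau with P(<u, v> >= tau) ~ p for independent uniform u, v.

    By rotational invariance, <u, v> has the same law as the first
    coordinate of a single uniform unit vector.
    """
    samples = sorted(unit_vector(d, rng)[0] for _ in range(trials))
    idx = min(trials - 1, int((1.0 - p) * trials))
    return samples[idx]

def sample_geo(n, d, p, seed=0):
    """Return (vectors, tau, edges) for an approximate sample of Geo_d(n, p)."""
    rng = random.Random(seed)
    tau = estimate_threshold(d, p, rng)
    vecs = [unit_vector(d, rng) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if sum(a * b for a, b in zip(vecs[i], vecs[j])) >= tau:
                edges.add((i, j))
    return vecs, tau, edges
```

The testing problem asks whether the edge set alone can be told apart from an Erdős-Rényi graph with the same edge probability, so a comparison baseline would draw each pair independently with probability `p`.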
-
The SDP value of random 2CSPs
Authors:
Amulya Musipatla,
Ryan O'Donnell,
Tselil Schramm,
Xinyu Wu
Abstract:
We consider a very wide class of models for sparse random Boolean 2CSPs; equivalently, degree-2 optimization problems over~$\{\pm 1\}^n$. For each model $\mathcal{M}$, we identify the "high-probability value"~$s^*_{\mathcal{M}}$ of the natural SDP relaxation (equivalently, the quantum value). That is, for all $\varepsilon > 0$ we show that the SDP optimum of a random $n$-variable instance is (when normalized by~$n$) in the range $(s^*_{\mathcal{M}}-\varepsilon, s^*_{\mathcal{M}}+\varepsilon)$ with high probability. Our class of models includes non-regular CSPs, and ones where the SDP relaxation value is strictly smaller than the spectral relaxation value.
Submitted 2 August, 2021;
originally announced August 2021.
-
Robust Regression Revisited: Acceleration and Improved Estimation Rates
Authors:
Arun Jambulapati,
Jerry Li,
Tselil Schramm,
Kevin Tian
Abstract:
We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et al., an iterative outlier-removal method calling a stationary point finder.
We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state-of-the-art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et al. can be accelerated, and show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature.
For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions.
Submitted 22 June, 2021;
originally announced June 2021.
-
Non-asymptotic approximations of neural networks by Gaussian processes
Authors:
Ronen Eldan,
Dan Mikulincer,
Tselil Schramm
Abstract:
We study the extent to which wide neural networks may be approximated by Gaussian processes when initialized with random weights. It is a well-established fact that as the width of a network goes to infinity, its law converges to that of a Gaussian process. We make this quantitative by establishing explicit convergence rates for the central limit theorem in an infinite-dimensional functional space, metrized with a natural transportation distance. We identify two regimes of interest; when the activation function is polynomial, its degree determines the rate of convergence, while for non-polynomial activations, the rate is governed by the smoothness of the function.
Submitted 17 February, 2021;
originally announced February 2021.
-
Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent
Authors:
Matthew Brennan,
Guy Bresler,
Samuel B. Hopkins,
Jerry Li,
Tselil Schramm
Abstract:
Researchers currently use a number of approaches to predict and substantiate information-computation gaps in high-dimensional statistical estimation problems. A prominent approach is to characterize the limits of restricted models of computation, which on the one hand yields strong computational lower bounds for powerful classes of algorithms and on the other hand helps guide the development of efficient algorithms. In this paper, we study two of the most popular restricted computational models, the statistical query framework and low-degree polynomials, in the context of high-dimensional hypothesis testing. Our main result is that under mild conditions on the testing problem, the two classes of algorithms are essentially equivalent in power. As corollaries, we obtain new statistical query lower bounds for sparse PCA, tensor PCA and several variants of the planted clique problem.
Submitted 26 June, 2021; v1 submitted 13 September, 2020;
originally announced September 2020.
-
Computational Barriers to Estimation from Low-Degree Polynomials
Authors:
Tselil Schramm,
Alexander S. Wein
Abstract:
One fundamental goal of high-dimensional statistics is to detect or recover planted structure (such as a low-rank matrix) hidden in noisy data. A growing body of work studies low-degree polynomials as a restricted model of computation for such problems: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms. Prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures. In this work, we extend these methods to address problems of estimation and recovery (instead of detection). For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-D polynomial. To our knowledge, these are the first results to establish low-degree hardness of recovery problems for which the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases.
Submitted 18 June, 2022; v1 submitted 5 August, 2020;
originally announced August 2020.
-
The threshold for SDP-refutation of random regular NAE-3SAT
Authors:
Yash Deshpande,
Andrea Montanari,
Ryan O'Donnell,
Tselil Schramm,
Subhabrata Sen
Abstract:
Unlike its cousin 3SAT, the NAE-3SAT (not-all-equal-3SAT) problem has the property that spectral/SDP algorithms can efficiently refute random instances when the constraint density is a large constant (with high probability). But do these methods work immediately above the "satisfiability threshold", or is there still a range of constraint densities for which random NAE-3SAT instances are unsatisfiable but hard to refute?
We show that the latter situation prevails, at least in the context of random regular instances and SDP-based refutation. More precisely, whereas a random $d$-regular instance of NAE-3SAT is easily shown to be unsatisfiable (whp) once $d \geq 8$, we establish the following sharp threshold result regarding efficient refutation: if $d < 13.5$, then the basic SDP, even augmented with triangle inequalities, fails to refute satisfiability (whp); if $d > 13.5$, then even the most basic spectral algorithm refutes satisfiability (whp).
Submitted 14 April, 2018;
originally announced April 2018.
-
Braess's paradox for the spectral gap in random graphs and delocalization of eigenvectors
Authors:
Ronen Eldan,
Miklós Rácz,
Tselil Schramm
Abstract:
We study how the spectral gap of the normalized Laplacian of a random graph changes when an edge is added to or removed from the graph. There are known examples of graphs where, perhaps counterintuitively, adding an edge can decrease the spectral gap, a phenomenon that is analogous to Braess's paradox in traffic networks. We show that this is often the case in random graphs in a strong sense. More precisely, we show that for typical instances of Erdős-Rényi random graphs $G(n,p)$ with constant edge density $p \in (0,1)$, the addition of a random edge will decrease the spectral gap with positive probability, strictly bounded away from zero. To do this, we prove a new delocalization result for eigenvectors of the Laplacian of $G(n,p)$, which might be of independent interest.
Submitted 20 June, 2015; v1 submitted 28 April, 2015;
originally announced April 2015.
-
Global and Local Information in Clustering Labeled Block Models
Authors:
Varun Kanade,
Elchanan Mossel,
Tselil Schramm
Abstract:
The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborova, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulie and Mossel, Neeman, and Sly.
In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
Submitted 3 July, 2014; v1 submitted 25 April, 2014;
originally announced April 2014.
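The labeled block model in this abstract can be sketched, for two clusters, as below. The parameterization is an assumption for illustration: intra-cluster probability $p = a/n$, inter-cluster probability $q = b/n$, and each node's true label revealed independently with probability `reveal_frac`. The function and parameter names are hypothetical.

```python
import random

def sample_labeled_sbm(n, a, b, reveal_frac, seed=0):
    """Sample (labels, edges, revealed) from a sparse two-cluster labeled block model.

    Assumed parameterization: p = a / n within clusters, q = b / n across.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]  # two (nearly) equal-sized clusters
    rng.shuffle(labels)
    p, q = a / n, b / n
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                edges.add((i, j))
    # Node information: true labels of a small random fraction of the nodes.
    revealed = {i: labels[i] for i in range(n) if rng.random() < reveal_frac}
    return labels, edges, revealed
```

A clustering algorithm in this setting receives `edges` and `revealed` and tries to recover the remaining entries of `labels`.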