Low-degree phase transitions for detecting a planted clique in sublinear time

Authors: Jay Mardia, Kabir Aladin Verchand, Alexander S. Wein

Abstract: We consider the problem of detecting a planted clique of size $k$ in a random graph on $n$ vertices. When the size of the clique exceeds $\Theta(\sqrt{n})$, polynomial-time algorithms for detection proliferate. We study faster -- namely, sublinear time -- algorithms in the high-signal regime when $k = \Theta(n^{1/2 + \delta})$, for some $\delta > 0$. To this end, we consider algorithms that non-adaptively query a subset $M$ of entries of the adjacency matrix and then compute a low-degree polynomial function of the revealed entries. We prove a computational phase transition for this class of non-adaptive low-degree algorithms: under the scaling $\lvert M \rvert = \Theta(n^{\gamma})$, the clique can be detected when $\gamma > 3(1/2 - \delta)$ but not when $\gamma < 3(1/2 - \delta)$. As a result, the best known runtime for detecting a planted clique, $\widetilde{O}(n^{3(1/2 - \delta)})$, cannot be improved without looking beyond the non-adaptive low-degree class. Our proof of the lower bound -- based on bounding the conditional low-degree likelihood ratio -- reveals further structure in non-adaptive detection of a planted clique. Using (a bound on) the conditional low-degree likelihood ratio as a potential function, we show that for every non-adaptive query pattern, there is a highly structured query pattern of the same size that is at least as effective.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 23 pages, 2 figures

arXiv:2107.11886

Logspace Reducibility From Secret Leakage Planted Clique

Authors: Jay Mardia

Abstract: The planted clique problem is well-studied in the context of observing, explaining, and predicting interesting computational phenomena associated with statistical problems. When equating computational efficiency with the existence of polynomial time algorithms, the computational hardness of (some variant of) the planted clique problem can be used to infer the computational hardness of a host of other statistical problems. Is this ability to transfer computational hardness from (some variant of) the planted clique problem to other statistical problems robust to changing our notion of computational efficiency to space efficiency? We answer this question affirmatively for three different statistical problems, namely Sparse PCA, submatrix detection, and testing almost k-wise independence. The key challenge is that space efficient randomized reductions need to repeatedly access the randomness they use. Known reductions to these problems are all randomized and need polynomially many random bits to implement. Since we cannot store polynomially many random bits in memory, it is unclear how to implement these existing reductions space efficiently. There are two ideas involved in circumventing this issue and implementing known reductions to these problems space efficiently. 1. When solving statistical problems, we can use parts of the input itself as randomness. 2. Secret leakage variants of the planted clique problem with appropriate secret leakage can be more useful than the standard planted clique problem when we want to use parts of the input as randomness.
(abstract shortened due to arXiv constraints)

Submitted 8 November, 2023; v1 submitted 25 July, 2021; originally announced July 2021.

arXiv:2008.12825

Is the space complexity of planted clique recovery the same as that of detection?

Authors: Jay Mardia

Abstract: We study the planted clique problem in which a clique of size $k$ is planted in an Erdős-Rényi graph $G(n, 1/2)$, and one is interested in either detecting or recovering this planted clique. This problem is interesting because it is widely believed to show a statistical-computational gap at clique size $k = \sqrt{n}$, and has emerged as the prototypical problem with such a gap from which average-case hardness of other statistical problems can be deduced. It also displays a tight computational connection between the detection and recovery variants, unlike other problems of a similar nature. This wide investigation into the computational complexity of the planted clique problem has, however, mostly focused on its time complexity. In this work, we ask: do the statistical-computational phenomena that make the planted clique an interesting problem also hold when we use 'space efficiency' as our notion of computational efficiency? It is relatively easy to show that a positive answer to this question depends on the existence of an $O(\log n)$ space algorithm that can recover planted cliques of size $k = \Omega(\sqrt{n})$. Our main result comes very close to designing such an algorithm. We show that for $k = \Omega(\sqrt{n})$, the recovery problem can be solved in $O((\log^* n - \log^*(k/\sqrt{n})) \log n)$ bits of space. 1. If $k = \omega(\sqrt{n} \log^{(l)} n)$ for any constant integer $l > 0$, the space usage is $O(\log n)$ bits. 2. If $k = \Theta(\sqrt{n})$, the space usage is $O(\log^* n \cdot \log n)$ bits.
Our result suggests that there does exist an $O(\log n)$ space algorithm to recover cliques of size $k = \Omega(\sqrt{n})$, since we come very close to achieving such parameters. This provides evidence that the statistical-computational phenomena that (conjecturally) hold for planted clique time complexity also (conjecturally) hold for space complexity.

Submitted 23 November, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

arXiv:2004.12002

Finding Planted Cliques in Sublinear Time

Authors: Jay Mardia, Hilal Asi, Kabir Aladin Chandrasekher

Abstract: We study the planted clique problem in which a clique of size $k$ is planted in an Erdős-Rényi graph $G(n, 1/2)$ and one is interested in recovering this planted clique. It is widely believed that it exhibits a statistical-computational gap when computational efficiency is equated with the existence of polynomial time algorithms. We study this problem under a more fine-grained computational lens and consider the following two questions. 1. Do there exist sublinear time algorithms for recovering the planted clique? 2. What is the smallest running time any algorithm can hope to have? We show that because of a well-known clique-completion property, very elementary sublinear time recovery algorithms do indeed exist for clique sizes $k = \omega(\sqrt{n})$. This points to a qualitatively stronger statistical-computational gap. The planted clique recovery problem can be solved without even looking at most of the input above the $\Theta(\sqrt{n})$ threshold and cannot be solved by any efficient algorithm below it. A running time lower bound for the recovery problem follows easily from the results of [RS19], and this implies our recovery algorithms are optimal whenever $k = \Omega(n^{2/3})$. However, for $k = o(n^{2/3})$ there is a gap between our algorithmic upper bound and the information-theoretic lower bound implied by [RS19]. With some caveats, we show stronger detection lower bounds based on the Planted Clique Conjecture for a natural but restricted class of algorithms.
The key idea is to relate very fast sublinear time algorithms for detecting large planted cliques to polynomial time algorithms for detecting small planted cliques.

Submitted 17 October, 2022; v1 submitted 24 April, 2020; originally announced April 2020.

arXiv:1809.06522

Concentration Inequalities for the Empirical Distribution

Authors: Jay Mardia, Jiantao Jiao, Ervin Tánczos, Robert D. Nowak, Tsachy Weissman

Abstract: We study concentration inequalities for the Kullback--Leibler (KL) divergence between the empirical distribution and the true distribution. Applying a recursion technique, we improve over the method of types bound uniformly in all regimes of sample size $n$ and alphabet size $k$, and the improvement becomes more significant when $k$ is large. We discuss the applications of our results in obtaining tighter concentration inequalities for $L_1$ deviations of the empirical distribution from the true distribution, and the difference between concentration around the expectation or zero. We also obtain asymptotically tight bounds on the variance of the KL divergence between the empirical and true distribution, and demonstrate their quantitatively different behaviors between small and large sample sizes compared to the alphabet size.

Submitted 18 October, 2019; v1 submitted 18 September, 2018; originally announced September 2018.

Comments: Accepted for publication in Information and Inference

arXiv:1707.02241

Repairing Multiple Failures for Scalar MDS Codes

Authors: Jay Mardia, Burak Bartan, Mary Wootters

Abstract: In distributed storage, erasure codes -- like Reed-Solomon Codes -- are often employed to provide reliability. In this setting, it is desirable to be able to repair one or more failed nodes while minimizing the repair bandwidth. In this work, motivated by Reed-Solomon codes, we study the problem of repairing multiple failed nodes in a scalar MDS code. We extend the framework of (Guruswami and Wootters, 2017) to give a framework for constructing repair schemes for multiple failures in general scalar MDS codes, in the centralized repair model. We then specialize our framework to Reed-Solomon codes, and extend and improve upon recent results of (Dau et al., 2017).

Submitted 19 April, 2018; v1 submitted 7 July, 2017; originally announced July 2017.

Showing 1–6 of 6 results for author: Mardia, J