-
DNF Learning via Locally Mixing Random Walks
Authors:
Josh Alman,
Shivam Nadimpalli,
Shyamal Patel,
Rocco A. Servedio
Abstract:
We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$.
(1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF…
▽ More
We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$.
(1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF formula $f = T_1 \vee \cdots \vee T_s$ over $\{0,1\}^n$ and any unknown distribution $D$ over $\{0,1\}^n$, our algorithm, which uses membership queries and random examples from $D$, runs in $\textsf{quasipoly}(n,s)$ time and outputs a list $L$ of candidate terms such that with high probability some term $T_i$ of $f$ belongs to $L$.
(2) We then use result (1) to give a $\textsf{quasipoly}(n,s)$-time algorithm, in the distribution-free PAC learning model with membership queries, for learning the class of size-$s$ DNFs in which all terms have the same size. Our algorithm learns using a DNF hypothesis.
The key tool used to establish result (1) is a new result on "locally mixing random walks," which, roughly speaking, shows that a random walk on a graph that is covered by a small number of expanders has a non-negligible probability of mixing quickly in a subset of these expanders.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions
Authors:
Xi Chen,
Shyamal Patel,
Rocco A. Servedio
Abstract:
The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems.
In more detail,
- We give a dist…
▽ More
The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems.
In more detail,
- We give a distribution-free algorithm for agnostically PAC learning conjunctions over $\{\pm 1\}^n$ that runs in time $2^{\widetilde{O}(n^{1/3})}$, for constant excess error $\varepsilon$. This improves on the fastest previously published algorithm, which runs in time $2^{\widetilde{O}(n^{1/2})}$ [KKMS08].
- Building on the ideas in our agnostic conjunction learner and using significant additional technical ingredients, we give an adaptive tolerant testing algorithm for $k$-juntas that makes $2^{\widetilde{O}(k^{1/3})}$ queries, for constant "gap parameter" $\varepsilon$ between the "near" and "far" cases. This improves on the best previous results, due to [ITW21, NP24], which make $2^{\widetilde{O}(\sqrt{k})}$ queries. Since there is a known $2^{\widetildeΩ(\sqrt{k})}$ lower bound for non-adaptive tolerant junta testers, our result shows that adaptive tolerant junta testing algorithms provably outperform non-adaptive ones.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.
-
Testing Juntas and Junta Subclasses with Relative Error
Authors:
Xi Chen,
William Pires,
Toniann Pitassi,
Rocco A. Servedio
Abstract:
This papers considers the junta testing problem in a recently introduced ``relative error'' variant of the standard Boolean function property testing model. In relative-error testing we measure the distance from $f$ to $g$, where $f,g: \{0,1\}^n \to \{0,1\}$, by the ratio of $|f^{-1}(1) \triangle g^{-1}(1)|$ (the number of inputs on which $f$ and $g$ disagree) to $|f^{-1}(1)|$ (the number of satis…
▽ More
This papers considers the junta testing problem in a recently introduced ``relative error'' variant of the standard Boolean function property testing model. In relative-error testing we measure the distance from $f$ to $g$, where $f,g: \{0,1\}^n \to \{0,1\}$, by the ratio of $|f^{-1}(1) \triangle g^{-1}(1)|$ (the number of inputs on which $f$ and $g$ disagree) to $|f^{-1}(1)|$ (the number of satisfying assignments of $f$), and we give the testing algorithm both black-box access to $f$ and also access to independent uniform samples from $f^{-1}(1)$.
Chen et al. (SODA 2025) observed that the class of $k$-juntas is $\text{poly}(2^k,1/ε)$-query testable in the relative-error model, and asked whether $\text{poly}(k,1/ε)$ queries is achievable. We answer this question affirmatively by giving a $\tilde{O}(k/ε)$-query algorithm, matching the optimal complexity achieved in the less challenging standard model. Moreover, as our main result, we show that any subclass of $k$-juntas that is closed under permuting variables is relative-error testable with a similar complexity. This gives highly efficient relative-error testing algorithms for a number of well-studied function classes, including size-$k$ decision trees, size-$k$ branching programs, and size-$k$ Boolean formulas.
△ Less
Submitted 12 April, 2025;
originally announced April 2025.
-
Relative-error testing of conjunctions and decision lists
Authors:
Xi Chen,
William Pires,
Toniann Pitassi,
Rocco A. Servedio
Abstract:
We study the relative-error property testing model for Boolean functions that was recently introduced in the work of Chen et al. (SODA 2025). In relative-error testing, the testing algorithm gets uniform random satisfying assignments as well as black-box queries to $f$, and it must accept $f$ with high probability whenever $f$ has the property that is being tested and reject any $f$ that is relati…
▽ More
We study the relative-error property testing model for Boolean functions that was recently introduced in the work of Chen et al. (SODA 2025). In relative-error testing, the testing algorithm gets uniform random satisfying assignments as well as black-box queries to $f$, and it must accept $f$ with high probability whenever $f$ has the property that is being tested and reject any $f$ that is relative-error far from having the property. Here the relative-error distance from $f$ to a function $g$ is measured with respect to $|f^{-1}(1)|$ rather than with respect to the entire domain size $2^n$ as in the Hamming distance measure that is used in the standard model; thus, unlike the standard model, relative-error testing allows us to study the testability of sparse Boolean functions that have few satisfying assignments. It was shown in Chen et al. (SODA 2025) that relative-error testing is at least as difficult as standard-model property testing, but for many natural and important Boolean function classes the precise relationship between the two notions is unknown.
In this paper we consider the well-studied and fundamental properties of being a conjunction and being a decision list. In the relative-error setting, we give an efficient one-sided error tester for conjunctions with running time and query complexity $O(1/ε)$.
Secondly, we give a two-sided relative-error $\tilde{O}$$(1/ε)$ tester for decision lists, matching the query complexity of the state-of-the-art algorithm in the standard model Bshouty (RANDOM 2020) and Diakonikolas et al. (FOCS 2007).
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
Lower Bounds for Leader Election and Collective Coin Flipping, Revisited
Authors:
Eshan Chattopadhyay,
Mohit Gurumukhani,
Noam Ringach,
Rocco Servedio
Abstract:
We study the tasks of collective coin flipping and leader election in the full-information model.
We prove new lower bounds for coin flipping protocols, implying lower bounds for leader election protocols. We show that any $k$-round coin flipping protocol, where each of $\ell$ players sends 1 bit per round, can be biased by $O(\ell/\log^{(k)}(\ell))$ bad players. For all $k>1$ this strengthens p…
▽ More
We study the tasks of collective coin flipping and leader election in the full-information model.
We prove new lower bounds for coin flipping protocols, implying lower bounds for leader election protocols. We show that any $k$-round coin flipping protocol, where each of $\ell$ players sends 1 bit per round, can be biased by $O(\ell/\log^{(k)}(\ell))$ bad players. For all $k>1$ this strengthens previous lower bounds [RSZ, SICOMP 2002], which ruled out protocols resilient to adversaries controlling $O(\ell/\log^{(2k-1)}(\ell))$ players. Consequently, we establish that any protocol tolerating a linear fraction of corrupt players, with only 1 bit per round, must run for at least $\log^*\ell-O(1)$ rounds, improving on the prior best lower bound of $\frac12 \log^*\ell-\log^*\log^*\ell$. This lower bound matches the number of rounds, $\log^*\ell$, taken by the current best coin flipping protocols from [RZ, JCSS 2001], [F, FOCS 1999] that can handle a linear sized coalition of bad players, but with players sending unlimited bits per round. We also derive lower bounds for protocols allowing multi-bit messages per round. Our results show that the protocols from [RZ, JCSS 2001], [F, FOCS 1999] that handle a linear number of corrupt players are almost optimal in terms of round complexity and communication per player in a round.
A key technical ingredient in proving our lower bounds is a new result regarding biasing most functions from a family of functions using a common set of bad players and a small specialized set of bad players specific to each function that is biased.
We give improved constant-round coin flipping protocols in the setting that each player can send 1 bit per round. For two rounds, our protocol can handle $O(\ell/(\log\ell)(\log\log\ell)^2)$ sized coalition of bad players; better than the best one-round protocol by [AL, Combinatorica 1993] in this setting.
△ Less
Submitted 2 April, 2025;
originally announced April 2025.
-
Sparsifying Suprema of Gaussian Processes
Authors:
Anindya De,
Shivam Nadimpalli,
Ryan O'Donnell,
Rocco A. Servedio
Abstract:
We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t\}_{t\in T}$ be the canonical Gaussian process on $T$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$ such that…
▽ More
We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t\}_{t\in T}$ be the canonical Gaussian process on $T$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$ such that $\sup_{s \in S} \{\boldsymbol{X}_s + c_s\}$ is an $\varepsilon$-approximator of $\sup_{t \in T} {\boldsymbol{X}}_t$. Notably, the size of $S$ is completely independent of both the size of $T$ and of the ambient dimension $n$.
We use this to show that every norm is essentially a junta when viewed as a function over Gaussian space: Given any norm $ν(x)$ on $\mathbb{R}^n$, there is another norm $ψ(x)$ which depends only on the projection of $x$ along $O_\varepsilon(1)$ directions, for which $ψ({\boldsymbol{g}})$ is a multiplicative $(1 \pm \varepsilon)$-approximation of $ν({\boldsymbol{g}})$ with probability $1-\varepsilon$ for ${\boldsymbol{g}} \sim N(0,I_n)$.
We also use our sparsification result for suprema of centered Gaussian processes to give a sparsification lemma for convex sets of bounded geometric width: Any intersection of (possibly infinitely many) halfspaces in $\mathbb{R}^n$ that are at distance $O(1)$ from the origin is $\varepsilon$-close, under $N(0,I_n)$, to an intersection of only $O_\varepsilon(1)$ many halfspaces.
We describe applications to agnostic learning and tolerant property testing.
△ Less
Submitted 21 November, 2024;
originally announced November 2024.
-
Condensing and Extracting Against Online Adversaries
Authors:
Eshan Chattopadhyay,
Mohit Gurumukhani,
Noam Ringach,
Rocco Servedio
Abstract:
We investigate the tasks of deterministically condensing and extracting randomness from Online Non-Oblivious Symbol Fixing (oNOSF) sources, a natural model of defective random sources for which extraction is impossible in many parameter regimes [AORSV, EUROCRYPT'20]. A $(g,\ell)$-oNOSF source is a sequence of $\ell$ blocks where $g$ of the blocks are good (are independent and have some min-entropy…
▽ More
We investigate the tasks of deterministically condensing and extracting randomness from Online Non-Oblivious Symbol Fixing (oNOSF) sources, a natural model of defective random sources for which extraction is impossible in many parameter regimes [AORSV, EUROCRYPT'20]. A $(g,\ell)$-oNOSF source is a sequence of $\ell$ blocks where $g$ of the blocks are good (are independent and have some min-entropy), and the remaining bad blocks are controlled by an online adversary - can be arbitrarily correlated with any block that appears before it.
The existence of condensers for oNOSF sources was recently studied in [CGR, FOCS'24]. They proved various condensing impossibility results, and showed the existence of condensers when $n\gg\ell$.
We make significant progress on proving the existence of condensers in almost all parameter regimes, even when $n$ is a large constant and $\ell$ is growing. We next construct the first explicit condensers for oNOSF sources, matching the existential results of [CGR, FOCS'24]. We also obtain a much improved construction for transforming low-entropy oNOSF sources into uniform oNOSF sources.
We find interesting applications of our results to collective coin flipping and collective sampling, problems that are well-studied in fault-tolerant distributed computing. We use our condensers to provide very simple protocols for these problems.
Next, we turn to understanding the possibility of extraction from oNOSF sources. We initiate the study of a new, natural notion of the influence of functions, which we call online influence. We establish tight bounds on the online influence of functions, which imply extraction lower bounds. Lastly, we give explicit extractor constructions for oNOSF sources, using novel connections to leader election protocols. These extractors achieve parameters that go beyond standard resilient functions [AL, Combinatorica'93].
△ Less
Submitted 2 April, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Lower Bounds for Convexity Testing
Authors:
Xi Chen,
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio,
Erik Waingarten
Abstract:
We consider the problem of testing whether an unknown and arbitrary set $S \subseteq \mathbb{R}^n$ (given as a black-box membership oracle) is convex, versus $\varepsilon$-far from every convex set, under the standard Gaussian distribution. The current state-of-the-art testing algorithms for this problem make $2^{\tilde{O}(\sqrt{n})\cdot \mathrm{poly}(1/\varepsilon)}$ non-adaptive queries, both fo…
▽ More
We consider the problem of testing whether an unknown and arbitrary set $S \subseteq \mathbb{R}^n$ (given as a black-box membership oracle) is convex, versus $\varepsilon$-far from every convex set, under the standard Gaussian distribution. The current state-of-the-art testing algorithms for this problem make $2^{\tilde{O}(\sqrt{n})\cdot \mathrm{poly}(1/\varepsilon)}$ non-adaptive queries, both for the standard testing problem and for tolerant testing.
We give the first lower bounds for convexity testing in the black-box query model:
- We show that any one-sided tester (which may be adaptive) must use at least $n^{Ω(1)}$ queries in order to test to some constant accuracy $\varepsilon>0$.
- We show that any non-adaptive tolerant tester (which may make two-sided errors) must use at least $2^{Ω(n^{1/4})}$ queries to distinguish sets that are $\varepsilon_1$-close to convex versus $\varepsilon_2$-far from convex, for some absolute constants $0<\varepsilon_1<\varepsilon_2$.
Finally, we also show that for any constant $c>0$, any non-adaptive tester (which may make two-sided errors) must use at least $n^{1/4 - c}$ queries in order to test to some constant accuracy $\varepsilon>0$.
△ Less
Submitted 23 October, 2024;
originally announced October 2024.
-
Relative-error monotonicity testing
Authors:
Xi Chen,
Anindya De,
Yizhi Huang,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio,
Tianqi Yang
Abstract:
The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which p…
▽ More
The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which provides a natural framework for testing sparse functions.
This new model defines the distance between two functions $f, g: \{0,1\}^n \to \{0,1\}$ to be $$\textsf{reldist}(f,g) := { \frac{|f^{-1}(1) \triangle g^{-1}(1)|} {|f^{-1}(1)|}}.$$ This is a more demanding distance measure than the usual Hamming distance ${ {|f^{-1}(1) \triangle g^{-1}(1)|}/{2^n}}$ when $|f^{-1}(1)| \ll 2^n$; to compensate for this, algorithms in the new model have access both to a black-box oracle for the function $f$ being tested and to a source of independent uniform satisfying assignments of $f$.
In this paper we first give a few general results about the relative-error testing model; then, as our main technical contribution, we give a detailed study of algorithms and lower bounds for relative-error testing of $\textit{monotone}$ Boolean functions. We give upper and lower bounds which are parameterized by $N=|f^{-1}(1)|$, the sparsity of the function $f$ being tested. Our results show that there are interesting differences between relative-error monotonicity testing of sparse Boolean functions, and monotonicity testing in the standard model. These results motivate further study of the testability of Boolean function properties in the relative-error model.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
Trace reconstruction from local statistical queries
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio
Abstract:
The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual…
▽ More
The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual traces themselves. Such an algorithm is said to be $\ell$-local if each of its statistical queries corresponds to an $\ell$-junta function over some block of $\ell$ consecutive bits in the trace. Since several -- but not all -- known algorithms for trace reconstruction fall under the local statistical query paradigm, it is interesting to understand the abilities and limitations of local SQ algorithms for trace reconstruction.
In this paper we establish nearly-matching upper and lower bounds on local Statistical Query algorithms for both worst-case and average-case trace reconstruction. For the worst-case problem, we show that there is an $\tilde{O}(n^{1/5})$-local SQ algorithm that makes all its queries with tolerance $τ\geq 2^{-\tilde{O}(n^{1/5})}$, and also that any $\tilde{O}(n^{1/5})$-local SQ algorithm must make some query with tolerance $τ\leq 2^{-\tildeΩ(n^{1/5})}$. For the average-case problem, we show that there is an $O(\log n)$-local SQ algorithm that makes all its queries with tolerance $τ\geq 1/\mathrm{poly}(n)$, and also that any $O(\log n)$-local SQ algorithm must make some query with tolerance $τ\leq 1/\mathrm{poly}(n).$
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
Detecting Low-Degree Truncation
Authors:
Anindya De,
Huan Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncatio…
▽ More
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$.
We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:
$\bullet$ For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);
$\bullet$ Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $Ω(n^{d/2})$ samples.
△ Less
Submitted 21 November, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Testing Sumsets is Hard
Authors:
Xi Chen,
Shivam Nadimpalli,
Tim Randolph,
Rocco A. Servedio,
Or Zamir
Abstract:
A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator fu…
▽ More
A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator function of a sumset. Our lower bound for testing sumsets follows from sharp bounds on the related problem of shift testing, which may be of independent interest. We also give a near-optimal $2^{n/2} \cdot \mathrm{poly}(n)$-query algorithm for a smoothed analysis formulation of the sumset refutation problem.
△ Less
Submitted 4 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Testing Intersecting and Union-Closed Families
Authors:
Xi Chen,
Anindya De,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Inspired by the classic problem of Boolean function monotonicity testing, we investigate the testability of other well-studied properties of combinatorial finite set systems, specifically \emph{intersecting} families and \emph{union-closed} families. A function $f: \{0,1\}^n \to \{0,1\}$ is intersecting (respectively, union-closed) if its set of satisfying assignments corresponds to an intersectin…
▽ More
Inspired by the classic problem of Boolean function monotonicity testing, we investigate the testability of other well-studied properties of combinatorial finite set systems, specifically \emph{intersecting} families and \emph{union-closed} families. A function $f: \{0,1\}^n \to \{0,1\}$ is intersecting (respectively, union-closed) if its set of satisfying assignments corresponds to an intersecting family (respectively, a union-closed family) of subsets of $[n]$.
Our main results are that -- in sharp contrast with the property of being a monotone set system -- the property of being an intersecting set system, and the property of being a union-closed set system, both turn out to be information-theoretically difficult to test. We show that:
$\bullet$ For $ε\geq Ω(1/\sqrt{n})$, any non-adaptive two-sided $ε$-tester for intersectingness must make $2^{Ω(n^{1/4}/\sqrtε)}$ queries. We also give a $2^{Ω(\sqrt{n \log(1/ε)})}$-query lower bound for non-adaptive one-sided $ε$-testers for intersectingness.
$\bullet$ For $ε\geq 1/2^{Ω(n^{0.49})}$, any non-adaptive two-sided $ε$-tester for union-closedness must make $n^{Ω(\log(1/ε))}$ queries.
Thus, neither intersectingness nor union-closedness shares the $\mathrm{poly}(n,1/ε)$-query non-adaptive testability that is enjoyed by monotonicity.
To complement our lower bounds, we also give a simple $\mathrm{poly}(n^{\sqrt{n\log(1/ε)}},1/ε)$-query, one-sided, non-adaptive algorithm for $ε$-testing each of these properties (intersectingness and union-closedness). We thus achieve nearly tight upper and lower bounds for two-sided testing of intersectingness when $ε= Θ(1/\sqrt{n})$, and for one-sided testing of intersectingness when $ε=Θ(1).$
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
Gaussian Approximation of Convex Sets by Intersections of Halfspaces
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We study the approximability of general convex sets in $\mathbb{R}^n$ by intersections of halfspaces, where the approximation quality is measured with respect to the standard Gaussian distribution $N(0,I_n)$ and the complexity of an approximation is the number of halfspaces used. While a large body of research has considered the approximation of convex sets by intersections of halfspaces under dis…
▽ More
We study the approximability of general convex sets in $\mathbb{R}^n$ by intersections of halfspaces, where the approximation quality is measured with respect to the standard Gaussian distribution $N(0,I_n)$ and the complexity of an approximation is the number of halfspaces used. While a large body of research has considered the approximation of convex sets by intersections of halfspaces under distance metrics such as the Lebesgue measure and Hausdorff distance, prior to our work there has not been a systematic study of convex approximation under the Gaussian distribution.
We establish a range of upper and lower bounds, both for general convex sets and for specific natural convex sets that are of particular interest. Our results demonstrate that the landscape of approximation is intriguingly different under the Gaussian distribution versus previously studied distance measures. For example, we show that $2^{Θ(\sqrt{n})}$ halfspaces are both necessary and sufficient to approximate the origin-centered $\ell_2$ ball of Gaussian volume 1/2 to any constant accuracy, and that for $1 \leq p < 2$, the origin-centered $\ell_p$ ball of Gaussian volume 1/2 can be approximated to any constant accuracy as an intersection of $2^{\widetilde{O}(n^{3/4})}$ many halfspaces. These bounds are quite different from known approximation results under more commonly studied distance measures.
Our results are proved using techniques from many different areas. These include classical results on convex polyhedral approximation, Cramér-type bounds on large deviations from probability theory, and -- perhaps surprisingly -- a range of topics from computational complexity, including computational learning theory, unconditional pseudorandomness, and the study of influences and noise sensitivity in the analysis of Boolean functions.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Explicit orthogonal and unitary designs
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Pedro Paredes
Abstract:
We give a strongly explicit construction of $\varepsilon$-approximate $k$-designs for the orthogonal group $\mathrm{O}(N)$ and the unitary group $\mathrm{U}(N)$, for $N=2^n$. Our designs are of cardinality $\mathrm{poly}(N^k/\varepsilon)$ (equivalently, they have seed length $O(nk + \log(1/\varepsilon)))$; up to the polynomial, this matches the number of design elements used by the construction co…
▽ More
We give a strongly explicit construction of $\varepsilon$-approximate $k$-designs for the orthogonal group $\mathrm{O}(N)$ and the unitary group $\mathrm{U}(N)$, for $N=2^n$. Our designs are of cardinality $\mathrm{poly}(N^k/\varepsilon)$ (equivalently, they have seed length $O(nk + \log(1/\varepsilon)))$; up to the polynomial, this matches the number of design elements used by the construction consisting of completely random matrices.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas
Authors:
Xi Chen,
Anindya De,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We give the first super-polynomial (in fact, mildly exponential) lower bounds for tolerant testing (equivalently, distance estimation) of monotonicity, unateness, and juntas with a constant separation between the "yes" and "no" cases. Specifically, we give
$\bullet$ A $2^{Ω(n^{1/4}/\sqrt{\varepsilon})}$-query lower bound for non-adaptive, two-sided tolerant monotonicity testers and unateness tes…
▽ More
We give the first super-polynomial (in fact, mildly exponential) lower bounds for tolerant testing (equivalently, distance estimation) of monotonicity, unateness, and juntas with a constant separation between the "yes" and "no" cases. Specifically, we give
$\bullet$ A $2^{Ω(n^{1/4}/\sqrt{\varepsilon})}$-query lower bound for non-adaptive, two-sided tolerant monotonicity testers and unateness testers when the "gap" parameter $\varepsilon_2-\varepsilon_1$ is equal to $\varepsilon$, for any $\varepsilon \geq 1/\sqrt{n}$;
$\bullet$ A $2^{Ω(k^{1/2})}$-query lower bound for non-adaptive, two-sided tolerant junta testers when the gap parameter is an absolute constant.
In the constant-gap regime no non-trivial prior lower bound was known for monotonicity, the best prior lower bound known for unateness was $\tildeΩ(n^{3/2})$ queries, and the best prior lower bound known for juntas was $\mathrm{poly}(k)$ queries.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
Testing Convex Truncation
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results,
(1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from…
▽ More
We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results,
(1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$.
(2) We give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets.
These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible.
We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $Ω(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
△ Less
Submitted 21 November, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Subset Sum in Time $2^{n/2} / poly(n)$
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is…
▽ More
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is the first improvement on the classical ``meet-in-the-middle'' algorithm for worst-case Subset Sum, due to Horowitz and Sahni, which can be implemented in time $O(2^{n/2})$ in these memory models.
Our algorithm combines a number of different techniques, including the ``representation method'' introduced by Howgrave-Graham and Joux and subsequent adaptations of the method in Austrin, Kaski, Koivisto, and Nederlof, and Nederlof and Wegrzycki, and ``bit-packing'' techniques used in the work of Baran, Demaine, and Patrascu on subquadratic algorithms for 3SUM.
△ Less
Submitted 29 January, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Approximate Trace Reconstruction from a Single Trace
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single tra…
▽ More
The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single trace is provided to the reconstruction algorithm. In this setting exact reconstruction is of course impossible, and the question is to what accuracy the source string $x$ can be approximately reconstructed.
We give a detailed study of this question, providing algorithms and lower bounds for the high, intermediate, and low deletion rate regimes in both the worst-case ($x$ is arbitrary) and average-case ($x$ is drawn uniformly from $\{0,1\}^n$) models. In several cases the lower bounds we establish are matched by computationally efficient algorithms that we provide.
We highlight our results for the high deletion rate regime: roughly speaking, they show that
- Having access to a single trace is already quite useful for worst-case trace reconstruction: an efficient algorithm can perform much more accurate reconstruction, given one trace that is even only a few bits long, than it could given no traces at all. But in contrast,
- in the average-case setting, having access to a single trace is provably not very useful: no algorithm, computationally efficient or otherwise, can achieve significantly higher accuracy given one trace that is $o(n)$ bits long than it could with no traces.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals
Authors:
Daniel Hsu,
Clayton Sanford,
Rocco Servedio,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
We consider the well-studied problem of learning intersections of halfspaces under the Gaussian distribution in the challenging \emph{agnostic learning} model. Recent work of Diakonikolas et al. (2021) shows that any Statistical Query (SQ) algorithm for agnostically learning the class of intersections of $k$ halfspaces over $\mathbb{R}^n$ to constant excess error either must make queries of tolera…
▽ More
We consider the well-studied problem of learning intersections of halfspaces under the Gaussian distribution in the challenging \emph{agnostic learning} model. Recent work of Diakonikolas et al. (2021) shows that any Statistical Query (SQ) algorithm for agnostically learning the class of intersections of $k$ halfspaces over $\mathbb{R}^n$ to constant excess error either must make queries of tolerance at most $n^{-\tildeΩ(\sqrt{\log k})}$ or must make $2^{n^{Ω(1)}}$ queries. We strengthen this result by improving the tolerance requirement to $n^{-\tildeΩ(\log k)}$. This lower bound is essentially best possible since an SQ algorithm of Klivans et al. (2008) agnostically learns this class to any constant excess error using $n^{O(\log k)}$ queries of tolerance $n^{-O(\log k)}$. We prove two variants of our lower bound, each of which combines ingredients from Diakonikolas et al. (2021) with (an extension of) a different earlier approach for agnostic SQ lower bounds for the Boolean setting due to Dachman-Soled et al. (2014). Our approach also yields lower bounds for agnostically SQ learning the class of "convex subspace juntas" (studied by Vempala, 2010) and the class of sets with bounded Gaussian surface area; all of these lower bounds are nearly optimal since they essentially match known upper bounds from Klivans et al. (2008).
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss
Authors:
Philip M. Long,
Rocco A. Servedio
Abstract:
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a…
▽ More
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a binary classifier with accuracy no better than random guessing.
△ Less
Submitted 4 March, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Average-Case Subset Balancing Problems
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$.
Our algorithm generaliz…
▽ More
Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$.
Our algorithm generalizes to a number of related problems, such as the ``Generalized Equal Subset Sum'' problem, which asks us to assign a coefficient $c_i$ from a set $C$ to each input number $x_i$ such that $\sum_{i} c_i x_i = 0$. Our algorithm for the average-case version of this problem runs in~time $|C|^{(0.5-c_0/|C|)n}$ for some positive constant $c_0$, whenever $C=\{0, \pm 1, \dots, \pm d\}$ or $\{\pm 1, \dots, \pm d\}$ for some positive integer $d$ (with $O^*(|C|^{0.45n})$ when $|C|<10$). Our results extend to the~problem of finding ``nearly balanced'' solutions in which the target is a not-too-large nonzero offset $τ$.
Our approach relies on new structural results that characterize the probability that $\sum_{i} c_i x_i$ $=τ$ has a solution $c \in C^n$ when $x_i$'s are chosen randomly; these results may be of independent interest. Our algorithm is inspired by the ``representation technique'' introduced by Howgrave-Graham and Joux. This requires several new ideas to overcome preprocessing hurdles that arise in the representation framework, as well as a novel application of dynamic programming in the solution recovery phase of the algorithm.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Convex Influences
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We introduce a new notion of influence for symmetric convex sets over Gaussian space, which we term "convex influence". We show that this new notion of influence shares many of the familiar properties of influences of variables for monotone Boolean functions $f: \{\pm1\}^n \to \{\pm1\}.$
Our main results for convex influences give Gaussian space analogues of many important results on influences…
▽ More
We introduce a new notion of influence for symmetric convex sets over Gaussian space, which we term "convex influence". We show that this new notion of influence shares many of the familiar properties of influences of variables for monotone Boolean functions $f: \{\pm1\}^n \to \{\pm1\}.$
Our main results for convex influences give Gaussian space analogues of many important results on influences for monotone Boolean functions. These include (robust) characterizations of extremal functions, the Poincaré inequality, the Kahn-Kalai-Linial theorem, a sharp threshold theorem of Kalai, a stability version of the Kruskal-Katona theorem due to O'Donnell and Wimmer, and some partial results towards a Gaussian space analogue of Friedgut's junta theorem. The proofs of our results for convex influences use very different techniques than the analogous proofs for Boolean influences over $\{\pm1\}^n$. Taken as a whole, our results extend the emerging analogy between symmetric convex sets in Gaussian space and monotone Boolean functions from $\{\pm1\}^n$ to $\{\pm1\}$
△ Less
Submitted 7 September, 2021;
originally announced September 2021.
-
Approximating Sumset Size
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Given a subset $A$ of the $n$-dimensional Boolean hypercube $\mathbb{F}_2^n$, the sumset $A+A$ is the set $\{a+a': a, a' \in A\}$ where addition is in $\mathbb{F}_2^n$. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field.
The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more det…
▽ More
Given a subset $A$ of the $n$-dimensional Boolean hypercube $\mathbb{F}_2^n$, the sumset $A+A$ is the set $\{a+a': a, a' \in A\}$ where addition is in $\mathbb{F}_2^n$. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field.
The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more detail, our algorithm is given oracle access to (the indicator function of) an arbitrary $A \subseteq \mathbb{F}_2^n$ and an accuracy parameter $ε> 0$, and with high probability it outputs a value $0 \leq v \leq 1$ that is $\pm ε$-close to $\mathrm{Vol}(A' + A')$ for some perturbation $A' \subseteq A$ of $A$ satisfying $\mathrm{Vol}(A \setminus A') \leq ε.$ It is easy to see that without the relaxation of dealing with $A'$ rather than $A$, any algorithm for estimating $\mathrm{Vol}(A+A)$ to any nontrivial accuracy must make $2^{Ω(n)}$ queries. In contrast, we give an algorithm whose query complexity depends only on $ε$ and is completely independent of the ambient dimension $n$.
△ Less
Submitted 26 July, 2021;
originally announced July 2021.
-
Near-Optimal Average-Case Approximate Trace Reconstruction from Few Traces
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the standard trace reconstruction problem, the goal is to \emph{exactly} reconstruct an unknown source string $\mathsf{x} \in \{0,1\}^n$ from independent "traces", which are copies of $\mathsf{x}$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $\mathsf{x}$ with probability $δ$ and concatenates the surviving bits. We study the \emph{approximate} trace…
▽ More
In the standard trace reconstruction problem, the goal is to \emph{exactly} reconstruct an unknown source string $\mathsf{x} \in \{0,1\}^n$ from independent "traces", which are copies of $\mathsf{x}$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $\mathsf{x}$ with probability $δ$ and concatenates the surviving bits. We study the \emph{approximate} trace reconstruction problem, in which the goal is only to obtain a high-accuracy approximation of $\mathsf{x}$ rather than an exact reconstruction.
We give an efficient algorithm, and a near-matching lower bound, for approximate reconstruction of a random source string $\mathsf{x} \in \{0,1\}^n$ from few traces. Our main algorithmic result is a polynomial-time algorithm with the following property: for any deletion rate $0 < δ< 1$ (which may depend on $n$), for almost every source string $\mathsf{x} \in \{0,1\}^n$, given any number $M \leq Θ(1/δ)$ of traces from $\mathrm{Del}_δ(\mathsf{x})$, the algorithm constructs a hypothesis string $\widehat{\mathsf{x}}$ that has edit distance at most $n \cdot (δM)^{Ω(M)}$ from $\mathsf{x}$. We also prove a near-matching information-theoretic lower bound showing that given $M \leq Θ(1/δ)$ traces from $\mathrm{Del}_δ(\mathsf{x})$ for a random $n$-bit string $\mathsf{x}$, the smallest possible expected edit distance that any algorithm can achieve, regardless of its running time, is $n \cdot (δM)^{O(M)}$.
△ Less
Submitted 25 August, 2021; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Fourier growth of structured $\mathbb{F}_2$-polynomials and applications
Authors:
Jarosław Błasiok,
Peter Ivanov,
Yaonan Jin,
Chin Ho Lee,
Rocco A. Servedio,
Emanuele Viola
Abstract:
We analyze the Fourier growth, i.e. the $L_1$ Fourier weight at level $k$ (denoted $L_{1,k}$), of various well-studied classes of "structured" $\mathbb{F}_2$-polynomials. This study is motivated by applications in pseudorandomness, in particular recent results and conjectures due to [CHHL19,CHLT19,CGLSS20] which show that upper bounds on Fourier growth (even at level $k=2$) give unconditional pseu…
▽ More
We analyze the Fourier growth, i.e. the $L_1$ Fourier weight at level $k$ (denoted $L_{1,k}$), of various well-studied classes of "structured" $\mathbb{F}_2$-polynomials. This study is motivated by applications in pseudorandomness, in particular recent results and conjectures due to [CHHL19,CHLT19,CGLSS20] which show that upper bounds on Fourier growth (even at level $k=2$) give unconditional pseudorandom generators.
Our main structural results on Fourier growth are as follows:
- We show that any symmetric degree-$d$ $\mathbb{F}_2$-polynomial $p$ has $L_{1,k}(p) \le \Pr[p=1] \cdot O(d)^k$, and this is tight for any constant $k$. This quadratically strengthens an earlier bound that was implicit in [RSV13].
- We show that any read-$Δ$ degree-$d$ $\mathbb{F}_2$-polynomial $p$ has $L_{1,k}(p) \le \Pr[p=1] \cdot (k Δd)^{O(k)}$.
- We establish a composition theorem which gives $L_{1,k}$ bounds on disjoint compositions of functions that are closed under restrictions and admit $L_{1,k}$ bounds.
Finally, we apply the above structural results to obtain new unconditional pseudorandom generators and new correlation bounds for various classes of $\mathbb{F}_2$-polynomials.
△ Less
Submitted 11 October, 2024; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Fooling Gaussian PTFs via Local Hyperconcentration
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Li-Yang Tan,
Daniel Kane
Abstract:
We give a pseudorandom generator that fools degree-$d$ polynomial threshold functions over $n$-dimensional Gaussian space with seed length $\mathrm{poly}(d)\cdot \log n$. All previous generators had a seed length with at least a $2^d$ dependence on $d$.
The key new ingredient is a Local Hyperconcentration Theorem, which shows that every degree-$d$ Gaussian polynomial is hyperconcentrated almost…
▽ More
We give a pseudorandom generator that fools degree-$d$ polynomial threshold functions over $n$-dimensional Gaussian space with seed length $\mathrm{poly}(d)\cdot \log n$. All previous generators had a seed length with at least a $2^d$ dependence on $d$.
The key new ingredient is a Local Hyperconcentration Theorem, which shows that every degree-$d$ Gaussian polynomial is hyperconcentrated almost everywhere at scale $d^{-O(1)}$.
△ Less
Submitted 9 February, 2022; v1 submitted 13 March, 2021;
originally announced March 2021.
-
On the Approximation Power of Two-Layer Networks of Random ReLUs
Authors:
Daniel Hsu,
Clayton Sanford,
Rocco A. Servedio,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ…
▽ More
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ tools from harmonic analysis and ridgelet representation theory, while our lower-bounds are based on (robust versions of) dimensionality arguments.
△ Less
Submitted 7 September, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
-
Quantitative Correlation Inequalities via Semigroup Interpolation
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Most correlation inequalities for high-dimensional functions in the literature, such as the Fortuin-Kasteleyn-Ginibre (FKG) inequality and the celebrated Gaussian Correlation Inequality of Royen, are qualitative statements which establish that any two functions of a certain type have non-negative correlation. In this work we give a general approach that can be used to bootstrap many qualitative co…
▽ More
Most correlation inequalities for high-dimensional functions in the literature, such as the Fortuin-Kasteleyn-Ginibre (FKG) inequality and the celebrated Gaussian Correlation Inequality of Royen, are qualitative statements which establish that any two functions of a certain type have non-negative correlation. In this work we give a general approach that can be used to bootstrap many qualitative correlation inequalities for functions over product spaces into quantitative statements. The approach combines a new extremal result about power series, proved using complex analysis, with harmonic analysis of functions over product spaces. We instantiate this general approach in several different concrete settings to obtain a range of new and near-optimal quantitative correlation inequalities, including:
$\bullet$ A quantitative version of Royen's celebrated Gaussian Correlation Inequality. Royen (2014) confirmed a conjecture, open for 40 years, stating that any two symmetric, convex sets must be non-negatively correlated under any centered Gaussian distribution. We give a lower bound on the correlation in terms of the vector of degree-2 Hermite coefficients of the two convex sets, analogous to the correlation bound for monotone Boolean functions over $\{0,1\}^n$ obtained by Talagrand (1996).
$\bullet$ A quantitative version of the well-known FKG inequality for monotone functions over any finite product probability space, generalizing the quantitative correlation bound for monotone Boolean functions over $\{0,1\}^n$ obtained by Talagrand (1996). The only prior generalization of which we are aware is due to Keller (2008, 2009, 2012), which extended Talagrand's result to product distributions over $\{0,1\}^n$. We also give two different quantitative versions of the FKG inequality for monotone functions over the continuous domain $[0,1]^n$, answering a question of Keller (2009).
△ Less
Submitted 22 December, 2020;
originally announced December 2020.
-
Polynomial-time trace reconstruction in the low deletion rate regime
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic \emph{deletion channel} which independently deletes each bit with some fixed probability $δ$ and concatenates the surviving bits, resulting in a \emph{trace} of $x$. The problem is to reconstruct $x$ given access to independent traces.
Trace reconstruction of arbitrary (w…
▽ More
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic \emph{deletion channel} which independently deletes each bit with some fixed probability $δ$ and concatenates the surviving bits, resulting in a \emph{trace} of $x$. The problem is to reconstruct $x$ given access to independent traces.
Trace reconstruction of arbitrary (worst-case) strings is a challenging problem, with the current state of the art for poly$(n)$-time algorithms being the 2004 algorithm of Batu et al. \cite{BKKM04}. This algorithm can reconstruct an arbitrary source string $x \in \{0,1\}^n$ in poly$(n)$ time provided that the deletion rate $δ$ satisfies $δ\leq n^{-(1/2 + \varepsilon)}$ for some $\varepsilon > 0$.
In this work we improve on the result of \cite{BKKM04} by giving a poly$(n)$-time algorithm for trace reconstruction for any deletion rate $δ\leq n^{-(1/3 + \varepsilon)}$. Our algorithm works by alternating an alignment-based procedure, which we show effectively reconstructs portions of the source string that are not "highly repetitive", with a novel procedure that efficiently determines the length of highly repetitive subwords of the source string.
△ Less
Submitted 7 December, 2020; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Polynomial-time trace reconstruction in the smoothed complexity model
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $δ$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the w…
▽ More
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $δ$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the worst-case setting where $x$ may be an arbitrary string in $\{0,1\}^n$ \cite{DOS17,NazarovPeres17,HHP18,HL18,Chase19} and in the average-case setting where $x$ is drawn uniformly at random from $\{0,1\}^n$ \cite{PeresZhai17,HPP18,HL18,Chase19}.
This paper studies trace reconstruction in the \emph{smoothed analysis} setting, in which a ``worst-case'' string $x^{\worst}$ is chosen arbitrarily from $\{0,1\}^n$, and then a perturbed version $\bx$ of $x^{\worst}$ is formed by independently replacing each coordinate by a uniform random bit with probability $σ$. The problem is to reconstruct $\bx$ given independent traces from it.
Our main result is an algorithm which, for any constant perturbation rate $0<σ< 1$ and any constant deletion rate $0 < δ< 1$, uses $\poly(n)$ running time and traces and succeeds with high probability in reconstructing the string $\bx$. This stands in contrast with the worst-case version of the problem, for which $\text{exp}(O(n^{1/3}))$ is the best known time and sample complexity \cite{DOS17,NazarovPeres17}.
Our approach is based on reconstructing $\bx$ from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of our work is a new $\poly(n)$-time procedure for reconstructing the multiset of all $O(\log n)$-length subwords of any source string $x\in \{0,1\}^n$ given access to traces of $x$.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
Reconstructing weighted voting schemes from partial information about their power indices
Authors:
Huck Bennett,
Anindya De,
Rocco A. Servedio,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In…
▽ More
A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over $\{-1, 1\}^n$ given some numerical measure (such as the function's $n$ ``Chow parameters,'' a.k.a. its degree-1 Fourier coefficients, or the vector of its $n$ Shapley indices) of how much each of the $n$ individual input variables affects the outcome of the function.
In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e. we are given only the Chow parameters or the Shapley indices corresponding to a subset $S \subseteq [n]$ of the $n$ input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in $S$ accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem.
Our main results are a polynomial time algorithm for the ($\varepsilon$-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial time algorithm for the ($\varepsilon$-approximate) Shapley Indices Partial Inverse Power Index Problem.
△ Less
Submitted 26 July, 2020; v1 submitted 19 July, 2020;
originally announced July 2020.
-
Testing noisy linear functions for sparsity
Authors:
Xue Chen,
Anindya De,
Rocco A. Servedio
Abstract:
We consider the following basic inference problem: there is an unknown high-dimensional vector $w \in \mathbb{R}^n$, and an algorithm is given access to labeled pairs $(x,y)$ where $x \in \mathbb{R}^n$ is a measurement and $y = w \cdot x + \mathrm{noise}$. What is the complexity of deciding whether the target vector $w$ is (approximately) $k$-sparse? The recovery analogue of this problem --- given…
▽ More
We consider the following basic inference problem: there is an unknown high-dimensional vector $w \in \mathbb{R}^n$, and an algorithm is given access to labeled pairs $(x,y)$ where $x \in \mathbb{R}^n$ is a measurement and $y = w \cdot x + \mathrm{noise}$. What is the complexity of deciding whether the target vector $w$ is (approximately) $k$-sparse? The recovery analogue of this problem --- given the promise that $w$ is sparse, find or approximate the vector $w$ --- is the famous sparse recovery problem, with a rich body of work in signal processing, statistics, and computer science.
We study the decision version of this problem (i.e.~deciding whether the unknown $w$ is $k$-sparse) from the vantage point of property testing. Our focus is on answering the following high-level question: when is it possible to efficiently test whether the unknown target vector $w$ is sparse versus far-from-sparse using a number of samples which is completely independent of the dimension $n$? We consider the natural setting in which $x$ is drawn from a i.i.d.~product distribution $\mathcal{D}$ over $\mathbb{R}^n$ and the $\mathrm{noise}$ process is independent of the input $x$. As our main result, we give a general algorithm which solves the above-described testing problem using a number of samples which is completely independent of the ambient dimension $n$, as long as $\mathcal{D}$ is not a Gaussian. In fact, our algorithm is fully noise tolerant, in the sense that for an arbitrary $w$, it approximately computes the distance of $w$ to the closest $k$-sparse vector. To complement this algorithmic result, we show that weakening any of our condition makes it information-theoretically impossible for any algorithm to solve the testing problem with fewer than essentially $\log n$ samples.
△ Less
Submitted 3 November, 2019;
originally announced November 2019.
-
Kruskal-Katona for convex sets, with applications
Authors:
Anindya De,
Rocco A. Servedio
Abstract:
The well-known Kruskal-Katona theorem in combinatorics says that (under mild conditions) every monotone Boolean function $f: \{0,1\}^n \to \{0,1\}$ has a nontrivial "density increment." This means that the fraction of inputs of Hamming weight $k+1$ for which $f=1$ is significantly larger than the fraction of inputs of Hamming weight $k$ for which $f=1.$
We prove an analogous statement for convex…
▽ More
The well-known Kruskal-Katona theorem in combinatorics says that (under mild conditions) every monotone Boolean function $f: \{0,1\}^n \to \{0,1\}$ has a nontrivial "density increment." This means that the fraction of inputs of Hamming weight $k+1$ for which $f=1$ is significantly larger than the fraction of inputs of Hamming weight $k$ for which $f=1.$
We prove an analogous statement for convex sets. Informally, our main result says that (under mild conditions) every convex set $K \subset \mathbb{R}^n$ has a nontrivial density increment. This means that the fraction of the radius-$r$ sphere that lies within $K$ is significantly larger than the fraction of the radius-$r'$ sphere that lies within $K$, for $r'$ suitably larger than $r$. For centrally symmetric convex sets we show that our density increment result is essentially optimal.
As a consequence of our Kruskal-Katona type theorem, we obtain the first efficient weak learning algorithm for convex sets under the Gaussian distribution. We show that any convex set can be weak learned to advantage $Ω(1/n)$ in $\mathsf{poly}(n)$ time under any Gaussian distribution and that any centrally symmetric convex set can be weak learned to advantage $Ω(1/\sqrt{n})$ in $\mathsf{poly}(n)$ time. We also give an information-theoretic lower bound showing that the latter advantage is essentially optimal for $\mathsf{poly}(n)$ time weak learning algorithms. As another consequence of our Kruskal-Katona theorem, we give the first nontrivial Gaussian noise stability bounds for convex sets at high noise rates. Our results extend the known correspondence between monotone Boolean functions over $ \{0,1\}^n$ and convex bodies in Gaussian space.
△ Less
Submitted 31 October, 2019;
originally announced November 2019.
-
A Lower Bound on Cycle-Finding in Sparse Digraphs
Authors:
Xi Chen,
Tim Randolph,
Rocco A. Servedio,
Timothy Sun
Abstract:
We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the…
▽ More
We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the language of property testing, our result is an $\tildeΩ(N^{5/9})$ lower bound on the query complexity of one-sided algorithms for testing whether sparse digraphs with constant outdegree are far from acyclic. This is the first improvement on the $Ω(\sqrt{N})$ lower bound, implicit in Bender and Ron, which follows from a simple birthday paradox argument.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.
-
Efficient average-case population recovery in the presence of insertions and deletions
Authors:
Frank Ban,
Xi Chen,
Rocco A. Servedio,
Sandip Sinha
Abstract:
Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-cas…
▽ More
Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution $\cal{D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $\cal{D}$ and returning an independent trace of $x^i$.
Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \smash{\exp(Θ(n^{1/3}))}$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $\cal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers ${\cal D}$ up to total variation distance $ε$ with high probability, given access to independent traces of independent draws from $\cal{D}$. The algorithm runs in time poly$(n,s,1/ε)$ and its sample complexity is poly$(s,1/ε,\exp(\log^{1/3}n)).$ This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.
△ Less
Submitted 12 July, 2019;
originally announced July 2019.
-
Learning from satisfying assignments under continuous distributions
Authors:
Clément L. Canonne,
Anindya De,
Rocco A. Servedio
Abstract:
What kinds of functions are learnable from their satisfying assignments? Motivated by this simple question, we extend the framework of De, Diakonikolas, and Servedio [DDS15], which studied the learnability of probability distributions over $\{0,1\}^n$ defined by the set of satisfying assignments to "low-complexity" Boolean functions, to Boolean-valued functions defined over continuous domains. In…
▽ More
What kinds of functions are learnable from their satisfying assignments? Motivated by this simple question, we extend the framework of De, Diakonikolas, and Servedio [DDS15], which studied the learnability of probability distributions over $\{0,1\}^n$ defined by the set of satisfying assignments to "low-complexity" Boolean functions, to Boolean-valued functions defined over continuous domains. In our learning scenario there is a known "background distribution" $\mathcal{D}$ over $\mathbb{R}^n$ (such as a known normal distribution or a known log-concave distribution) and the learner is given i.i.d. samples drawn from a target distribution $\mathcal{D}_f$, where $\mathcal{D}_f$ is $\mathcal{D}$ restricted to the satisfying assignments of an unknown low-complexity Boolean-valued function $f$. The problem is to learn an approximation $\mathcal{D}'$ of the target distribution $\mathcal{D}_f$ which has small error as measured in total variation distance.
We give a range of efficient algorithms and hardness results for this problem, focusing on the case when $f$ is a low-degree polynomial threshold function (PTF). When the background distribution $\mathcal{D}$ is log-concave, we show that this learning problem is efficiently solvable for degree-1 PTFs (i.e.,~linear threshold functions) but not for degree-2 PTFs. In contrast, when $\mathcal{D}$ is a normal distribution, we show that this learning problem is efficiently solvable for degree-2 PTFs but not for degree-4 PTFs. Our hardness results rely on standard assumptions about secure signature schemes.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Beyond trace reconstruction: Population recovery from the deletion channel
Authors:
Frank Ban,
Xi Chen,
Adam Freilich,
Rocco A. Servedio,
Sandip Sinha
Abstract:
\emph{Population recovery} is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip and erasure noise channels.
We initiate the study of population recovery under t…
▽ More
\emph{Population recovery} is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip and erasure noise channels.
We initiate the study of population recovery under the \emph{deletion channel}, in which each bit is independently \emph{deleted} with some fixed probability and the surviving bits are concatenated and transmitted. This is a far more challenging noise model than bit-flip~noise or erasure noise; indeed, even the simplest case in which the population size is 1 (corresponding to a trivial distribution supported on a single string) corresponds to the \emph{trace reconstruction} problem, a challenging problem that has received much recent attention (see e.g.~\cite{DOS17,NP17,PZ17,HPP18,HHP18}).
We give algorithms and lower bounds for population recovery under the deletion channel when the population size is some $\ell>1$. As our main sample complexity upper bound, we show that for any $\ell=o(\log n/\log \log n)$, a population of $\ell$ strings from $\{0,1\}^n$ can be learned under deletion channel noise using $\smash{2^{n^{1/2+o(1)}}}$ samples. On the lower bound side, we show that $n^{Ω(\ell)}$ samples are required to perform population recovery under the deletion channel, for all $\ell \leq n^{1/2-ε}$.
Our upper bounds are obtained via a robust multivariate generalization of a polynomial-based analysis, due to Krasikov and Roddity \cite{KR97}, of how the $k$-deck of a bit-string uniquely identifies the string; this is a very different approach from recent algorithms for trace reconstruction (the $\ell=1$ case). Our lower bounds build on moment-matching results of Roos~\cite{Roo00} and Daskalakis and Papadimitriou~\cite{DP15}.
△ Less
Submitted 11 April, 2019;
originally announced April 2019.
-
Density estimation for shift-invariant multidimensional distributions
Authors:
Anindya De,
Philip M. Long,
Rocco A. Servedio
Abstract:
We study density estimation for classes of shift-invariant distributions over $\mathbb{R}^d$. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of it in any direction. Shift-invariance relaxes smoothness assumptions commonly used in non-parametric density estimation to allow jump discontinuities. The different classe…
▽ More
We study density estimation for classes of shift-invariant distributions over $\mathbb{R}^d$. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of it in any direction. Shift-invariance relaxes smoothness assumptions commonly used in non-parametric density estimation to allow jump discontinuities. The different classes of distributions that we consider correspond to different rates of tail decay.
For each such class we give an efficient algorithm that learns any distribution in the class from independent samples with respect to total variation distance. As a special case of our general result, we show that $d$-dimensional shift-invariant distributions which satisfy an exponential tail bound can be learned to total variation distance error $ε$ using $\tilde{O}_d(1/ ε^{d+2})$ examples and $\tilde{O}_d(1/ ε^{2d+2})$ time. This implies that, for constant $d$, multivariate log-concave distributions can be learned in $\tilde{O}_d(1/ε^{2d+2})$ time using $\tilde{O}_d(1/ε^{d+2})$ samples, answering a question of [Diakonikolas, Kane and Stewart, 2016] All of our results extend to a model of noise-tolerant density estimation using Huber's contamination model, in which the target distribution to be learned is a $(1-ε,ε)$ mixture of some unknown distribution in the class with some other arbitrary and unknown distribution, and the learning algorithm must output a hypothesis distribution with total variation distance error $O(ε)$ from the target distribution. We show that our general results are close to best possible by proving a simple $Ω\left(1/ε^d\right)$ information-theoretic lower bound on sample complexity even for learning bounded distributions that are shift-invariant.
△ Less
Submitted 8 November, 2018;
originally announced November 2018.
-
Learning sparse mixtures of rankings from noisy information
Authors:
Anindya De,
Ryan O'Donnell,
Rocco Servedio
Abstract:
We study the problem of learning an unknown mixture of $k$ rankings over $n$ elements, given access to noisy samples drawn from the unknown mixture. We consider a range of different noise models, including natural variants of the "heat kernel" noise framework and the Mallows model. For each of these noise models we give an algorithm which, under mild assumptions, learns the unknown mixture to high…
▽ More
We study the problem of learning an unknown mixture of $k$ rankings over $n$ elements, given access to noisy samples drawn from the unknown mixture. We consider a range of different noise models, including natural variants of the "heat kernel" noise framework and the Mallows model. For each of these noise models we give an algorithm which, under mild assumptions, learns the unknown mixture to high accuracy and runs in $n^{O(\log k)}$ time. The best previous algorithms for closely related problems have running times which are exponential in $k$.
△ Less
Submitted 3 November, 2018;
originally announced November 2018.
-
Fooling Polytopes
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We give a pseudorandom generator that fools $m$-facet polytopes over $\{0,1\}^n$ with seed length $\mathrm{polylog}(m) \cdot \log n$. The previous best seed length had superlinear dependence on $m$. An immediate consequence is a deterministic quasipolynomial time algorithm for approximating the number of solutions to any $\{0,1\}$-integer program.
We give a pseudorandom generator that fools $m$-facet polytopes over $\{0,1\}^n$ with seed length $\mathrm{polylog}(m) \cdot \log n$. The previous best seed length had superlinear dependence on $m$. An immediate consequence is a deterministic quasipolynomial time algorithm for approximating the number of solutions to any $\{0,1\}$-integer program.
△ Less
Submitted 12 August, 2018;
originally announced August 2018.
-
Learning Sums of Independent Random Variables with Sparse Collective Support
Authors:
Anindya De,
Philip M. Long,
Rocco A. Servedio
Abstract:
We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbf{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$} (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutu…
▽ More
We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbf{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$} (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutually independent (but not necessarily identically distributed) integer random variables with $\cup_i \mathsf{supp}(\mathbf{X}_i) \subseteq \mathcal{A}.$ We give two main algorithmic results for learning such distributions:
1. For the case $| \mathcal{A} | = 3$, we give an algorithm for learning $\mathcal{A}$-sums to accuracy $ε$ that uses $\mathsf{poly}(1/ε)$ samples and runs in time $\mathsf{poly}(1/ε)$, independent of $N$ and of the elements of $\mathcal{A}$.
2. For an arbitrary constant $k \geq 4$, if $\mathcal{A} = \{ a_1,...,a_k\}$ with $0 \leq a_1 < ... < a_k$, we give an algorithm that uses $\mathsf{poly}(1/ε) \cdot \log \log a_k$ samples (independent of $N$) and runs in time $\mathsf{poly}(1/ε, \log a_k).$
We prove an essentially matching lower bound: if $|\mathcal{A}| = 4$, then any algorithm must use $Ω(\log \log a_4) $ samples even for learning to constant accuracy. We also give similar-in-spirit (but quantitatively very different) algorithmic results, and essentially matching lower bounds, for the case in which $\mathcal{A}$ is not known to the learner.
△ Less
Submitted 12 November, 2020; v1 submitted 18 July, 2018;
originally announced July 2018.
-
Luby--Veličković--Wigderson revisited: Improved correlation bounds and pseudorandom generators for depth-two circuits
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We study correlation bounds and pseudorandom generators for depth-two circuits that consist of a $\mathsf{SYM}$-gate (computing an arbitrary symmetric function) or $\mathsf{THR}$-gate (computing an arbitrary linear threshold function) that is fed by $S$ $\mathsf{AND}$ gates. Such circuits were considered in early influential work on unconditional derandomization of Luby, Veličković, and Wigderson…
▽ More
We study correlation bounds and pseudorandom generators for depth-two circuits that consist of a $\mathsf{SYM}$-gate (computing an arbitrary symmetric function) or $\mathsf{THR}$-gate (computing an arbitrary linear threshold function) that is fed by $S$ $\mathsf{AND}$ gates. Such circuits were considered in early influential work on unconditional derandomization of Luby, Veličković, and Wigderson [LVW93], who gave the first non-trivial PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ that $\varepsilon$-fools these circuits.
In this work we obtain the first strict improvement of [LVW93]'s seed length: we construct a PRG that $\varepsilon$-fools size-$S$ $\{\mathsf{SYM},\mathsf{THR}\} \circ\mathsf{AND}$ circuits over $\{0,1\}^n$ with seed length \[ 2^{O(\sqrt{\log S })} + \mathrm{polylog}(1/\varepsilon), \] an exponential (and near-optimal) improvement of the $\varepsilon$-dependence of [LVW93]. The above PRG is actually a special case of a more general PRG which we establish for constant-depth circuits containing multiple $\mathsf{SYM}$ or $\mathsf{THR}$ gates, including as a special case $\{\mathsf{SYM},\mathsf{THR}\} \circ \mathsf{AC^0}$ circuits. These more general results strengthen previous results of Viola [Vio06] and essentially strengthen more recent results of Lovett and Srinivasan [LS11].
Our improved PRGs follow from improved correlation bounds, which are transformed into PRGs via the Nisan--Wigderson "hardness versus randomness" paradigm [NW94]. The key to our improved correlation bounds is the use of a recent powerful \emph{multi-switching} lemma due to Håstad [Hås14].
△ Less
Submitted 12 March, 2018;
originally announced March 2018.
-
Distribution-free Junta Testing
Authors:
Xi Chen,
Zhengyang Liu,
Rocco A. Servedio,
Ying Sheng,
Jinyu Xie
Abstract:
We study the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^n$. Our first main result is that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive a…
▽ More
We study the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^n$. Our first main result is that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive algorithm that uses $\tilde{O}(k^2)/ε$ queries (independent of $n$). Complementing this, our second main result is a lower bound showing that any non-adaptive distribution-free $k$-junta testing algorithm must make $Ω(2^{k/3})$ queries even to test to accuracy $ε=1/3$. These bounds establish that while the optimal query complexity of non-adaptive $k$-junta testing is $2^{Θ(k)}$, for adaptive testing it is $\text{poly}(k)$, and thus show that adaptivity provides an exponential improvement in the distribution-free query complexity of testing juntas.
△ Less
Submitted 13 February, 2018;
originally announced February 2018.
-
Improved pseudorandom generators from pseudorandom multi-switching lemmas
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We give the best known pseudorandom generators for two touchstone classes in unconditional derandomization: an $\varepsilon$-PRG for the class of size-$M$ depth-$d$ $\mathsf{AC}^0$ circuits with seed length $\log(M)^{d+O(1)}\cdot \log(1/\varepsilon)$, and an $\varepsilon$-PRG for the class of $S$-sparse $\mathbb{F}_2$ polynomials with seed length $2^{O(\sqrt{\log S})}\cdot \log(1/\varepsilon)$. Th…
▽ More
We give the best known pseudorandom generators for two touchstone classes in unconditional derandomization: an $\varepsilon$-PRG for the class of size-$M$ depth-$d$ $\mathsf{AC}^0$ circuits with seed length $\log(M)^{d+O(1)}\cdot \log(1/\varepsilon)$, and an $\varepsilon$-PRG for the class of $S$-sparse $\mathbb{F}_2$ polynomials with seed length $2^{O(\sqrt{\log S})}\cdot \log(1/\varepsilon)$. These results bring the state of the art for unconditional derandomization of these classes into sharp alignment with the state of the art for computational hardness for all parameter settings: improving on the seed lengths of either PRG would require breakthrough progress on longstanding and notorious circuit lower bounds.
The key enabling ingredient in our approach is a new \emph{pseudorandom multi-switching lemma}. We derandomize recently-developed \emph{multi}-switching lemmas, which are powerful generalizations of Håstad's switching lemma that deal with \emph{families} of depth-two circuits. Our pseudorandom multi-switching lemma---a randomness-efficient algorithm for sampling restrictions that simultaneously simplify all circuits in a family---achieves the parameters obtained by the (full randomness) multi-switching lemmas of Impagliazzo, Matthews, and Paturi [IMP12] and Håstad [Hås14]. This optimality of our derandomization translates into the optimality (given current circuit lower bounds) of our PRGs for $\mathsf{AC}^0$ and sparse $\mathbb{F}_2$ polynomials.
△ Less
Submitted 10 January, 2018;
originally announced January 2018.
-
Deterministic search for CNF satisfying assignments in almost polynomial time
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We consider the fundamental derandomization problem of deterministically finding a satisfying assignment to a CNF formula that has many satisfying assignments. We give a deterministic algorithm which, given an $n$-variable $\mathrm{poly}(n)$-clause CNF formula $F$ that has at least $\varepsilon 2^n$ satisfying assignments, runs in time \[ n^{\tilde{O}(\log\log n)^2} \] for…
▽ More
We consider the fundamental derandomization problem of deterministically finding a satisfying assignment to a CNF formula that has many satisfying assignments. We give a deterministic algorithm which, given an $n$-variable $\mathrm{poly}(n)$-clause CNF formula $F$ that has at least $\varepsilon 2^n$ satisfying assignments, runs in time \[ n^{\tilde{O}(\log\log n)^2} \] for $\varepsilon \ge 1/\mathrm{polylog}(n)$ and outputs a satisfying assignment of $F$. Prior to our work the fastest known algorithm for this problem was simply to enumerate over all seeds of a pseudorandom generator for CNFs; using the best known PRGs for CNFs [DETT10], this takes time $n^{\tildeΩ(\log n)}$ even for constant $\varepsilon$. Our approach is based on a new general framework relating deterministic search and deterministic approximate counting, which we believe may find further applications.
△ Less
Submitted 10 January, 2018;
originally announced January 2018.
-
Sample-based high-dimensional convexity testing
Authors:
Xi Chen,
Adam Freilich,
Rocco A. Servedio,
Timothy Sun
Abstract:
In the problem of high-dimensional convexity testing, there is an unknown set $S \subseteq \mathbb{R}^n$ which is promised to be either convex or $\varepsilon$-far from every convex body with respect to the standard multivariate normal distribution $\mathcal{N}(0, 1)^n$. The job of a testing algorithm is then to distinguish between these two cases while making as few inspections of the set $S$ as…
▽ More
In the problem of high-dimensional convexity testing, there is an unknown set $S \subseteq \mathbb{R}^n$ which is promised to be either convex or $\varepsilon$-far from every convex body with respect to the standard multivariate normal distribution $\mathcal{N}(0, 1)^n$. The job of a testing algorithm is then to distinguish between these two cases while making as few inspections of the set $S$ as possible.
In this work we consider sample-based testing algorithms, in which the testing algorithm only has access to labeled samples $(\boldsymbol{x},S(\boldsymbol{x}))$ where each $\boldsymbol{x}$ is independently drawn from $\mathcal{N}(0, 1)^n$. We give nearly matching sample complexity upper and lower bounds for both one-sided and two-sided convexity testing algorithms in this framework. For constant $\varepsilon$, our results show that the sample complexity of one-sided convexity testing is $2^{\tildeΘ(n)}$ samples, while for two-sided convexity testing it is $2^{\tildeΘ(\sqrt{n})}$.
△ Less
Submitted 28 June, 2017;
originally announced June 2017.
-
Adaptivity is exponentially powerful for testing monotonicity of halfspaces
Authors:
Xi Chen,
Rocco A. Servedio,
Li-Yang Tan,
Erik Waingarten
Abstract:
We give a $\mathrm{poly}(\log n, 1/ε)$-query adaptive algorithm for testing whether an unknown Boolean function $f: \{-1,1\}^n \to \{-1,1\}$, which is promised to be a halfspace, is monotone versus $ε$-far from monotone. Since non-adaptive algorithms are known to require almost $Ω(n^{1/2})$ queries to test whether an unknown halfspace is monotone versus far from monotone, this shows that adaptivit…
▽ More
We give a $\mathrm{poly}(\log n, 1/ε)$-query adaptive algorithm for testing whether an unknown Boolean function $f: \{-1,1\}^n \to \{-1,1\}$, which is promised to be a halfspace, is monotone versus $ε$-far from monotone. Since non-adaptive algorithms are known to require almost $Ω(n^{1/2})$ queries to test whether an unknown halfspace is monotone versus far from monotone, this shows that adaptivity enables an exponential improvement in the query complexity of monotonicity testing for halfspaces.
△ Less
Submitted 17 June, 2017;
originally announced June 2017.
-
Settling the query complexity of non-adaptive junta testing
Authors:
Xi Chen,
Rocco A. Servedio,
Li-Yang Tan,
Erik Waingarten,
Jinyu Xie
Abstract:
We prove that any non-adaptive algorithm that tests whether an unknown Boolean function $f: \{0, 1\}^n\to \{0, 1\}$ is a $k$-junta or $ε$-far from every $k$-junta must make $\widetildeΩ(k^{3/2} / ε)$ many queries for a wide range of parameters $k$ and $ε$. Our result dramatically improves previous lower bounds from [BGSMdW13, STW15], and is essentially optimal given Blais's non-adaptive junta test…
▽ More
We prove that any non-adaptive algorithm that tests whether an unknown Boolean function $f: \{0, 1\}^n\to \{0, 1\}$ is a $k$-junta or $ε$-far from every $k$-junta must make $\widetildeΩ(k^{3/2} / ε)$ many queries for a wide range of parameters $k$ and $ε$. Our result dramatically improves previous lower bounds from [BGSMdW13, STW15], and is essentially optimal given Blais's non-adaptive junta tester from [Blais08], which makes $\widetilde{O}(k^{3/2})/ε$ queries. Combined with the adaptive tester of [Blais09] which makes $O(k\log k + k /ε)$ queries, our result shows that adaptivity enables polynomial savings in query complexity for junta testing.
△ Less
Submitted 20 April, 2017;
originally announced April 2017.
-
Fooling intersections of low-weight halfspaces
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
A weight-$t$ halfspace is a Boolean function $f(x)=$sign$(w_1 x_1 + \cdots + w_n x_n - θ)$ where each $w_i$ is an integer in $\{-t,\dots,t\}.$ We give an explicit pseudorandom generator that $δ$-fools any intersection of $k$ weight-$t$ halfspaces with seed length poly$(\log n, \log k,t,1/δ)$. In particular, our result gives an explicit PRG that fools any intersection of any quasipoly$(n)$ number o…
▽ More
A weight-$t$ halfspace is a Boolean function $f(x)=$sign$(w_1 x_1 + \cdots + w_n x_n - θ)$ where each $w_i$ is an integer in $\{-t,\dots,t\}.$ We give an explicit pseudorandom generator that $δ$-fools any intersection of $k$ weight-$t$ halfspaces with seed length poly$(\log n, \log k,t,1/δ)$. In particular, our result gives an explicit PRG that fools any intersection of any quasipoly$(n)$ number of halfspaces of any poly$\log(n)$ weight to any $1/$poly$\log(n)$ accuracy using seed length poly$\log(n).$ Prior to this work no explicit PRG with non-trivial seed length was known even for fooling intersections of $n$ weight-1 halfspaces to constant accuracy.
The analysis of our PRG fuses techniques from two different lines of work on unconditional pseudorandomness for different kinds of Boolean functions. We extend the approach of Harsha, Klivans and Meka \cite{HKM12} for fooling intersections of regular halfspaces, and combine this approach with results of Bazzi \cite{Bazzi:07} and Razborov \cite{Razborov:09} on bounded independence fooling CNF formulas. Our analysis introduces new coupling-based ingredients into the standard Lindeberg method for establishing quantitative central limit theorems and associated pseudorandomness results.
△ Less
Submitted 16 April, 2017;
originally announced April 2017.