arXiv:2410.19092

Provable Tempered Overfitting of Minimal Nets and Typical Nets

Authors: Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, Daniel Soudry

Abstract: We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a random interpolating NN. For both learning rules, we prove overfitting is tempered. Our analysis rests on a new bound on the size of a threshold circuit consistent with a partial function. To the best of our knowledge, ours are the first theoretical results on benign or tempered overfitting that: (1) apply to deep NNs, and (2) do not require a very high or very low input dimension.

Submitted 24 October, 2024; originally announced October 2024.

Comments: 60 pages, 4 figures

arXiv:1711.00565

Typically-Correct Derandomization for Small Time and Space

Authors: William M. Hoza

Abstract: Suppose a language $L$ can be decided by a bounded-error randomized algorithm that runs in space $S$ and time $n \cdot \text{poly}(S)$. We give a randomized algorithm for $L$ that still runs in space $O(S)$ and time $n \cdot \text{poly}(S)$ that uses only $O(S)$ random bits; our algorithm has a low failure probability on all but a negligible fraction of inputs of each length. An immediate corollary is a deterministic algorithm for $L$ that runs in space $O(S)$ and succeeds on all but a negligible fraction of inputs of each length. We also give several other complexity-theoretic applications of our technique.

Submitted 15 May, 2019; v1 submitted 1 November, 2017; originally announced November 2017.

Comments: 39 pages, 9 figures. Improved presentation, simplified content

arXiv:1703.07768

Quantum Communication-Query Tradeoffs

Authors: William M. Hoza

Abstract: For any function $f: X \times Y \to Z$, we prove that $Q^{*\text{cc}}(f) \cdot Q^{\text{OIP}}(f) \cdot (\log Q^{\text{OIP}}(f) + \log |Z|) \geq \Omega(\log |X|)$. Here, $Q^{*\text{cc}}(f)$ denotes the bounded-error communication complexity of $f$ using an entanglement-assisted two-way qubit channel, and $Q^{\text{OIP}}(f)$ denotes the number of quantum queries needed to learn $x$ with high probability given oracle access to the function $f_x(y) \stackrel{\text{def}}{=} f(x, y)$. We show that this tradeoff is close to the best possible. We also give a generalization of this tradeoff for distributional query complexity. As an application, we prove an optimal $\Omega(\log q)$ lower bound on the $Q^{*\text{cc}}$ complexity of determining whether $x + y$ is a perfect square, where Alice holds $x \in \mathbf{F}_q$, Bob holds $y \in \mathbf{F}_q$, and $\mathbf{F}_q$ is a finite field of odd characteristic. As another application, we give a new, simpler proof that searching an ordered size-$N$ database requires $\Omega(\log N / \log \log N)$ quantum queries. (It was already known that $\Theta(\log N)$ queries are required.)

Submitted 6 September, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

Comments: 20 pages, 3 figures. Strengthened the results in Section 5, fixed small mistakes, improved presentation

arXiv:1611.00783

Preserving Randomness for Adaptive Algorithms

Authors: William M. Hoza, Adam R. Klivans

Abstract: Suppose $\mathsf{Est}$ is a randomized estimation algorithm that uses $n$ random bits and outputs values in $\mathbb{R}^d$. We show how to execute $\mathsf{Est}$ on $k$ adaptively chosen inputs using only $n + O(k \log(d + 1))$ random bits instead of the trivial $nk$ (at the cost of mild increases in the error and failure probability). Our algorithm combines a variant of the INW pseudorandom generator (STOC '94) with a new scheme for shifting and rounding the outputs of $\mathsf{Est}$. We prove that modifying the outputs of $\mathsf{Est}$ is necessary in this setting, and furthermore, our algorithm's randomness complexity is near-optimal in the case $d \leq O(1)$. As an application, we give a randomness-efficient version of the Goldreich-Levin algorithm; our algorithm finds all Fourier coefficients with absolute value at least $\theta$ of a function $F: \{0, 1\}^n \to \{-1, 1\}$ using $O(n \log n) \cdot \text{poly}(1/\theta)$ queries to $F$ and $O(n)$ random bits (independent of $\theta$), improving previous work by Bshouty et al. (JCSS '04).

Submitted 13 June, 2018; v1 submitted 2 November, 2016; originally announced November 2016.

Comments: To appear in RANDOM 2018. 32 pages, 2 figures. Added sections 1.5.3 and 7.1, changed terminology, fixed typos, improved presentation, added appendix C, simplified abstract

arXiv:1610.01199

Targeted Pseudorandom Generators, Simulation Advice Generators, and Derandomizing Logspace

Authors: William M. Hoza, Chris Umans

Abstract: Assume that for every derandomization result for logspace algorithms, there is a pseudorandom generator strong enough to nearly recover the derandomization by iterating over all seeds and taking a majority vote. We prove under a precise version of this assumption that $\mathbf{BPL} \subseteq \bigcap_{\alpha > 0} \mathbf{DSPACE}(\log^{1 + \alpha} n)$. We strengthen the theorem to an equivalence by considering two generalizations of the concept of a pseudorandom generator against logspace. A targeted pseudorandom generator against logspace takes as input a short uniform random seed and a finite automaton; it outputs a long bitstring that looks random to that particular automaton. A simulation advice generator for logspace stretches a small uniform random seed into a long advice string; the requirement is that there is some logspace algorithm that, given a finite automaton and this advice string, simulates the automaton reading a long uniform random input. We prove that $\bigcap_{\alpha > 0} \mathbf{promise\mbox{-}BPSPACE}(\log^{1 + \alpha} n) = \bigcap_{\alpha > 0} \mathbf{promise\mbox{-}DSPACE}(\log^{1 + \alpha} n)$ if and only if for every targeted pseudorandom generator against logspace, there is a simulation advice generator for logspace with similar parameters. Finally, we observe that in a certain uniform setting (namely, if we only worry about sequences of automata that can be generated in logspace), targeted pseudorandom generators against logspace can be transformed into simulation advice generators with similar parameters.

Submitted 9 April, 2017; v1 submitted 4 October, 2016; originally announced October 2016.

Comments: 24 pages, 2 figures; added more commentary and references, fixed typos, changed notation and formatting

arXiv:1412.8097

The Adversarial Noise Threshold for Distributed Protocols

Authors: William M. Hoza, Leonard J. Schulman

Abstract: We consider the problem of implementing distributed protocols, despite adversarial channel errors, on synchronous-messaging networks with arbitrary topology. In our first result we show that any $n$-party $T$-round protocol on an undirected communication network $G$ can be compiled into a robust simulation protocol on a sparse ($\mathcal{O}(n)$ edges) subnetwork so that the simulation tolerates an adversarial error rate of $\Omega\left(\frac{1}{n}\right)$; the simulation has a round complexity of $\mathcal{O}\left(\frac{m \log n}{n} T\right)$, where $m$ is the number of edges in $G$. (So the simulation is work-preserving up to a $\log$ factor.) The adversary's error rate is within a constant factor of optimal. Given the error rate, the round complexity blowup is within a factor of $\mathcal{O}(k \log n)$ of optimal, where $k$ is the edge connectivity of $G$. We also determine that the maximum tolerable error rate on directed communication networks is $\Theta(1/s)$ where $s$ is the number of edges in a minimum equivalent digraph. Next we investigate adversarial per-edge error rates, where the adversary is given an error budget on each edge of the network. We determine the exact limit for tolerable per-edge error rates on an arbitrary directed graph. However, the construction that approaches this limit has exponential round complexity, so we give another compiler, which transforms $T$-round protocols into $\mathcal{O}(mT)$-round simulations, and prove that for polynomial-query black box compilers, the per-edge error rate tolerated by this last compiler is within a constant factor of optimal.

Submitted 28 April, 2015; v1 submitted 27 December, 2014; originally announced December 2014.

Comments: 23 pages, 2 figures. Fixes mistake in theorem 6 and various typos

Showing 1–6 of 6 results for author: Hoza, W M