-
Low-Rank Thinning
Authors:
Annabelle Michael Carrell,
Albert Gong,
Abhishek Shetty,
Raaz Dwivedi,
Lester Mackey
Abstract:
The goal in thinning is to summarize a dataset using a small set of representative points. Remarkably, sub-Gaussian thinning algorithms like Kernel Halving and Compress can match the quality of uniform subsampling while substantially reducing the number of summary points. However, existing guarantees cover only a restricted range of distributions and kernel-based quality measures and suffer from pessimistic dimension dependence. To address these deficiencies, we introduce a new low-rank analysis of sub-Gaussian thinning that applies to any distribution and any kernel, guaranteeing high-quality compression whenever the kernel or data matrix is approximately low-rank. To demonstrate the broad applicability of the techniques, we design practical sub-Gaussian thinning approaches that improve upon the best known guarantees for approximating attention in transformers, accelerating stochastic gradient training through reordering, and distinguishing distributions in near-linear time.
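Guarantees of this kind are stated for kernel-based quality measures such as the maximum mean discrepancy (MMD) between the empirical distribution of the full dataset and that of the summary. The sketch below only evaluates that measure for a uniform subsample, which is the baseline the abstract compares against. The Gaussian kernel, its `bandwidth`, and the helper names are illustrative assumptions. The sketch does not implement Kernel Halving, Compress, or the low-rank analysis.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for points given as tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd(points, summary, kernel=gaussian_kernel):
    """MMD between the empirical distributions of `points` and `summary`
    (V-statistic form, quadratic number of kernel evaluations)."""
    def mean_kernel(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    sq = (mean_kernel(points, points)
          - 2 * mean_kernel(points, summary)
          + mean_kernel(summary, summary))
    return math.sqrt(max(sq, 0.0))  # clip tiny negative values caused by round-off

random.seed(0)
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
uniform = random.sample(data, 16)  # sqrt(n) points chosen uniformly at random
print(mmd(data, uniform))
print(mmd(data, data))  # summarizing with the full dataset gives 0.0
```

A thinning algorithm in the sense of the abstract would pick the 16 points deliberately so that this value is smaller than that of the uniform baseline.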
Submitted 17 June, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
Optimal PAC Bounds Without Uniform Convergence
Authors:
Ishaq Aden-Ali,
Yeshwanth Cherapanamjeri,
Abhishek Shetty,
Nikita Zhivotovskiy
Abstract:
In statistical learning theory, determining the sample complexity of realizable binary classification for VC classes was a long-standing open problem. The results of Simon and Hanneke established sharp upper bounds in this setting. However, the reliance of their argument on the uniform convergence principle limits its applicability to more general learning settings such as multiclass classification. In this paper, we address this issue by providing optimal high probability risk bounds through a framework that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation invariant predictors into high probability risk bounds. As an application, by adapting the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we propose an algorithm that achieves an optimal PAC bound for binary classification. Specifically, our result shows that certain aggregations of one-inclusion graph algorithms are optimal, addressing a variant of a classic question posed by Warmuth.
We further instantiate our framework in three settings where uniform convergence is provably suboptimal. For multiclass classification, we prove an optimal risk bound that scales with the one-inclusion hypergraph density of the class, addressing the suboptimality of the analysis of Daniely and Shalev-Shwartz. For partial hypothesis classification, we determine the optimal sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman, and Moran. For realizable bounded regression with absolute loss, we derive an optimal risk bound that relies on a modified version of the scale-sensitive dimension, refining the results of Bartlett and Long. Our rates surpass standard uniform convergence-based results due to the smaller complexity measure in our risk bound.
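The leave-one-out error that this framework converts into high probability risk bounds can be computed directly. For each point i, train on the sample with point i removed and check whether point i is misclassified. This minimal sketch uses a hypothetical 1-D threshold learner in place of the one-inclusion graph algorithm. The learner is permutation invariant because it depends only on the set of positively labelled points. The sketch shows the input quantity only, not the conversion to risk bounds.

```python
def threshold_learner(sample):
    """Hypothetical permutation-invariant learner for 1-D thresholds:
    predict 1 at or above the smallest positively labelled point seen."""
    positives = [x for x, y in sample if y == 1]
    cut = min(positives) if positives else float("inf")
    return lambda x: 1 if x >= cut else 0

def leave_one_out_error(learner, sample):
    """Fraction of points misclassified by the learner trained on the other n - 1 points."""
    mistakes = 0
    for i, (x, y) in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        predictor = learner(rest)
        mistakes += predictor(x) != y
    return mistakes / len(sample)

sample = [(0.1, 0), (0.4, 0), (0.5, 1), (0.7, 1), (0.9, 1)]
print(leave_one_out_error(threshold_learner, sample))  # 0.2: only the point at 0.5 is misclassified when held out
```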
Submitted 18 April, 2023;
originally announced April 2023.
-
The One-Inclusion Graph Algorithm is not Always Optimal
Authors:
Ishaq Aden-Ali,
Yeshwanth Cherapanamjeri,
Abhishek Shetty,
Nikita Zhivotovskiy
Abstract:
The one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth achieves an optimal in-expectation risk bound in the standard PAC classification setup. In one of the first COLT open problems, Warmuth conjectured that this prediction strategy always implies an optimal high probability bound on the risk, and hence is also an optimal PAC algorithm. We refute this conjecture in the strongest sense: for any practically interesting Vapnik-Chervonenkis class, we provide an in-expectation optimal one-inclusion graph algorithm whose high probability risk bound cannot go beyond that implied by Markov's inequality. Our construction of these poorly performing one-inclusion graph algorithms uses Varshamov-Tenengolts error correcting codes.
Our negative result has several implications. First, it shows that the same poor high-probability performance is inherited by several recent prediction strategies based on generalizations of the one-inclusion graph algorithm. Second, our analysis shows yet another statistical problem that enjoys an estimator that is provably optimal in expectation via a leave-one-out argument, but fails in the high-probability regime. This discrepancy occurs despite the boundedness of the binary loss for which arguments based on concentration inequalities often provide sharp high probability risk bounds.
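For reference, the one-inclusion graph on n unlabeled points has one vertex for each labeling of those points that the class realizes. Two labelings are joined by an edge when they differ in exactly one coordinate. The sketch below builds this graph for 1-D thresholds on three points; the class and the helper names are chosen purely for illustration. It constructs the graph only. It does not implement the orientation-based predictor of Haussler, Littlestone, and Warmuth or the counterexamples described above.

```python
from itertools import combinations

def threshold_labelings(points):
    """All labelings of `points` realized by thresholds h_t(x) = 1[x >= t]."""
    cuts = sorted(points) + [float("inf")]
    return {tuple(1 if x >= t else 0 for x in points) for t in cuts}

def one_inclusion_graph(labelings):
    """Vertices are realized labelings; edges join labelings at Hamming distance 1."""
    vertices = sorted(labelings)
    edges = [
        (u, v) for u, v in combinations(vertices, 2)
        if sum(a != b for a, b in zip(u, v)) == 1
    ]
    return vertices, edges

vertices, edges = one_inclusion_graph(threshold_labelings([0.2, 0.5, 0.8]))
print(len(vertices), len(edges))  # 4 3
```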
Submitted 19 December, 2022;
originally announced December 2022.
-
Distribution Compression in Near-linear Time
Authors:
Abhishek Shetty,
Raaz Dwivedi,
Lester Mackey
Abstract:
In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}( \sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
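As we understand it, the Compress step of Compress++ follows a recursive divide-and-halve pattern. It splits the input into four equal blocks, compresses each block recursively, concatenates the results, and halves the combined set. The recursion stops at blocks of size 4^g. Under this reading, an input of n = 4^k points yields 2^g √n points, and Compress++ then thins these down to √n. The sketch below follows that pattern with a uniformly random halving routine, which is an assumption standing in for the kernel halving the abstract describes. The function names and the parameter name `g` are illustrative.

```python
import random

def random_halve(points, rng):
    """Placeholder halving step (assumption): keep a uniformly random half.
    Kernel halving would instead choose a half that preserves the kernel mean."""
    return rng.sample(points, len(points) // 2)

def compress(points, halve, g, rng):
    """Split into four equal blocks, compress each recursively, concatenate, then halve.
    Assumes len(points) is a power of 4 that is at least 4**g; returns 2**g * sqrt(n) points."""
    n = len(points)
    if n < 4 ** g:
        raise ValueError("need len(points) >= 4**g")
    if n == 4 ** g:
        return list(points)
    quarter = n // 4
    merged = []
    for i in range(4):
        merged.extend(compress(points[i * quarter:(i + 1) * quarter], halve, g, rng))
    return halve(merged, rng)

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(4 ** 5)]  # n = 1024
coreset = compress(data, random_halve, g=1, rng=rng)
print(len(coreset))  # 64 = 2**1 * sqrt(1024)
```

Each halving call sees only a block of the data rather than all n points. This is why a quadratic-time halving routine does not make the overall procedure quadratic in n.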
Submitted 17 October, 2022; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Matrix Discrepancy from Quantum Communication
Authors:
Samuel B. Hopkins,
Prasad Raghavendra,
Abhishek Shetty
Abstract:
We develop a novel connection between discrepancy minimization and (quantum) communication complexity. As an application, we resolve a substantial special case of the Matrix Spencer conjecture. In particular, we show that for every collection of symmetric $n \times n$ matrices $A_1,\ldots,A_n$ with $\|A_i\| \leq 1$ and $\|A_i\|_F \leq n^{1/4}$ there exist signs $x \in \{ \pm 1\}^n$ such that the maximum eigenvalue of $\sum_{i \leq n} x_i A_i$ is at most $O(\sqrt n)$. We give a polynomial-time algorithm based on partial coloring and semidefinite programming to find such $x$.
Our techniques open a new avenue to use tools from communication complexity and information theory to study discrepancy. The proof of our main result combines a simple compression scheme for transcripts of repeated (quantum) communication protocols with quantum state purification, the Holevo bound from quantum information, and tools from sketching and dimensionality reduction. Our approach also offers a promising avenue to resolve the Matrix Spencer conjecture completely -- we show it is implied by a natural conjecture in quantum communication complexity.
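The objective in this result is the largest eigenvalue of the signed sum $\sum_{i \leq n} x_i A_i$. The NumPy sketch below only evaluates that objective. As illustrative assumptions, it uses rank-one projections $A_i = v_i v_i^\top$, which satisfy $\|A_i\| = \|A_i\|_F = 1$, together with uniformly random signs. This is not the partial-coloring and semidefinite-programming algorithm described above. By the matrix Khintchine inequality, random signs alone are only guaranteed $O(\sqrt{n \log n})$ in general.

```python
import numpy as np

def random_rank_one_projection(n, rng):
    """A = v v^T for a random unit vector v, so ||A|| = 1 and ||A||_F = 1 <= n**0.25."""
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    return np.outer(v, v)

def signed_max_eigenvalue(matrices, signs):
    """lambda_max of sum_i x_i A_i for symmetric A_i and signs x_i."""
    total = sum(x * a for x, a in zip(signs, matrices))
    return np.linalg.eigvalsh(total)[-1]  # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(0)
n = 64
mats = [random_rank_one_projection(n, rng) for _ in range(n)]
signs = rng.choice([-1, 1], size=n)
print(signed_max_eigenvalue(mats, signs), np.sqrt(n))
```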
Submitted 19 October, 2021;
originally announced October 2021.
-
Learning Robustness with Bounded Failure: An Iterative MPC Approach
Authors:
Monimoy Bujarbaruah,
Akhil Shetty,
Kameshwar Poolla,
Francesco Borrelli
Abstract:
We propose an approach to design a Model Predictive Controller (MPC) for constrained Linear Time Invariant systems performing an iterative task. The system is subject to an additive disturbance, and the goal is to learn to satisfy state and input constraints robustly. Using disturbance measurements after each iteration, we construct Confidence Support sets, which contain the true support of the disturbance distribution with a given probability. As more data is collected, the Confidence Supports converge to the true support of the disturbance. This enables the design of an MPC controller that avoids a conservative estimate of the disturbance support, while simultaneously bounding the probability of constraint violation. The efficacy of the proposed approach is then demonstrated with a detailed numerical example.
Submitted 10 June, 2023; v1 submitted 22 November, 2019;
originally announced November 2019.
-
Sampling and Optimization on Convex Sets in Riemannian Manifolds of Non-Negative Curvature
Authors:
Navin Goyal,
Abhishek Shetty
Abstract:
The Euclidean space notion of convex sets (and functions) generalizes to Riemannian manifolds in a natural sense and is called geodesic convexity. Extensively studied computational problems such as convex optimization and sampling in convex sets also have meaningful counterparts in the manifold setting. Geodesically convex optimization is a well-studied problem with ongoing research and considerable recent interest in machine learning and theoretical computer science. In this paper, we study sampling and convex optimization problems over manifolds of non-negative curvature, proving polynomial running time in the dimension and other relevant parameters. Our algorithms assume a warm start. We first present a random-walk-based sampling algorithm and then combine it with simulated annealing for solving convex optimization problems. To our knowledge, these are the first algorithms in the general setting of positively curved manifolds with provable polynomial guarantees under reasonable assumptions, and the first study of the connection between sampling and optimization in this setting.
Submitted 24 July, 2019;
originally announced July 2019.
-
Optimal Resource Procurement and the Price of Causality
Authors:
Sen Li,
Akhil Shetty,
Kameshwar Poolla,
Pravin Varaiya
Abstract:
This paper studies the problem of procuring diverse resources in a forward market to cover a set $\mathbf{E}$ of uncertain demand signals $\mathbf{e}$. We consider two scenarios: (a) $\mathbf{e}$ is revealed all at once by an oracle, and (b) $\mathbf{e}$ reveals itself causally. Each scenario induces an optimal procurement cost. The ratio between these two costs is defined as the price of causality. It captures the additional cost of not knowing the future values of the uncertain demand signal. We consider two application contexts: procuring energy reserves from a forward capacity market, and purchasing virtual machine instances from a cloud service. An upper bound on the price of causality is obtained, and the exact price of causality is computed for some special cases. The algorithmic basis for all these computations is set containment linear programming. A mechanism is proposed to allocate the procurement cost to consumers who in aggregate produce the demand signal. We show that the proposed cost allocation is fair, budget-balanced, and respects the cost-causation principle. The results are validated through numerical simulations.
Submitted 7 June, 2019;
originally announced June 2019.