-
Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-index Models
Authors:
Ilias Diakonikolas,
Giannis Iakovidis,
Daniel M. Kane,
Lisheng Ren
Abstract:
We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreove…
▽ More
We study the complexity of learning real-valued Multi-Index Models (MIMs) under the Gaussian distribution. A $K$-MIM is a function $f:\mathbb{R}^d\to \mathbb{R}$ that depends only on the projection of its input onto a $K$-dimensional subspace. We give a general algorithm for PAC learning a broad class of MIMs with respect to the square loss, even in the presence of adversarial label noise. Moreover, we establish a nearly matching Statistical Query (SQ) lower bound, providing evidence that the complexity of our algorithm is qualitatively optimal as a function of the dimension. Specifically, we consider the class of bounded variation MIMs with the property that degree at most $m$ distinguishing moments exist with respect to projections onto any subspace. In the presence of adversarial label noise, the complexity of our learning algorithm is $d^{O(m)}2^{\mathrm{poly}(K/ε)}$. For the realizable and independent noise settings, our algorithm incurs complexity $d^{O(m)}2^{\mathrm{poly}(K)}(1/ε)^{O(K)}$. To complement our upper bound, we show that if for some subspace degree-$m$ distinguishing moments do not exist, then any SQ learner for the corresponding class of MIMs requires complexity $d^{Ω(m)}$. As an application, we give the first efficient learner for the class of positive-homogeneous $L$-Lipschitz $K$-MIMs. The resulting algorithm has complexity $\mathrm{poly}(d) 2^{\mathrm{poly}(KL/ε)}$. This gives a new PAC learning algorithm for Lipschitz homogeneous ReLU networks with complexity independent of the network size, removing the exponential dependence incurred in prior work.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
On Learning Parallel Pancakes with Mostly Uniform Weights
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sushrut Karmalkar,
Jasper C. H. Lee,
Thanasis Pittas
Abstract:
We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on $\mathbb{R}^d$. This task is known to have complexity $d^{Ω(k)}$ in full generality. To circumvent this exponential lower bound on the number of components, research has focused on learning families of GMMs satisfying additional structural properties. A natural assumption posits that the component weights are not exponenti…
▽ More
We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on $\mathbb{R}^d$. This task is known to have complexity $d^{Ω(k)}$ in full generality. To circumvent this exponential lower bound on the number of components, research has focused on learning families of GMMs satisfying additional structural properties. A natural assumption posits that the component weights are not exponentially small and that the components have the same unknown covariance. Recent work gave a $d^{O(\log(1/w_{\min}))}$-time algorithm for this class of GMMs, where $w_{\min}$ is the minimum weight. Our first main result is a Statistical Query (SQ) lower bound showing that this quasi-polynomial upper bound is essentially best possible, even for the special case of uniform weights. Specifically, we show that it is SQ-hard to distinguish between such a mixture and the standard Gaussian. We further explore how the distribution of weights affects the complexity of this task. Our second main result is a quasi-polynomial upper bound for the aforementioned testing task when most of the weights are uniform while a small fraction of the weights are potentially arbitrary.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
Faster Algorithms for Agnostically Learning Disjunctions and their Implications
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren
Abstract:
We study the algorithmic task of learning Boolean disjunctions in the distribution-free agnostic PAC model. The best known agnostic learner for the class of disjunctions over $\{0, 1\}^n$ is the $L_1$-polynomial regression algorithm, achieving complexity $2^{\tilde{O}(n^{1/2})}$. This complexity bound is known to be nearly best possible within the class of Correlational Statistical Query (CSQ) alg…
▽ More
We study the algorithmic task of learning Boolean disjunctions in the distribution-free agnostic PAC model. The best known agnostic learner for the class of disjunctions over $\{0, 1\}^n$ is the $L_1$-polynomial regression algorithm, achieving complexity $2^{\tilde{O}(n^{1/2})}$. This complexity bound is known to be nearly best possible within the class of Correlational Statistical Query (CSQ) algorithms. In this work, we develop an agnostic learner for this concept class with complexity $2^{\tilde{O}(n^{1/3})}$. Our algorithm can be implemented in the Statistical Query (SQ) model, providing the first separation between the SQ and CSQ models in distribution-free agnostic learning.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
Batch List-Decodable Linear Regression via Higher Moments
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sushrut Karmalkar,
Sihan Liu,
Thanasis Pittas
Abstract:
We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $α\in (0, 1/2)$, an unknown $α$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regr…
▽ More
We study the task of list-decodable linear regression using batches. A batch is called clean if it consists of i.i.d. samples from an unknown linear regression distribution. For a parameter $α\in (0, 1/2)$, an unknown $α$-fraction of the batches are clean and no assumptions are made on the remaining ones. The goal is to output a small list of vectors at least one of which is close to the true regressor vector in $\ell_2$-norm. [DJKS23] gave an efficient algorithm, under natural distributional assumptions, with the following guarantee. Assuming that the batch size $n$ satisfies $n \geq \tildeΩ(α^{-1})$ and the number of batches is $m = \mathrm{poly}(d, n, 1/α)$, their algorithm runs in polynomial time and outputs a list of $O(1/α^2)$ vectors at least one of which is $\tilde{O}(α^{-1/2}/\sqrt{n})$ close to the target regressor. Here we design a new polynomial time algorithm with significantly stronger guarantees under the assumption that the low-degree moments of the covariates distribution are Sum-of-Squares (SoS) certifiably bounded. Specifically, for any constant $δ>0$, as long as the batch size is $n \geq Ω_δ(α^{-δ})$ and the degree-$Θ(1/δ)$ moments of the covariates are SoS certifiably bounded, our algorithm uses $m = \mathrm{poly}((dn)^{1/δ}, 1/α)$ batches, runs in polynomial-time, and outputs an $O(1/α)$-sized list of vectors one of which is $O(α^{-δ/2}/\sqrt{n})$ close to the target. That is, our algorithm achieves substantially smaller minimum batch size and final error, while achieving the optimal list size. Our approach uses higher-order moment information by carefully combining the SoS paradigm interleaved with an iterative method and a novel list pruning procedure. In the process, we give an SoS proof of the Marcinkiewicz-Zygmund inequality that may be of broader applicability.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
Efficient Multivariate Robust Mean Estimation Under Mean-Shift Contamination
Authors:
Ilias Diakonikolas,
Giannis Iakovidis,
Daniel M. Kane,
Thanasis Pittas
Abstract:
We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $α<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-α$, $x_i$ is drawn from $\mathcal{N}(μ, I)$, whe…
▽ More
We study the algorithmic problem of robust mean estimation of an identity covariance Gaussian in the presence of mean-shift contamination. In this contamination model, we are given a set of points in $\mathbb{R}^d$ generated i.i.d. via the following process. For a parameter $α<1/2$, the $i$-th sample $x_i$ is obtained as follows: with probability $1-α$, $x_i$ is drawn from $\mathcal{N}(μ, I)$, where $μ\in \mathbb{R}^d$ is the target mean; and with probability $α$, $x_i$ is drawn from $\mathcal{N}(z_i, I)$, where $z_i$ is unknown and potentially arbitrary. Prior work characterized the information-theoretic limits of this task. Specifically, it was shown that, in contrast to Huber contamination, in the presence of mean-shift contamination consistent estimation is possible. On the other hand, all known robust estimators in the mean-shift model have running times exponential in the dimension. Here we give the first computationally efficient algorithm for high-dimensional robust mean estimation with mean-shift contamination that can tolerate a constant fraction of outliers. In particular, our algorithm has near-optimal sample complexity, runs in sample-polynomial time, and approximates the target mean to any desired accuracy. Conceptually, our result contributes to a growing body of work that studies inference with respect to natural noise models lying in between fully adversarial and random settings.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
Robust Learning of Multi-index Models via Iterative Subspace Approximation
Authors:
Ilias Diakonikolas,
Giannis Iakovidis,
Daniel M. Kane,
Nikos Zarifis
Abstract:
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. A $K$-MIM is any function $f$ that only depends on a $K$-dimensional subspace. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general robust learner that is qualitatively optimal in the Statistical Query (SQ) model. Our a…
▽ More
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. A $K$-MIM is any function $f$ that only depends on a $K$-dimensional subspace. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. Our main contribution is a general robust learner that is qualitatively optimal in the Statistical Query (SQ) model. Our algorithm iteratively constructs better approximations to the defining subspace by computing low-degree moments conditional on the projection to the subspace computed thus far, and adding directions with relatively large empirical moments. This procedure efficiently finds a subspace $V$ so that $f(\mathbf{x})$ is close to a function of the projection of $\mathbf{x}$ onto $V$. Conversely, for functions for which these conditional moments do not help, we prove an SQ lower bound suggesting that no efficient learner exists. As applications, we provide faster robust learners for the following concept classes:
* {\bf Multiclass Linear Classifiers} We give a constant-factor approximate agnostic learner with sample complexity $N = O(d) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first constant-factor agnostic learner for this class whose complexity is a fixed-degree polynomial in $d$.
* {\bf Intersections of Halfspaces} We give an approximate agnostic learner for this class achieving 0-1 error $K \tilde{O}(\mathrm{OPT}) + ε$ with sample complexity $N=O(d^2) 2^{\mathrm{poly}(K/ε)}$ and computational complexity $\mathrm{poly}(N ,d)$. This is the first agnostic learner for this class with near-linear error dependence and complexity a fixed-degree polynomial in $d$.
Furthermore, we show that in the presence of random classification noise, the complexity of our algorithm scales polynomially with $1/ε$.
△ Less
Submitted 14 April, 2025; v1 submitted 13 February, 2025;
originally announced February 2025.
-
Entangled Mean Estimation in High-Dimensions
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sihan Liu,
Thanasis Pittas
Abstract:
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. Specifically, given $N$ independent random points $x_1,\ldots,x_N$ in $\mathbb{R}^D$ and a parameter $α\in (0, 1)$ such that each $x_i$ is drawn from a Gaussian with mean $μ$ and unknown covariance, and an unknown $α$-fraction of the points have identity-bounded covariances, the goal is to estimate the…
▽ More
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. Specifically, given $N$ independent random points $x_1,\ldots,x_N$ in $\mathbb{R}^D$ and a parameter $α\in (0, 1)$ such that each $x_i$ is drawn from a Gaussian with mean $μ$ and unknown covariance, and an unknown $α$-fraction of the points have identity-bounded covariances, the goal is to estimate the common mean $μ$. The one-dimensional version of this task has received significant attention in theoretical computer science and statistics over the past decades. Recent work [LY20; CV24] has given near-optimal upper and lower bounds for the one-dimensional setting. On the other hand, our understanding of even the information-theoretic aspects of the multivariate setting has remained limited.
In this work, we design a computationally efficient algorithm achieving an information-theoretically near-optimal error. Specifically, we show that the optimal error (up to polylogarithmic factors) is $f(α,N) + \sqrt{D/(αN)}$, where the term $f(α,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate. Our algorithmic approach employs an iterative refinement strategy, whereby we progressively learn more accurate approximations $\hat μ$ to $μ$. This is achieved via a novel rejection sampling procedure that removes points significantly deviating from $\hat μ$, as an attempt to filter out unusually noisy samples. A complication that arises is that rejection sampling introduces bias in the distribution of the remaining points. To address this issue, we perform a careful analysis of the bias, develop an iterative dimension-reduction strategy, and employ a novel subroutine inspired by list-decodable learning that leverages the one-dimensional result.
△ Less
Submitted 9 January, 2025;
originally announced January 2025.
-
Active Learning of General Halfspaces: Label Queries vs Membership Queries
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Mingchen Ma
Abstract:
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on $R^d$ in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm is allowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivial imp…
▽ More
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces under the Gaussian distribution on $R^d$ in the presence of some form of query access. In the classical pool-based active learning model, where the algorithm is allowed to make adaptive label queries to previously sampled points, we establish a strong information-theoretic lower bound ruling out non-trivial improvements over the passive setting. Specifically, we show that any active learner requires label complexity of $\tildeΩ(d/(\log(m)ε))$, where $m$ is the number of unlabeled examples. Specifically, to beat the passive label complexity of $\tilde{O} (d/ε)$, an active learner requires a pool of $2^{poly(d)}$ unlabeled samples. On the positive side, we show that this lower bound can be circumvented with membership query access, even in the agnostic model. Specifically, we give a computationally efficient learner with query complexity of $\tilde{O}(\min\{1/p, 1/ε\} + d\cdot polylog(1/ε))$ achieving error guarantee of $O(opt)+ε$. Here $p \in [0, 1/2]$ is the bias and $opt$ is the 0-1 loss of the optimal halfspace. As a corollary, we obtain a strong separation between the active and membership query models. Taken together, our results characterize the complexity of learning general halfspaces under Gaussian marginals in these models.
△ Less
Submitted 31 December, 2024;
originally announced January 2025.
-
Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models
Authors:
Ilias Diakonikolas,
Daniel M. Kane
Abstract:
We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for {\em implicit moment tensor…
▽ More
We study the task of learning latent-variable models. A common algorithmic technique for this task is the method of moments. Unfortunately, moment-based approaches are hampered by the fact that the moment tensors of super-constant degree cannot even be written down in polynomial time. Motivated by such learning applications, we develop a general efficient algorithm for {\em implicit moment tensor computation}. Our framework generalizes the work of~\cite{LL21-opt} which developed an efficient algorithm for the specific moment tensors that arise in clustering mixtures of spherical Gaussians.
By leveraging our implicit moment estimation algorithm, we obtain the first $\mathrm{poly}(d, k)$-time learning algorithms for the following models.
* {\bf Mixtures of Linear Regressions} We give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm for this task, where $ε$ is the desired error.
* {\bf Mixtures of Spherical Gaussians} For density estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time learning algorithm, where $ε$ is the desired total variation error, under the condition that the means lie in a ball of radius $O(\sqrt{\log k})$. For parameter estimation, we give a $\mathrm{poly}(d, k, 1/ε)$-time algorithm under the {\em optimal} mean separation of $Ω(\log^{1/2}(k/ε))$.
* {\bf Positive Linear Combinations of Non-Linear Activations} We give a general algorithm for this task with complexity $\mathrm{poly}(d, k) g(ε)$, where $ε$ is the desired error and the function $g$ depends on the Hermite concentration of the target class of functions. Specifically, for positive linear combinations of ReLU activations, our algorithm has complexity $\mathrm{poly}(d, k) 2^{\mathrm{poly}(1/ε)}$.
△ Less
Submitted 12 April, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
Locally Sampleable Uniform Symmetric Distributions
Authors:
Daniel M. Kane,
Anthony Ostuni,
Kewen Wu
Abstract:
We characterize the power of constant-depth Boolean circuits in generating uniform symmetric distributions. Let $f\colon\{0,1\}^m\to\{0,1\}^n$ be a Boolean function where each output bit of $f$ depends only on $O(1)$ input bits. Assume the output distribution of $f$ on uniform input bits is close to a uniform distribution $D$ with a symmetric support. We show that $D$ is essentially one of the fol…
▽ More
We characterize the power of constant-depth Boolean circuits in generating uniform symmetric distributions. Let $f\colon\{0,1\}^m\to\{0,1\}^n$ be a Boolean function where each output bit of $f$ depends only on $O(1)$ input bits. Assume the output distribution of $f$ on uniform input bits is close to a uniform distribution $D$ with a symmetric support. We show that $D$ is essentially one of the following six possibilities: (1) point distribution on $0^n$, (2) point distribution on $1^n$, (3) uniform over $\{0^n,1^n\}$, (4) uniform over strings with even Hamming weights, (5) uniform over strings with odd Hamming weights, and (6) uniform over all strings. This confirms a conjecture of Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023).
△ Less
Submitted 25 February, 2025; v1 submitted 12 November, 2024;
originally announced November 2024.
-
Efficient Testable Learning of General Halfspaces with Adversarial Label Noise
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sihan Liu,
Nikos Zarifis
Abstract:
We study the task of testable learning of general -- not necessarily homogeneous -- halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data.Our main result is the first polynomial time tester-…
▽ More
We study the task of testable learning of general -- not necessarily homogeneous -- halfspaces with adversarial label noise with respect to the Gaussian distribution. In the testable learning framework, the goal is to develop a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data.Our main result is the first polynomial time tester-learner for general halfspaces that achieves dimension-independent misclassification error. At the heart of our approach is a new methodology to reduce testable learning of general halfspaces to testable learning of nearly homogeneous halfspaces that may be of broader interest.
△ Less
Submitted 30 August, 2024;
originally announced August 2024.
-
Replicability in High Dimensional Statistics
Authors:
Max Hopkins,
Russell Impagliazzo,
Daniel Kane,
Sihan Liu,
Christopher Ye
Abstract:
The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the comput…
▽ More
The replicability crisis is a major issue across nearly all areas of empirical science, calling for the formal study of replicability in statistics. Motivated in this context, [Impagliazzo, Lei, Pitassi, and Sorrell STOC 2022] introduced the notion of replicable learning algorithms, and gave basic procedures for $1$-dimensional tasks including statistical queries. In this work, we study the computational and statistical cost of replicability for several fundamental high dimensional statistical tasks, including multi-hypothesis testing and mean estimation.
Our main contribution establishes a computational and statistical equivalence between optimal replicable algorithms and high dimensional isoperimetric tilings. As a consequence, we obtain matching sample complexity upper and lower bounds for replicable mean estimation of distributions with bounded covariance, resolving an open problem of [Bun, Gaboardi, Hopkins, Impagliazzo, Lei, Pitassi, Sivakumar, and Sorrell, STOC2023] and for the $N$-Coin Problem, resolving a problem of [Karbasi, Velegkas, Yang, and Zhou, NeurIPS2023] up to log factors.
While our equivalence is computational, allowing us to shave log factors in sample complexity from the best known efficient algorithms, efficient isoperimetric tilings are not known. To circumvent this, we introduce several relaxed paradigms that do allow for sample and computationally efficient algorithms, including allowing pre-processing, adaptivity, and approximate replicability. In these cases we give efficient algorithms matching or beating the best known sample complexity for mean estimation and the coin problem, including a generic procedure that reduces the standard quadratic overhead of replicability to linear in expectation.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Super Non-singular Decompositions of Polynomials and their Application to Robustly Learning Low-degree PTFs
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Vasilis Kontonis,
Sihan Liu,
Nikos Zarifis
Abstract:
We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee $O_{d, c}(\text{opt}^{1-c})$, for any desired constant $c>0$, w…
▽ More
We study the efficient learnability of low-degree polynomial threshold functions (PTFs) in the presence of a constant fraction of adversarial corruptions. Our main algorithmic result is a polynomial-time PAC learning algorithm for this concept class in the strong contamination model under the Gaussian distribution with error guarantee $O_{d, c}(\text{opt}^{1-c})$, for any desired constant $c>0$, where $\text{opt}$ is the fraction of corruptions. In the strong contamination model, an omniscient adversary can arbitrarily corrupt an $\text{opt}$-fraction of the data points and their labels. This model generalizes the malicious noise model and the adversarial label noise model. Prior to our work, known polynomial-time algorithms in this corruption model (or even in the weaker adversarial label noise model) achieved error $\tilde{O}_d(\text{opt}^{1/(d+1)})$, which deteriorates significantly as a function of the degree $d$.
Our algorithm employs an iterative approach inspired by localization techniques previously used in the context of learning linear threshold functions. Specifically, we use a robust perceptron algorithm to compute a good partial classifier and then iterate on the unclassified points. In order to achieve this, we need to take a set defined by a number of polynomial inequalities and partition it into several well-behaved subsets. To this end, we develop new polynomial decomposition techniques that may be of independent interest.
△ Less
Submitted 30 March, 2024;
originally announced April 2024.
-
Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sushrut Karmalkar,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study Gaussian sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian r…
▽ More
We study Gaussian sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian robust $k$-sparse mean estimation on $\mathbb{R}^d$ with corruption rate $ε>0$, our algorithm has sample complexity $(k^2/ε^2)\mathrm{polylog}(d/ε)$, runs in sample polynomial time, and approximates the target mean within $\ell_2$-error $O(ε)$. Previous efficient algorithms inherently incur error $Ω(ε\sqrt{\log(1/ε)})$. At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
SQ Lower Bounds for Non-Gaussian Component Analysis with Weaker Assumptions
Authors:
Ilias Diakonikolas,
Daniel Kane,
Lisheng Ren,
Yuxin Sun
Abstract:
We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model. Prior work developed a general methodology to prove SQ lower bounds for this task that have been applicable to a wide range of contexts. In particular, it was known that for any univariate distribution $A$ satisfying certain conditions, distinguishing between a standard multivariate Gaussian and…
▽ More
We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model. Prior work developed a general methodology to prove SQ lower bounds for this task that have been applicable to a wide range of contexts. In particular, it was known that for any univariate distribution $A$ satisfying certain conditions, distinguishing between a standard multivariate Gaussian and a distribution that behaves like $A$ in a random hidden direction and like a standard Gaussian in the orthogonal complement, is SQ-hard. The required conditions were that (1) $A$ matches many low-order moments with the standard univariate Gaussian, and (2) the chi-squared norm of $A$ with respect to the standard Gaussian is finite. While the moment-matching condition is necessary for hardness, the chi-squared condition was only required for technical reasons. In this work, we establish that the latter condition is indeed not necessary. In particular, we prove near-optimal SQ lower bounds for NGCA under the moment-matching condition only. Our result naturally generalizes to the setting of a hidden subspace. Leveraging our general SQ lower bound, we obtain near-optimal SQ lower bounds for a range of concrete estimation tasks where existing techniques provide sub-optimal or even vacuous guarantees.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Statistical Query Lower Bounds for Learning Truncated Gaussians
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Thanasis Pittas,
Nikos Zarifis
Abstract:
We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{ μ}, \mathbf{ I})$ truncated to the set…
▽ More
We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{ μ}, \mathbf{ I})$ truncated to the set $S$. The goal is to estimate $\boldsymbolμ$ within accuracy $ε>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is $d^{\mathrm{poly}(1/ε)}$, even when the class $\mathcal{C}$ is simple so that $\mathrm{poly}(d/ε)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when $\mathcal{C}$ is a union of a bounded number of rectangles whose VC dimension and Gaussian surface are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Locality Bounds for Sampling Hamming Slices
Authors:
Daniel M. Kane,
Anthony Ostuni,
Kewen Wu
Abstract:
Spurred by the influential work of Viola (Journal of Computing 2012), the past decade has witnessed an active line of research into the complexity of (approximately) sampling distributions, in contrast to the traditional focus on the complexity of computing functions.
We build upon and make explicit earlier implicit results of Viola to provide superconstant lower bounds on the locality of Boolea…
▽ More
Spurred by the influential work of Viola (Journal of Computing 2012), the past decade has witnessed an active line of research into the complexity of (approximately) sampling distributions, in contrast to the traditional focus on the complexity of computing functions.
We build upon and make explicit earlier implicit results of Viola to provide superconstant lower bounds on the locality of Boolean functions approximately sampling the uniform distribution over binary strings of particular Hamming weights, both exactly and modulo an integer, answering questions of Viola (Journal of Computing 2012) and Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023). Applications to data structure lower bounds and quantum-classical separations are discussed.
△ Less
Submitted 26 February, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Agnostically Learning Multi-index Models with Queries
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Vasilis Kontonis,
Christos Tzamos,
Nikos Zarifis
Abstract:
We study the power of query access for the task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels and the goal is to compute a hypothesis that is competitive with the {\em best-fit} function in a known class, i.e., it achieves error $\mathrm{opt}+ε$, where $\mathrm{opt}$ is the error of the best function in the class. We focus on a g…
▽ More
We study the power of query access for the task of agnostic learning under the Gaussian distribution. In the agnostic model, no assumptions are made on the labels and the goal is to compute a hypothesis that is competitive with the {\em best-fit} function in a known class, i.e., it achieves error $\mathrm{opt}+ε$, where $\mathrm{opt}$ is the error of the best function in the class. We focus on a general family of Multi-Index Models (MIMs), which are $d$-variate functions that depend only on few relevant directions, i.e., have the form $g(\mathbf{W} \mathbf{x})$ for an unknown link function $g$ and a $k \times d$ matrix $\mathbf{W}$. Multi-index models cover a wide range of commonly studied function classes, including constant-depth neural networks with ReLU activations, and intersections of halfspaces.
Our main result shows that query access gives significant runtime improvements over random examples for agnostically learning MIMs. Under standard regularity assumptions for the link function (namely, bounded variation or surface area), we give an agnostic query learner for MIMs with complexity $O(k)^{\mathrm{poly}(1/ε)} \; \mathrm{poly}(d) $. In contrast, algorithms that rely only on random examples inherently require $d^{\mathrm{poly}(1/ε)}$ samples and runtime, even for the basic problem of agnostically learning a single ReLU or a halfspace.
Our algorithmic result establishes a strong computational separation between the agnostic PAC and the agnostic PAC+Query models under the Gaussian distribution. Prior to our work, no such separation was known -- even for the special case of agnostically learning a single halfspace, for which it was an open problem first posed by Feldman. Our results are enabled by a general dimension-reduction technique that leverages query access to estimate gradients of (a smoothed version of) the underlying label function.
△ Less
Submitted 27 December, 2023;
originally announced December 2023.
-
Chaos spectrum -- semiconductor laser with delayed optical feedback
Authors:
D M Kane,
M Radziunas
Abstract:
Maximizing the rf bandwidth associated with the chaotic output from tailored operation of nonlinear semiconductor laser systems is an ongoing research effort. The early pioneering research was done in semiconductor laser with delayed optical feedback systems, which continue to be researched. We report numerical simulations of this system, using a travelling wave model. The results provide new insi…
▽ More
Maximizing the rf bandwidth associated with the chaotic output from tailored operation of nonlinear semiconductor laser systems is an ongoing research effort. The early pioneering research was done in semiconductor laser with delayed optical feedback systems, which continue to be researched. We report numerical simulations of this system, using a travelling wave model. The results provide new insights into the impact of key device parameters affecting the chaos bandwidth and spectrum. Linewidth enhancement factor and the nonlinear gain saturation parameter are found to be the most important parameters when seeking to optimize the chaotic output. We reassess a standard definition being used to report chaos bandwidth. We propose that more spectral information should be reported if numerical and experimental research results are to be of most value into the future. A database from previous experimental study is also analyzed to connect with the predictions of the numerical simulations. This elucidates the links between the chaos bandwidth achieved in real systems and the semiconductor laser parameters. The results inform recommendations for semiconductor laser parameters that will better support broadband chaos generation in whatever semiconductor-gain-medium-based nonlinear system approach is being used. They elucidate the physics of both the envelope and the fine structure of the rf spectrum of coherence collapse.
△ Less
Submitted 21 December, 2023;
originally announced December 2023.
-
Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Thanasis Pittas
Abstract:
We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i \ge α$ for some known parameter $α$, and each $P_i$ has unknown covariance $Σ_i \preceq σ^2_i \cdot I_d$ for some unknown $σ_i$, the goal is to cluster the sam…
▽ More
We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i =1}^k w_i P_i$, where each $w_i \ge α$ for some known parameter $α$, and each $P_i$ has unknown covariance $Σ_i \preceq σ^2_i \cdot I_d$ for some unknown $σ_i$, the goal is to cluster the samples assuming a pairwise mean separation in the order of $(σ_i+σ_j)/\sqrtα$ between every pair of components $P_i$ and $P_j$. Our contributions are as follows:
For the special case of nearly uniform mixtures, we give the first poly-time algorithm for this clustering task. Prior work either required separation scaling with the maximum cluster standard deviation (i.e. $\max_i σ_i$) [DKK+22b] or required both additional structural assumptions and mean separation scaling as a large degree polynomial in $1/α$ [BKK22].
For general-weight mixtures, we point out that accurate clustering is information-theoretically impossible under our fine-grained mean separation assumptions. We introduce the notion of a clustering refinement -- a list of not-too-small subsets satisfying a similar separation, and which can be merged into a clustering approximating the ground truth -- and show that it is possible to efficiently compute an accurate clustering refinement of the samples. Furthermore, under a variant of the "no large sub-cluster'' condition from in prior work [BKK22], we show that our algorithm outputs an accurate clustering, not just a refinement, even for general-weight mixtures. As a corollary, we obtain efficient clustering algorithms for mixtures of well-conditioned high-dimensional log-concave distributions.
Moreover, our algorithm is robust to $Ω(α)$-fraction of adversarial outliers.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination par…
▽ More
We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination parameter $ε\in (0, ε_0)$ for a small absolute constant $ε_0$, we give an algorithm with sample complexity $n = \tilde{O}(d/ε^2)$ and almost linear runtime that approximates the target mean within $\ell_2$-error $O(ε)$. This improves on prior work that achieved this error guarantee with polynomially suboptimal sample and time complexity. For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/ε^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(ε)$. This is the first polynomial sample and time algorithm achieving the optimal error guarantee, answering an open question in the literature. At the technical level, we develop a methodology that yields almost-linear time algorithms for multi-directional filtering that may be of broader interest.
△ Less
Submitted 3 December, 2023;
originally announced December 2023.
-
Testing Closeness of Multivariate Distributions via Ramsey Theory
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sihan Liu
Abstract:
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions $\mathbf p, \mathbf q$ on $\mathbb R^d$, we want to distinguish between the case that $\mathbf p=\mathbf q$ versus $\|\mathbf p-\mathbf q\|_{A_k} > ε$, where $\|\mathbf p-\mathbf q\|_{A_k}$ denotes the generalized ${A}_k$ dista…
▽ More
We investigate the statistical task of closeness (or equivalence) testing for multidimensional distributions. Specifically, given sample access to two unknown distributions $\mathbf p, \mathbf q$ on $\mathbb R^d$, we want to distinguish between the case that $\mathbf p=\mathbf q$ versus $\|\mathbf p-\mathbf q\|_{A_k} > ε$, where $\|\mathbf p-\mathbf q\|_{A_k}$ denotes the generalized ${A}_k$ distance between $\mathbf p$ and $\mathbf q$ -- measuring the maximum discrepancy between the distributions over any collection of $k$ disjoint, axis-aligned rectangles. Our main result is the first closeness tester for this problem with {\em sub-learning} sample complexity in any fixed dimension and a nearly-matching sample complexity lower bound.
In more detail, we provide a computationally efficient closeness tester with sample complexity $O\left((k^{6/7}/ \mathrm{poly}_d(ε)) \log^d(k)\right)$. On the lower bound side, we establish a qualitatively matching sample complexity lower bound of $Ω(k^{6/7}/\mathrm{poly}(ε))$, even for $d=2$. These sample complexity bounds are surprising because the sample complexity of the problem in the univariate setting is $Θ(k^{4/5}/\mathrm{poly}(ε))$. This has the interesting consequence that the jump from one to two dimensions leads to a substantial increase in sample complexity, while increases beyond that do not.
As a corollary of our general $A_k$ tester, we obtain $d_{\mathrm TV}$-closeness testers for pairs of $k$-histograms on $\mathbb R^d$ over a common unknown partition, and pairs of uniform distributions supported on the union of $k$ unknown disjoint axis-aligned rectangles.
Both our algorithm and our lower bound make essential use of tools from Ramsey theory.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
Online Robust Mean Estimation
Authors:
Daniel M. Kane,
Ilias Diakonikolas,
Hanshen Xiao,
Sihan Liu
Abstract:
We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $μ_t$ for the true mean value of the process at time…
▽ More
We study the problem of high-dimensional robust mean estimation in an online setting. Specifically, we consider a scenario where $n$ sensors are measuring some common, ongoing phenomenon. At each time step $t=1,2,\ldots,T$, the $i^{th}$ sensor reports its readings $x^{(i)}_t$ for that time step. The algorithm must then commit to its estimate $μ_t$ for the true mean value of the process at time $t$. We assume that most of the sensors observe independent samples from some common distribution $X$, but an $ε$-fraction of them may instead behave maliciously. The algorithm wishes to compute a good approximation $μ$ to the true mean $μ^\ast := \mathbf{E}[X]$. We note that if the algorithm is allowed to wait until time $T$ to report its estimate, this reduces to the well-studied problem of robust mean estimation. However, the requirement that our algorithm produces partial estimates as the data is coming in substantially complicates the situation.
We prove two main results about online robust mean estimation in this model. First, if the uncorrupted samples satisfy the standard condition of $(ε,δ)$-stability, we give an efficient online algorithm that outputs estimates $μ_t$, $t \in [T],$ such that with high probability it holds that $\|μ-μ^\ast\|_2 = O(δ\log(T))$, where $μ= (μ_t)_{t \in [T]}$. We note that this error bound is nearly competitive with the best offline algorithms, which would achieve $\ell_2$-error of $O(δ)$. Our second main result shows that with additional assumptions on the input (most notably that $X$ is a product distribution) there are inefficient algorithms whose error does not depend on $T$ at all.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
SQ Lower Bounds for Learning Mixtures of Linear Classifiers
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Yuxin Sun
Abstract:
We study the problem of learning mixtures of linear classifiers under Gaussian covariates. Given sample access to a mixture of $r$ distributions on $\mathbb{R}^n$ of the form $(\mathbf{x},y_{\ell})$, $\ell\in [r]$, where $\mathbf{x}\sim\mathcal{N}(0,\mathbf{I}_n)$ and $y_\ell=\mathrm{sign}(\langle\mathbf{v}_\ell,\mathbf{x}\rangle)$ for an unknown unit vector $\mathbf{v}_\ell$, the goal is to learn…
▽ More
We study the problem of learning mixtures of linear classifiers under Gaussian covariates. Given sample access to a mixture of $r$ distributions on $\mathbb{R}^n$ of the form $(\mathbf{x},y_{\ell})$, $\ell\in [r]$, where $\mathbf{x}\sim\mathcal{N}(0,\mathbf{I}_n)$ and $y_\ell=\mathrm{sign}(\langle\mathbf{v}_\ell,\mathbf{x}\rangle)$ for an unknown unit vector $\mathbf{v}_\ell$, the goal is to learn the underlying distribution in total variation distance. Our main result is a Statistical Query (SQ) lower bound suggesting that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures. In particular, we show that the complexity of any SQ algorithm for the problem is $n^{\mathrm{poly}(1/Δ) \log(r)}$, where $Δ$ is a lower bound on the pairwise $\ell_2$-separation between the $\mathbf{v}_\ell$'s. The key technical ingredient underlying our result is a new construction of spherical designs that may be of independent interest.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Theoretical Foundations of Ordinal Multidimensional Scaling, Including Internal and External Unfolding
Authors:
Ery Arias-Castro,
Clément Berenfeld,
Daniel Kane
Abstract:
We provide a comprehensive theory of multiple variants of ordinal multidimensional scaling,including internal unfolding and external unfolding. We first follow Shepard (1966) and work in a continuum model to gain insight. We then follow Kleindessner and von Luxburg (2014) and work in an asymptotic discrete setting.
We provide a comprehensive theory of multiple variants of ordinal multidimensional scaling,including internal unfolding and external unfolding. We first follow Shepard (1966) and work in a continuum model to gain insight. We then follow Kleindessner and von Luxburg (2014) and work in an asymptotic discrete setting.
△ Less
Submitted 17 June, 2025; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Creating Community in a Data Science Classroom
Authors:
David Kane
Abstract:
A community is a collection of people who know and care about each other. The vast majority of college courses are not communities. This is especially true of statistics and data science courses, both because our classes are larger and because we are more likely to lecture. However, it is possible to create a community in your classroom. This article offers an idiosyncratic set of practices for cr…
▽ More
A community is a collection of people who know and care about each other. The vast majority of college courses are not communities. This is especially true of statistics and data science courses, both because our classes are larger and because we are more likely to lecture. However, it is possible to create a community in your classroom. This article offers an idiosyncratic set of practices for creating community. I have used these techniques successfully in first and second semester statistics courses with enrollments ranging from 40 to 120. The key steps are knowing names, cold calling, classroom seating, a shallow learning curve, Study Halls, Recitations and rotating-one-on-one final project presentations.
△ Less
Submitted 13 September, 2023;
originally announced September 2023.
-
New Lower Bounds for Testing Monotonicity and Log Concavity of Distributions
Authors:
Yuqian Cheng,
Daniel M. Kane,
Zhicheng Zheng
Abstract:
We develop a new technique for proving distribution testing lower bounds for properties defined by inequalities involving the bin probabilities of the distribution in question. Using this technique we obtain new lower bounds for monotonicity testing over discrete cubes and tight lower bounds for log-concavity testing.
Our basic technique involves constructing a pair of moment-matching families o…
▽ More
We develop a new technique for proving distribution testing lower bounds for properties defined by inequalities involving the bin probabilities of the distribution in question. Using this technique we obtain new lower bounds for monotonicity testing over discrete cubes and tight lower bounds for log-concavity testing.
Our basic technique involves constructing a pair of moment-matching families of distributions by tweaking the probabilities of pairs of bins so that one family maintains the defining inequalities while the other violates them.
△ Less
Submitted 31 July, 2023;
originally announced August 2023.
-
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
Authors:
Ilias Diakonikolas,
Daniel M. Kane
Abstract:
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/ε)^{O(k)}$, where $ε>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity…
▽ More
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/ε)^{O(k)}$, where $ε>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/ε)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high-level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
△ Less
Submitted 25 July, 2023; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise
Authors:
Ilias Diakonikolas,
Jelena Diakonikolas,
Daniel M. Kane,
Puqian Wang,
Nikos Zarifis
Abstract:
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetildeΘ(d/ε)$,…
▽ More
We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetildeΘ(d/ε)$, where $d$ is the dimension and $ε$ is the excess error. Our positive result is a computationally efficient learning algorithm with sample complexity $\tilde{O}(d/ε+ d/(\max\{p, ε\})^2)$, where $p$ quantifies the bias of the target halfspace. On the lower bound side, we show that any efficient SQ algorithm (or low-degree test) for the problem requires sample complexity at least $Ω(d^{1/2}/(\max\{p, ε\})^2)$. Our lower bound suggests that this quadratic dependence on $1/ε$ is inherent for efficient algorithms.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise
Authors:
Ilias Diakonikolas,
Jelena Diakonikolas,
Daniel M. Kane,
Puqian Wang,
Nikos Zarifis
Abstract:
We study the problem of PAC learning $γ$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetildeΘ(1/(γ^2 ε))$. We start by giving a simple efficient alg…
▽ More
We study the problem of PAC learning $γ$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetildeΘ(1/(γ^2 ε))$. We start by giving a simple efficient algorithm with sample complexity $\widetilde{O}(1/(γ^2 ε^2))$. Our main result is a lower bound for Statistical Query (SQ) algorithms and low-degree polynomial tests suggesting that the quadratic dependence on $1/ε$ in the sample complexity is inherent for computationally efficient algorithms. Specifically, our results imply a lower bound of $\widetildeΩ(1/(γ^{1/2} ε^2))$ on the sample complexity of any efficient SQ learner or low-degree test.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
SQ Lower Bounds for Learning Bounded Covariance GMMs
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Thanasis Pittas,
Nikos Zarifis
Abstract:
We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol μ_i,\mathbf Σ_i)$, where $\mathbf Σ_i = \mathbf Σ\preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol μ_i - \boldsymbol μ_j\|_2 \geq k^ε$ for…
▽ More
We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P= \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol μ_i,\mathbf Σ_i)$, where $\mathbf Σ_i = \mathbf Σ\preceq \mathbf I$ and $\min_{i \neq j} \| \boldsymbol μ_i - \boldsymbol μ_j\|_2 \geq k^ε$ for some $ε>0$. Known learning algorithms for this family of GMMs have complexity $(dk)^{O(1/ε)}$. In this work, we prove that any Statistical Query (SQ) algorithm for this problem requires complexity at least $d^{Ω(1/ε)}$. In the special case where the separation is on the order of $k^{1/2}$, we additionally obtain fine-grained SQ lower bounds with the correct exponent. Our SQ lower bounds imply similar lower bounds for low-degree polynomial tests. Conceptually, our results provide evidence that known algorithms for this problem are nearly best possible.
△ Less
Submitted 22 June, 2023;
originally announced June 2023.
-
Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study principal component analysis (PCA), where given a dataset in $\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA…
▽ More
We study principal component analysis (PCA), where given a dataset in $\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
△ Less
Submitted 4 May, 2023;
originally announced May 2023.
-
A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $α<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(μ, Σ)$, the goal is to output a list of $O(1/α)$ hypotheses at least one of which is close to $Σ$ in relative Frobenius norm. Our main result is a…
▽ More
We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb R^d$ such that an unknown $α<1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(μ, Σ)$, the goal is to output a list of $O(1/α)$ hypotheses at least one of which is close to $Σ$ in relative Frobenius norm. Our main result is a $\mathrm{poly}(d,1/α)$ sample and time algorithm for this task that guarantees relative Frobenius norm error of $\mathrm{poly}(1/α)$. Importantly, our algorithm relies purely on spectral techniques. As a corollary, we obtain an efficient spectral algorithm for robust partial clustering of Gaussian mixture models (GMMs) -- a key ingredient in the recent work of [BDJ+22] on robustly learning arbitrary GMMs. Combined with the other components of [BDJ+22], our new method yields the first Sum-of-Squares-free algorithm for robustly learning GMMs. At the technical level, we develop a novel multi-filtering method for list-decodable covariance estimation that may be useful in other settings.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
A review of ptychographic techniques for ultrashort pulse measurement
Authors:
Daniel J Kane,
Andrei B. Vakhtin
Abstract:
The measurement of optical ultrafast laser pulses is done indirectly because the required bandwidth to measure these pulses exceeds the bandwidth of current electronics. As a result, this measurement problem is often posed as a 1-D phase retrieval problem, which is fraught with ambiguities. The phase retrieval method known as ptychography solves this problem by making it possible to measure ultraf…
▽ More
The measurement of optical ultrafast laser pulses is done indirectly because the required bandwidth to measure these pulses exceeds the bandwidth of current electronics. As a result, this measurement problem is often posed as a 1-D phase retrieval problem, which is fraught with ambiguities. The phase retrieval method known as ptychography solves this problem by making it possible to measure ultrafast pulses in either the time domain or the frequency domain. One well known algorithm is the principal components generalized projections algorithm (PCGPA) for extracting pulses from Frequency-Resolved Optical Gating (FROG) measurements. Here, we discuss the development of the PCPGA and introduce new developments including an operator formalism that allows for the convenient addition of external constraints and the development of more robust algorithms. A close cousin, the ptychographic iterative engine will also be covered and compared to the PCGPA. Additional developments using other algorithmic strategies will also be discussed along with new developments combining optics and high-speed electronics to achieve megahertz measurement rates.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Efficient Testable Learning of Halfspaces with Adversarial Label Noise
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Vasilis Kontonis,
Sihan Liu,
Nikos Zarifis
Abstract:
We give the first polynomial-time algorithm for the testable learning of halfspaces in the presence of adversarial label noise under the Gaussian distribution. In the recently introduced testable learning model, one is required to produce a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our tester-learner runs in time…
▽ More
We give the first polynomial-time algorithm for the testable learning of halfspaces in the presence of adversarial label noise under the Gaussian distribution. In the recently introduced testable learning model, one is required to produce a tester-learner such that if the data passes the tester, then one can trust the output of the robust learner on the data. Our tester-learner runs in time $\poly(d/\eps)$ and outputs a halfspace with misclassification error $O(\opt)+\eps$, where $\opt$ is the 0-1 error of the best fitting halfspace. At a technical level, our algorithm employs an iterative soft localization technique enhanced with appropriate testers to ensure that the data distribution is sufficiently similar to a Gaussian.
△ Less
Submitted 9 March, 2023;
originally announced March 2023.
-
Exponential Hardness of Reinforcement Learning with Linear Function Approximation
Authors:
Daniel Kane,
Sihan Liu,
Shachar Lovett,
Gaurav Mahajan,
Csaba Szepesvári,
Gellért Weisz
Abstract:
A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem's counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently. Therefore, it was quite surprising when a recent work \cite{kane2022computational} showed a computational-s…
▽ More
A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem's counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently. Therefore, it was quite surprising when a recent work \cite{kane2022computational} showed a computational-statistical gap for linear reinforcement learning: even though there are polynomial sample-complexity algorithms, unless NP = RP, there are no polynomial time algorithms for this setting.
In this work, we build on their result to show a computational lower bound, which is exponential in feature dimension and horizon, for linear reinforcement learning under the Randomized Exponential Time Hypothesis. To prove this we build a round-based game where in each round the learner is searching for an unknown vector in a unit hypercube. The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses (b) if an instance has no solutions then it also has no solutions that satisfy more than (1-$ε$)-fraction of clauses. We use standard reductions to show this 3-SAT variant is approximately as hard as 3-SAT. Finally, we also show a lower bound optimized for horizon dependence that almost matches the best known upper bound of $\exp(\sqrt{H})$.
△ Less
Submitted 24 February, 2023;
originally announced February 2023.
-
Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren
Abstract:
We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{ \pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+ε$, where…
▽ More
We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{ \pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+ε$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. We prove a near-optimal computational hardness result for this task, under the widely believed sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to yield near-optimal lower bounds for related problems, including ReLU regression.
△ Less
Submitted 13 February, 2023;
originally announced February 2023.
-
Do PAC-Learners Learn the Marginal Distribution?
Authors:
Max Hopkins,
Daniel M. Kane,
Shachar Lovett,
Gaurav Mahajan
Abstract:
The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$, learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distributi…
▽ More
The Fundamental Theorem of PAC Learning asserts that learnability of a concept class $H$ is equivalent to the $\textit{uniform convergence}$ of empirical error in $H$ to its mean, or equivalently, to the problem of $\textit{density estimation}$, learnability of the underlying marginal distribution with respect to events in $H$. This seminal equivalence relies strongly on PAC learning's `distribution-free' assumption, that the adversary may choose any marginal distribution over data. Unfortunately, the distribution-free model is known to be overly adversarial in practice, failing to predict the success of modern machine learning algorithms, but without the Fundamental Theorem our theoretical understanding of learning under distributional constraints remains highly limited.
In this work, we revisit the connection between PAC learning, uniform convergence, and density estimation beyond the distribution-free setting when the adversary is restricted to choosing a marginal distribution from a known family $\mathscr{P}$. We prove that while the traditional Fundamental Theorem indeed fails, a finer-grained connection between the three fundamental notions continues to hold:
1. PAC-Learning is strictly sandwiched between two refined models of density estimation, differing only in whether the learner $\textit{knows}$ the set of well-estimated events in $H$.
2. Under reasonable assumptions on $H$ and $\mathscr{P}$, density estimation is equivalent to $\textit{uniform estimation}$, a relaxation of uniform convergence allowing non-empirical estimators.
Together, our results give a clearer picture of how the Fundamental Theorem extends beyond the distribution-free setting and shed new light on the classically challenging problem of learning under arbitrary distributional assumptions.
△ Less
Submitted 3 March, 2025; v1 submitted 13 February, 2023;
originally announced February 2023.
-
A Nearly Tight Bound for Fitting an Ellipsoid to Gaussian Random Points
Authors:
Daniel M. Kane,
Ilias Diakonikolas
Abstract:
We prove that for $c>0$ a sufficiently small universal constant that a random set of $c d^2/\log^4(d)$ independent Gaussian random points in $\mathbb{R}^d$ lie on a common ellipsoid with high probability. This nearly establishes a conjecture of~\cite{SaundersonCPW12}, within logarithmic factors. The latter conjecture has attracted significant attention over the past decade, due to its connections…
▽ More
We prove that for $c>0$ a sufficiently small universal constant that a random set of $c d^2/\log^4(d)$ independent Gaussian random points in $\mathbb{R}^d$ lie on a common ellipsoid with high probability. This nearly establishes a conjecture of~\cite{SaundersonCPW12}, within logarithmic factors. The latter conjecture has attracted significant attention over the past decade, due to its connections to machine learning and sum-of-squares lower bounds for certain statistical problems.
△ Less
Submitted 21 December, 2022;
originally announced December 2022.
-
A Strongly Polynomial Algorithm for Approximate Forster Transforms and its Application to Halfspace Learning
Authors:
Ilias Diakonikolas,
Christos Tzamos,
Daniel M. Kane
Abstract:
The Forster transform is a method of regularizing a dataset by placing it in {\em radial isotropic position} while maintaining some of its essential properties. Forster transforms have played a key role in a diverse range of settings spanning computer science and functional analysis. Prior work had given {\em weakly} polynomial time algorithms for computing Forster transforms, when they exist. Our…
▽ More
The Forster transform is a method of regularizing a dataset by placing it in {\em radial isotropic position} while maintaining some of its essential properties. Forster transforms have played a key role in a diverse range of settings spanning computer science and functional analysis. Prior work had given {\em weakly} polynomial time algorithms for computing Forster transforms, when they exist. Our main result is the first {\em strongly polynomial time} algorithm to compute an approximate Forster transform of a given dataset or certify that no such transformation exists. By leveraging our strongly polynomial Forster algorithm, we obtain the first strongly polynomial time algorithm for {\em distribution-free} PAC learning of halfspaces. This learning result is surprising because {\em proper} PAC learning of halfspaces is {\em equivalent} to linear programming. Our learning approach extends to give a strongly polynomial halfspace learner in the presence of random classification noise and, more generally, Massart noise.
△ Less
Submitted 6 December, 2022;
originally announced December 2022.
-
Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Jasper C. H. Lee,
Ankit Pensia
Abstract:
We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $μ$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $μ$ with high probability. Prior work had obtained…
▽ More
We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean $μ$ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates $μ$ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability $τ$, having an additive $\log(1/τ)$ dependence. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
△ Less
Submitted 29 November, 2022;
originally announced November 2022.
-
Asymptotic Nusselt numbers for internal flow in the Cassie state
Authors:
Daniel Kane,
Marc Hodes,
Martin Z. Bazant,
Toby L. Kirk
Abstract:
We consider laminar, fully-developed, Poiseuille flows of liquid in the Cassie state through diabatic, parallel-plate microchannels symmetrically textured with isoflux ridges. Through the use of matched asymptotic expansions we analytically develop expressions for (apparent hydrodynamic) slip lengths and variously-defined Nusselt numbers. Our small parameter ($ε$) is the pitch of the ridges divide…
▽ More
We consider laminar, fully-developed, Poiseuille flows of liquid in the Cassie state through diabatic, parallel-plate microchannels symmetrically textured with isoflux ridges. Through the use of matched asymptotic expansions we analytically develop expressions for (apparent hydrodynamic) slip lengths and variously-defined Nusselt numbers. Our small parameter ($ε$) is the pitch of the ridges divided by the height of the microchannel. When the ridges are oriented parallel to the flow, we quantify the error in the Nusselt number expressions in the literature and provide a new closed-form result. The latter is accurate to $O\left(ε^2\right)$ and valid for any solid (ridge) fraction, whereas those in the current literature are accurate to $O\left(ε^1\right)$ and breakdown in the important limit when solid fraction approaches zero. When the ridges are oriented transverse to the (periodically fully-developed) flow, the error associated with neglecting inertial effects in the slip length is shown to be $O\left(ε^3\mathrm{Re}\right)$, where $\mathrm{Re}$ is the channel-scale Reynolds number based on its hydraulic diameter. The corresponding Nusselt number expressions are new and their accuracy is shown to be dependent on Reynolds number, Peclet number and Prandtl number in addition to $ε$. Manipulating the solution to the inner temperature problem encountered in the vicinity of the ridges shows that classic results for thermal spreading resistance are better expressed in terms of polylogarithm functions.
△ Less
Submitted 24 November, 2022;
originally announced November 2022.
-
Gaussian Mean Testing Made Simple
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Ankit Pensia
Abstract:
We study the following fundamental hypothesis testing problem, which we term Gaussian mean testing. Given i.i.d. samples from a distribution $p$ on $\mathbb{R}^d$, the task is to distinguish, with high probability, between the following cases: (i) $p$ is the standard Gaussian distribution, $\mathcal{N}(0,I_d)$, and (ii) $p$ is a Gaussian $\mathcal{N}(μ,Σ)$ for some unknown covariance $Σ$ and mean…
▽ More
We study the following fundamental hypothesis testing problem, which we term Gaussian mean testing. Given i.i.d. samples from a distribution $p$ on $\mathbb{R}^d$, the task is to distinguish, with high probability, between the following cases: (i) $p$ is the standard Gaussian distribution, $\mathcal{N}(0,I_d)$, and (ii) $p$ is a Gaussian $\mathcal{N}(μ,Σ)$ for some unknown covariance $Σ$ and mean $μ\in \mathbb{R}^d$ satisfying $\|μ\|_2 \geq ε$. Recent work gave an algorithm for this testing problem with the optimal sample complexity of $Θ(\sqrt{d}/ε^2)$. Both the previous algorithm and its analysis are quite complicated. Here we give an extremely simple algorithm for Gaussian mean testing with a one-page analysis. Our algorithm is sample optimal and runs in sample linear time.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
SQ Lower Bounds for Learning Single Neurons with Massart Noise
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Lisheng Ren,
Yuxin Sun
Abstract:
We study the problem of PAC learning a single neuron in the presence of Massart noise. Specifically, for a known activation function $f: \mathbb{R} \to \mathbb{R}$, the learner is given access to labeled examples $(\mathbf{x}, y) \in \mathbb{R}^d \times \mathbb{R}$, where the marginal distribution of $\mathbf{x}$ is arbitrary and the corresponding label $y$ is a Massart corruption of…
▽ More
We study the problem of PAC learning a single neuron in the presence of Massart noise. Specifically, for a known activation function $f: \mathbb{R} \to \mathbb{R}$, the learner is given access to labeled examples $(\mathbf{x}, y) \in \mathbb{R}^d \times \mathbb{R}$, where the marginal distribution of $\mathbf{x}$ is arbitrary and the corresponding label $y$ is a Massart corruption of $f(\langle \mathbf{w}, \mathbf{x} \rangle)$. The goal of the learner is to output a hypothesis $h: \mathbb{R}^d \to \mathbb{R}$ with small squared loss. For a range of activation functions, including ReLUs, we establish super-polynomial Statistical Query (SQ) lower bounds for this learning problem. In more detail, we prove that no efficient SQ algorithm can approximate the optimal error within any constant factor. Our main technical contribution is a novel SQ-hard construction for learning $\{ \pm 1\}$-weight Massart halfspaces on the Boolean hypercube that is interesting on its own right.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
Cryptographic Hardness of Learning Halfspaces with Massart Noise
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Pasin Manurangsi,
Lisheng Ren
Abstract:
We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples $(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$, where the distribution of $\mathbf{x}$ is arbitrary and the label $y$ is a Massart corruption of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$, with flipping probability…
▽ More
We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples $(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$, where the distribution of $\mathbf{x}$ is arbitrary and the label $y$ is a Massart corruption of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$, with flipping probability $η(\mathbf{x}) \leq η< 1/2$. The goal of the learner is to compute a hypothesis with small 0-1 error. Our main result is the first computational hardness result for this learning problem. Specifically, assuming the (widely believed) subexponential-time hardness of the Learning with Errors (LWE) problem, we show that no polynomial-time Massart halfspace learner can achieve error better than $Ω(η)$, even if the optimal 0-1 error is small, namely $\mathrm{OPT} = 2^{-\log^{c} (N)}$ for any universal constant $c \in (0, 1)$. Prior work had provided qualitatively similar evidence of hardness in the Statistical Query model. Our computational hardness result essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.
△ Less
Submitted 28 July, 2022;
originally announced July 2022.
-
Near-Optimal Bounds for Testing Histogram Distributions
Authors:
Clément L. Canonne,
Ilias Diakonikolas,
Daniel M. Kane,
Sihan Liu
Abstract:
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from…
▽ More
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde Θ(\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sushrut Karmalkar,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $α\in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor αm \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $μ$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates co…
▽ More
We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $α\in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor αm \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $μ$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\widehat μ$ such that $\| \widehat μ- μ\|_2$ is small. Prior work had studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error of $(1/α)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/α$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $Θ(\sqrt{\log(1/α)})$ with quasi-polynomial sample and computational complexity. We complement our upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.
△ Less
Submitted 5 July, 2024; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Optimal SQ Lower Bounds for Robustly Learning Discrete Product Distributions and Ising Models
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Yuxin Sun
Abstract:
We establish optimal Statistical Query (SQ) lower bounds for robustly learning certain families of discrete high-dimensional distributions. In particular, we show that no efficient SQ algorithm with access to an $ε$-corrupted binary product distribution can learn its mean within $\ell_2$-error $o(ε\sqrt{\log(1/ε)})$. Similarly, we show that no efficient SQ algorithm with access to an $ε$-corrupted…
▽ More
We establish optimal Statistical Query (SQ) lower bounds for robustly learning certain families of discrete high-dimensional distributions. In particular, we show that no efficient SQ algorithm with access to an $ε$-corrupted binary product distribution can learn its mean within $\ell_2$-error $o(ε\sqrt{\log(1/ε)})$. Similarly, we show that no efficient SQ algorithm with access to an $ε$-corrupted ferromagnetic high-temperature Ising model can learn the model to total variation distance $o(ε\log(1/ε))$. Our SQ lower bounds match the error guarantees of known algorithms for these problems, providing evidence that current upper bounds for these tasks are best possible. At the technical level, we develop a generic SQ lower bound for discrete high-dimensional distributions starting from low dimensional moment matching constructions that we believe will find other applications. Additionally, we introduce new ideas to analyze these moment-matching constructions for discrete univariate distributions.
△ Less
Submitted 9 June, 2022;
originally announced June 2022.
-
Robust Sparse Mean Estimation via Sum of Squares
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Sushrut Karmalkar,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study the problem of high-dimensional sparse mean estimation in the presence of an $ε$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For dis…
▽ More
We study the problem of high-dimensional sparse mean estimation in the presence of an $ε$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For distributions on $\mathbb R^d$ with "certifiably bounded" $t$-th moments and sufficiently light tails, our algorithm achieves error of $O(ε^{1-1/t})$ with sample complexity $m = (k\log(d))^{O(t)}/ε^{2-2/t}$. For the special case of the Gaussian distribution, our algorithm achieves near-optimal error of $\tilde O(ε)$ with sample complexity $m = O(k^4 \mathrm{polylog}(d))/ε^2$. Our algorithms follow the Sum-of-Squares based, proofs to algorithms approach. We complement our upper bounds with Statistical Query and low-degree polynomial testing lower bounds, providing evidence that the sample-time-error tradeoffs achieved by our algorithms are qualitatively the best possible.
△ Less
Submitted 5 July, 2024; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Streaming Algorithms for High-Dimensional Robust Statistics
Authors:
Ilias Diakonikolas,
Daniel M. Kane,
Ankit Pensia,
Thanasis Pittas
Abstract:
We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for hi…
▽ More
We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for high-dimensional robust statistics with near-optimal memory requirements (up to logarithmic factors). Our main result is for the task of high-dimensional robust mean estimation in (a strengthening of) Huber's contamination model. We give an efficient single-pass streaming algorithm for this task with near-optimal error guarantees and space complexity nearly-linear in the dimension. As a corollary, we obtain streaming algorithms with near-optimal space complexity for several more complex tasks, including robust covariance estimation, robust regression, and more generally robust stochastic optimization.
△ Less
Submitted 3 May, 2023; v1 submitted 26 April, 2022;
originally announced April 2022.