-
Robust estimation via generalized quasi-gradients
Authors:
Banghua Zhu,
Jiantao Jiao,
Jacob Steinhardt
Abstract:
We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems, and identify the existence of "generalized quasi-gradients". Whenever these quasi-gradients exist, a large family of low-regret algorithms are guaranteed to approximate the global minimum; this includes the commonly-used filtering algorithm.
For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $ε < 1/3$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With careful initialization and step size, we improve this to $1/2$, which is optimal.
For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $O(\sqrt{ε})$ to the optimal rate of $O(ε)$ for small $ε$ assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $\tilde{O}(d/ε^2)$.
Submitted 28 May, 2020;
originally announced May 2020.
-
When does the Tukey median work?
Authors:
Banghua Zhu,
Jiantao Jiao,
Jacob Steinhardt
Abstract:
We analyze the performance of the Tukey median estimator under total variation (TV) distance corruptions. Previous results show that under Huber's additive corruption model, the breakdown point is 1/3 for high-dimensional halfspace-symmetric distributions. We show that under TV corruptions, the breakdown point reduces to 1/4 for the same set of distributions. We also show that a certain projection algorithm can attain the optimal breakdown point of 1/2. Both the Tukey median estimator and the projection algorithm achieve sample complexity linear in dimension.
Submitted 31 March, 2020; v1 submitted 21 January, 2020;
originally announced January 2020.
-
Generalized Resilience and Robust Statistics
Authors:
Banghua Zhu,
Jiantao Jiao,
Jacob Steinhardt
Abstract:
Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be corrupted in many other ways, such as systematic measurement errors and missing covariates. We generalize the robust statistics approach to consider perturbations under any Wasserstein distance, and show that robust estimation is possible whenever a distribution's population statistics are robust under a certain family of friendly perturbations. This generalizes a property called resilience previously employed in the special case of mean estimation with outliers. We justify the generalized resilience property by showing that it holds under moment or hypercontractive conditions. Even in the total variation case, these subsume conditions in the literature for mean estimation, regression, and covariance estimation; the resulting analysis simplifies and sometimes improves these known results in both population limit and finite-sample rate. Our robust estimators are based on minimum distance (MD) functionals (Donoho and Liu, 1988), which project onto a set of distributions under a discrepancy related to the perturbation. We present two approaches for designing MD estimators with good finite-sample rates: weakening the discrepancy and expanding the set of distributions. We also present connections to Gao et al. (2019)'s recent analysis of generative adversarial networks for robust estimation.
Submitted 13 December, 2020; v1 submitted 18 September, 2019;
originally announced September 2019.
-
Hyperuniformity and anti-hyperuniformity in one-dimensional substitution tilings
Authors:
Erdal C. Oğuz,
Joshua E. S. Socolar,
Paul J. Steinhardt,
Salvatore Torquato
Abstract:
We consider the scaling properties characterizing the hyperuniformity (or anti-hyperuniformity) of long wavelength fluctuations in a broad class of one-dimensional substitution tilings. We present a simple argument that predicts the exponent $α$ governing the scaling of Fourier intensities at small wavenumbers, tilings with $α>0$ being hyperuniform, and confirm with numerical computations that the predictions are accurate for quasiperiodic tilings, tilings with singular continuous spectra, and limit-periodic tilings. Tilings with quasiperiodic or singular continuous spectra can be constructed with $α$ arbitrarily close to any given value between $-1$ and $3$. Limit-periodic tilings can be constructed with $α$ between $-1$ and $1$ or with Fourier intensities that approach zero faster than any power law.
Submitted 20 June, 2018;
originally announced June 2018.
-
Does robustness imply tractability? A lower bound for planted clique in the semi-random model
Authors:
Jacob Steinhardt
Abstract:
We consider a robust analog of the planted clique problem. In this analog, a set $S$ of vertices is chosen and all edges in $S$ are included; then, edges between $S$ and the rest of the graph are included with probability $\frac{1}{2}$, while edges not touching $S$ are allowed to vary arbitrarily. For this semi-random model, we show that the information-theoretic threshold for recovery is $\tilde{Θ}(\sqrt{n})$, in sharp contrast to the classical information-theoretic threshold of $Θ(\log(n))$. This matches the conjectured computational threshold for the classical planted clique problem, and thus raises the intriguing possibility that, once we require robustness, there is no computational-statistical gap for planted clique. Our lower bound involves establishing a result regarding the KL divergence of a family of perturbed Bernoulli distributions, which may be of independent interest.
Submitted 4 September, 2018; v1 submitted 17 April, 2017;
originally announced April 2017.
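The semi-random model described in the abstract above can be sketched as a small graph generator. This is only an illustration under stated assumptions, not code from the paper: the function name `semi_random_planted_clique` and the `adversary` callback are hypothetical, and the adversary here is simplified to a rule that sees only the vertex pair, whereas the model allows edges not touching $S$ to vary arbitrarily.

```python
import random
from itertools import combinations


def semi_random_planted_clique(n, clique, adversary=None, seed=None):
    """Sample an edge set on vertices 0..n-1 from the semi-random model.

    clique    -- the planted vertex set S
    adversary -- hypothetical callback (u, v) -> bool deciding edges that
                 do not touch S; if None, no such edges are added
    """
    rng = random.Random(seed)
    clique = set(clique)
    edges = set()
    for u, v in combinations(range(n), 2):
        u_in, v_in = u in clique, v in clique
        if u_in and v_in:
            # All edges inside S are included.
            edges.add((u, v))
        elif u_in or v_in:
            # Edges between S and the rest appear with probability 1/2.
            if rng.random() < 0.5:
                edges.add((u, v))
        elif adversary is not None and adversary(u, v):
            # Edges not touching S are chosen by the adversary.
            edges.add((u, v))
    return edges
```

Passing `adversary=lambda u, v: True` makes every vertex outside $S$ adjacent to every other such vertex, one simple way an adversary could try to obscure the planted set.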
-
Learning from Untrusted Data
Authors:
Moses Charikar,
Jacob Steinhardt,
Gregory Valiant
Abstract:
The vast majority of theoretical results in machine learning and statistics assume that the available training data is a reasonably reliable reflection of the phenomena to be learned or estimated. Similarly, the majority of machine learning and statistical techniques used in practice are brittle to the presence of large amounts of biased or malicious data. In this work we consider two frameworks in which to study estimation, learning, and optimization in the presence of significant fractions of arbitrary data.
The first framework, list-decodable learning, asks whether it is possible to return a list of answers, with the guarantee that at least one of them is accurate. For example, given a dataset of $n$ points for which an unknown subset of $αn$ points are drawn from a distribution of interest, and no assumptions are made about the remaining $(1-α)n$ points, is it possible to return a list of $\operatorname{poly}(1/α)$ answers, one of which is correct? The second framework, which we term the semi-verified learning model, considers the extent to which a small dataset of trusted data (drawn from the distribution in question) can be leveraged to enable the accurate extraction of information from a much larger but untrusted dataset (of which only an $α$-fraction is drawn from the distribution).
We show strong positive results in both settings, and provide an algorithm for robust learning in a very general stochastic optimization setting. This general result has immediate implications for robust estimation in a number of settings, including for robustly estimating the mean of distributions with bounded second moments, robustly learning mixtures of such distributions, and robustly finding planted partitions in random graphs in which significant portions of the graph have been perturbed by an adversary.
Submitted 11 June, 2017; v1 submitted 7 November, 2016;
originally announced November 2016.
-
Coxeter Pairs, Ammann Patterns and Penrose-like Tilings
Authors:
Latham Boyle,
Paul J. Steinhardt
Abstract:
We identify a precise geometric relationship between: (i) certain natural pairs of irreducible reflection groups ("Coxeter pairs"); (ii) self-similar quasicrystalline patterns formed by superposing sets of 1D quasi-periodically-spaced lines, planes or hyper-planes ("Ammann patterns"); and (iii) the tilings dual to these patterns ("Penrose-like tilings"). We use this relationship to obtain all irreducible Ammann patterns and their dual Penrose-like tilings, along with their key properties in a simple, systematic and unified way, expanding the number of known examples from four to infinity. For each symmetry, we identify the minimal Ammann patterns (those composed of the fewest 1D quasiperiodic sets) and construct the associated Penrose-like tilings: 11 in 2D, 9 in 3D and one in 4D. These include the original Penrose tiling, the four other previously known Penrose-like tilings, and sixteen that are new. We also complete the enumeration of the quasicrystallographic space groups corresponding to the irreducible non-crystallographic reflection groups, by showing that there is a unique such space group in 4D (with nothing beyond 4D).
Submitted 6 October, 2022; v1 submitted 29 August, 2016;
originally announced August 2016.
-
The Statistics of Streaming Sparse Regression
Authors:
Jacob Steinhardt,
Stefan Wager,
Percy Liang
Abstract:
We present a sparse analogue to stochastic gradient descent that is guaranteed to perform well under similar conditions to the lasso. In the linear regression setup with irrepresentable noise features, our algorithm recovers the support set of the optimal parameter vector with high probability, and achieves a statistically quasi-optimal rate of convergence of $O_p(k \log(d)/T)$, where $k$ is the sparsity of the solution, $d$ is the number of features, and $T$ is the number of training examples. Meanwhile, our algorithm does not require any more computational resources than stochastic gradient descent. In our experiments, we find that our method substantially outperforms existing streaming algorithms on both real and simulated data.
Submitted 12 December, 2014;
originally announced December 2014.
-
Permutations with Ascending and Descending Blocks
Authors:
Jacob Steinhardt
Abstract:
We investigate permutations in terms of their cycle structure and descent set. To do this, we generalize the classical bijection of Gessel and Reutenauer to deal with permutations that have some ascending and some descending blocks. We then provide the first bijective proofs of some known results. We also solve some problems posed in [3] by Eriksen, Freij, and Wastlund, who study derangements that descend in blocks of prescribed lengths.
Submitted 1 September, 2009; v1 submitted 29 August, 2009;
originally announced August 2009.
-
Derangements with Ascending and Descending Blocks
Authors:
Jacob Steinhardt
Abstract:
We continue the work of Eriksen, Freij, and Wastlund [3], who study derangements that descend in blocks of prescribed lengths. We generalize their work to derangements that ascend in some blocks and descend in others. In particular, we obtain a generating function for the derangements that ascend in blocks of prescribed lengths, thus solving a problem posed in [3]. We also work towards a combinatorial interpretation of a polynomial sum appearing in [3]. As a result, we obtain a new combinatorial sum for counting derangements with ascending and descending blocks.
Submitted 29 August, 2009; v1 submitted 23 August, 2009;
originally announced August 2009.
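To make the objects in the abstract above concrete, the following brute-force sketch counts derangements that descend in blocks of prescribed lengths. It assumes the usual reading of the definition: positions $1, \dots, n$ are split into consecutive blocks of the given lengths, and the permutation is decreasing within each block. The function names are hypothetical, and the enumeration is exponential in $n$, so it is meant only for checking small cases, not as the generating-function method of the paper.

```python
from itertools import permutations


def descends_in_blocks(perm, block_lengths):
    """Return True if perm is decreasing inside each consecutive block."""
    start = 0
    for length in block_lengths:
        block = perm[start:start + length]
        # A block fails as soon as some adjacent pair ascends.
        if any(a < b for a, b in zip(block, block[1:])):
            return False
        start += length
    return True


def count_block_descending_derangements(block_lengths):
    """Count derangements of 1..n that descend in the given blocks."""
    n = sum(block_lengths)
    count = 0
    for perm in permutations(range(1, n + 1)):
        # A derangement has no fixed point: perm[i] != i (1-indexed).
        has_fixed_point = any(
            value == position for position, value in enumerate(perm, start=1)
        )
        if not has_fixed_point and descends_in_blocks(perm, block_lengths):
            count += 1
    return count
```

With all blocks of length one the block condition is vacuous, so `count_block_descending_derangements((1, 1, 1, 1))` recovers the 9 derangements of four elements. Reversing the inequality in `descends_in_blocks` gives the ascending-block variant.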
-
On Coloring the Odd-Distance Graph
Authors:
Jacob Steinhardt
Abstract:
We present a proof, using spectral techniques, that there is no finite measurable coloring of the odd-distance graph.
Submitted 11 August, 2009;
originally announced August 2009.
-
Cayley graphs formed by conjugate generating sets of S_n
Authors:
Jacob Steinhardt
Abstract:
We investigate subsets of the symmetric group with structure similar to that of a graph. The trees of these subsets correspond to minimal conjugate generating sets of the symmetric group. There are two main theorems in this paper. The first is a characterization of minimal conjugate generating sets of S_n. The second is a generalization of a result due to Feng characterizing the automorphism groups of the Cayley graphs formed by certain generating sets composed of cycles. We compute the full automorphism groups subject to a weak condition and conjecture that the characterization still holds without the condition. We also present some computational results in relation to hamiltonicity of Cayley graphs, including a generalization of the work on quasi-hamiltonicity by Gutin and Yeo to undirected graphs.
Submitted 19 November, 2007;
originally announced November 2007.