Search | arXiv e-print repository

Linear-Time User-Level DP-SCO via Robust Statistics

Authors: Badih Ghazi, Ravi Kumar, Daogao Liu, Pasin Manurangsi

Abstract: User-level differentially private stochastic convex optimization (DP-SCO) has garnered significant attention due to the paramount importance of safeguarding user privacy in modern large-scale machine learning applications. Current methods, such as those based on differentially private stochastic gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility due to th… ▽ More User-level differentially private stochastic convex optimization (DP-SCO) has garnered significant attention due to the paramount importance of safeguarding user privacy in modern large-scale machine learning applications. Current methods, such as those based on differentially private stochastic gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility due to the need to privatize every intermediate iterate. In this work, we introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges. Our approach uniquely bounds the sensitivity of all intermediate iterates of SGD with gradient estimation based on robust statistics, thereby significantly reducing the gradient estimation noise for privacy purposes and enhancing the privacy-utility trade-off. By sidestepping the repeated privatization required by previous methods, our algorithm not only achieves an improved theoretical privacy-utility trade-off but also maintains computational efficiency. We complement our algorithm with an information-theoretic lower bound, showing that our upper bound is optimal up to logarithmic factors and the dependence on $ε$. This work sets the stage for more robust and efficient privacy-preserving techniques in machine learning, with implications for future research and application in the field. △ Less

Submitted 12 February, 2025; originally announced February 2025.

arXiv:2412.16802 [pdf, other]

Balls-and-Bins Sampling for DP-SGD

Authors: Lynn Chua, Badih Ghazi, Charlie Harrison, Ethan Leeman, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

Abstract: We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling based DP-SGD can have a much… ▽ More We introduce the Balls-and-Bins sampling for differentially private (DP) optimization methods such as DP-SGD. While it has been common practice to use some form of shuffling in DP-SGD implementations, privacy accounting algorithms have typically assumed that Poisson subsampling is used instead. Recent work by Chua et al. (ICML 2024), however, pointed out that shuffling based DP-SGD can have a much larger privacy cost in practical regimes of parameters. In this work we show that the Balls-and-Bins sampling achieves the "best-of-both" samplers, namely, the implementation of Balls-and-Bins sampling is similar to that of Shuffling and models trained using DP-SGD with Balls-and-Bins sampling achieve utility comparable to those trained using DP-SGD with Shuffling at the same noise multiplier, and yet, Balls-and-Bins sampling enjoys similar-or-better privacy amplification as compared to Poisson subsampling in practical regimes. △ Less

Submitted 31 March, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

Comments: Conference Proceedings version for AISTATS 2025

arXiv:2404.10881 [pdf, ps, other]

Differentially Private Optimization with Sparse Gradients

Authors: Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Abstract: Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with sparse data, improving upon existing algorithms particularly for the high-dimensional regime. Building on this, we obtain pure- and approximate-DP algorithms wit… ▽ More Motivated by applications of large embedding models, we study differentially private (DP) optimization problems under sparsity of individual gradients. We start with new near-optimal bounds for the classic mean estimation problem but with sparse data, improving upon existing algorithms particularly for the high-dimensional regime. Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for stochastic convex optimization with sparse gradients; the former represents the first nearly dimension-independent rates for this problem. Finally, we study the approximation of stationary points for the empirical loss in approximate-DP optimization and obtain rates that depend on sparsity instead of dimension, modulo polylogarithmic factors. △ Less

Submitted 31 October, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Minor corrections and re-structuring of the presentation

arXiv:2306.15744 [pdf, ps, other]

Ticketed Learning-Unlearning Schemes

Authors: Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Ayush Sekhari, Chiyuan Zhang

Abstract: We consider the learning--unlearning paradigm defined as follows. First given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when… ▽ More We consider the learning--unlearning paradigm defined as follows. First given a dataset, the goal is to learn a good predictor, such as one minimizing a certain loss. Subsequently, given any subset of examples that wish to be unlearnt, the goal is to learn, without the knowledge of the original training dataset, a good predictor that is identical to the predictor that would have been produced when learning from scratch on the surviving examples. We propose a new ticketed model for learning--unlearning wherein the learning algorithm can send back additional information in the form of a small-sized (encrypted) ``ticket'' to each participating training example, in addition to retaining a small amount of ``central'' information for later. Subsequently, the examples that wish to be unlearnt present their tickets to the unlearning algorithm, which additionally uses the central information to return a new predictor. We provide space-efficient ticketed learning--unlearning schemes for a broad family of concept classes, including thresholds, parities, intersection-closed classes, among others. En route, we introduce the count-to-zero problem, where during unlearning, the goal is to simply know if there are any examples that survived. We give a ticketed learning--unlearning scheme for this problem that relies on the construction of Sperner families with certain properties, which might be of independent interest. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: Conference on Learning Theory (COLT) 2023

arXiv:2210.15175 [pdf, ps, other]

Private Isotonic Regression

Authors: Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi

Abstract: In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly… ▽ More In this paper, we consider the problem of differentially private (DP) algorithms for isotonic regression. For the most general problem of isotonic regression over a partially ordered set (poset) $\mathcal{X}$ and for any Lipschitz loss function, we obtain a pure-DP algorithm that, given $n$ input points, has an expected excess empirical risk of roughly $\mathrm{width}(\mathcal{X}) \cdot \log|\mathcal{X}| / n$, where $\mathrm{width}(\mathcal{X})$ is the width of the poset. In contrast, we also obtain a near-matching lower bound of roughly $(\mathrm{width}(\mathcal{X}) + \log |\mathcal{X}|) / n$, that holds even for approximate-DP algorithms. Moreover, we show that the above bounds are essentially the best that can be obtained without utilizing any further structure of the poset. In the special case of a totally ordered set and for $\ell_1$ and $\ell_2^2$ losses, our algorithm can be implemented in near-linear running time; we also provide extensions of this algorithm to the problem of private isotonic regression with additional structural constraints on the output function. △ Less

Submitted 27 October, 2022; originally announced October 2022.

Comments: Neural Information Processing Systems (NeurIPS), 2022

arXiv:2112.03548 [pdf, ps, other]

Private Robust Estimation by Stabilizing Convex Relaxations

Authors: Pravesh K. Kothari, Pasin Manurangsi, Ameya Velingker

Abstract: We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifi… ▽ More We give the first polynomial time and sample $(ε, δ)$-differentially private (DP) algorithm to estimate the mean, covariance and higher moments in the presence of a constant fraction of adversarial outliers. Our algorithm succeeds for families of distributions that satisfy two well-studied properties in prior works on robust estimation: certifiable subgaussianity of directional moments and certifiable hypercontractivity of degree 2 polynomials. Our recovery guarantees hold in the "right affine-invariant norms": Mahalanobis distance for mean, multiplicative spectral and relative Frobenius distance guarantees for covariance and injective norms for higher moments. Prior works obtained private robust algorithms for mean estimation of subgaussian distributions with bounded covariance. For covariance estimation, ours is the first efficient algorithm (even in the absence of outliers) that succeeds without any condition-number assumptions. Our algorithms arise from a new framework that provides a general blueprint for modifying convex relaxations for robust estimation to satisfy strong worst-case stability guarantees in the appropriate parameter norms whenever the algorithms produce witnesses of correctness in their run. We verify such guarantees for a modification of standard sum-of-squares (SoS) semidefinite programming relaxations for robust estimation. Our privacy guarantees are obtained by combining stability guarantees with a new "estimate dependent" noise injection mechanism in which noise scales with the eigenvalues of the estimated covariance. We believe this framework will be useful more generally in obtaining DP counterparts of robust estimators. Independently of our work, Ashtiani and Liaw [AL21] also obtained a polynomial time and sample private robust estimation algorithm for Gaussian distributions. △ Less

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2011.14580 [pdf, other]

Robust and Private Learning of Halfspaces

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Thao Nguyen

Abstract: In this work, we study the trade-off between differential privacy and adversarial robustness under L2-perturbations in the context of learning halfspaces. We prove nearly tight bounds on the sample complexity of robust private learning of halfspaces for a large regime of parameters. A highlight of our results is that robust and private learning is harder than robust or private learning alone. We c… ▽ More In this work, we study the trade-off between differential privacy and adversarial robustness under L2-perturbations in the context of learning halfspaces. We prove nearly tight bounds on the sample complexity of robust private learning of halfspaces for a large regime of parameters. A highlight of our results is that robust and private learning is harder than robust or private learning alone. We complement our theoretical analysis with experimental results on the MNIST and USPS datasets, for a learning algorithm that is both differentially private and adversarially robust. △ Less

Submitted 25 March, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: AISTATS 2021

arXiv:2009.09604 [pdf, ps, other]

On Distributed Differential Privacy and Counting Distinct Elements

Authors: Lijie Chen, Badih Ghazi, Ravi Kumar, Pasin Manurangsi

Abstract: We study the setup where each of $n$ users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of $(ε, δ)$-differentially privacy: - In the non-interactive local setting, we prove that the additive error of any protocol is $Ω(n)$ for any constant $ε$ and for any $δ$ inverse polynomial in $n$. - In the single-mess… ▽ More We study the setup where each of $n$ users holds an element from a discrete set, and the goal is to count the number of distinct elements across all users, under the constraint of $(ε, δ)$-differentially privacy: - In the non-interactive local setting, we prove that the additive error of any protocol is $Ω(n)$ for any constant $ε$ and for any $δ$ inverse polynomial in $n$. - In the single-message shuffle setting, we prove a lower bound of $Ω(n)$ on the error for any constant $ε$ and for some $δ$ inverse quasi-polynomial in $n$. We do so by building on the moment-matching method from the literature on distribution estimation. - In the multi-message shuffle setting, we give a protocol with at most one message per user in expectation and with an error of $\tilde{O}(\sqrt(n))$ for any constant $ε$ and for any $δ$ inverse polynomial in $n$. Our protocol is also robustly shuffle private, and our error of $\sqrt(n)$ matches a known lower bound for such protocols. Our proof technique relies on a new notion, that we call dominated protocols, and which can also be used to obtain the first non-trivial lower bounds against multi-message shuffle protocols for the well-studied problems of selection and learning parity. Our first lower bound for estimating the number of distinct elements provides the first $ω(\sqrt(n))$ separation between global sensitivity and error in local differential privacy, thus answering an open question of Vadhan (2017). We also provide a simple construction that gives $\tildeΩ(n)$ separation between global sensitivity and error in two-party differential privacy, thereby answering an open question of McGregor et al. (2011). △ Less

Submitted 21 September, 2020; originally announced September 2020.

Comments: 68 pages, 4 algorithms

arXiv:2008.08007 [pdf, ps, other]

Differentially Private Clustering: Tight Approximation Ratios

Authors: Badih Ghazi, Ravi Kumar, Pasin Manurangsi

Abstract: We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors. This improves upon existing… ▽ More We study the task of differentially private clustering. For several basic clustering problems, including Euclidean DensestBall, 1-Cluster, k-means, and k-median, we give efficient differentially private algorithms that achieve essentially the same approximation ratios as those that can be obtained by any non-private algorithm, while incurring only small additive errors. This improves upon existing efficient algorithms that only achieve some large constant approximation factors. Our results also imply an improved algorithm for the Sample and Aggregate privacy framework. Furthermore, we show that one of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions. △ Less

Submitted 18 August, 2020; originally announced August 2020.

Comments: 60 pages, 1 table

arXiv:2007.15220 [pdf, ps, other]

The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise

Authors: Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi

Abstract: We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our findings is that the $L_{\infty}$ perturbations case is prov… ▽ More We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our findings is that the $L_{\infty}$ perturbations case is provably computationally harder than the case $2 \leq p < \infty$. △ Less

Submitted 30 July, 2020; originally announced July 2020.

arXiv:2007.03668 [pdf, ps, other]

Near-tight closure bounds for Littlestone and threshold dimensions

Authors: Badih Ghazi, Noah Golowich, Ravi Kumar, Pasin Manurangsi

Abstract: We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes. Given classes $\mathcal{H}_1, \ldots, \mathcal{H}_k$ of Boolean functions with bounded Littlestone (respectively, threshold) dimension, we establish an upper bound on the Littlestone (respectively, threshold) dimension of the class defined by applying an arbitrary binary aggregation rule to… ▽ More We study closure properties for the Littlestone and threshold dimensions of binary hypothesis classes. Given classes $\mathcal{H}_1, \ldots, \mathcal{H}_k$ of Boolean functions with bounded Littlestone (respectively, threshold) dimension, we establish an upper bound on the Littlestone (respectively, threshold) dimension of the class defined by applying an arbitrary binary aggregation rule to $\mathcal{H}_1, \ldots, \mathcal{H}_k$. We also show that our upper bounds are nearly tight. Our upper bounds give an exponential (in $k$) improvement upon analogous bounds shown by Alon et al. (COLT 2020), thus answering a question posed by their work. △ Less

Submitted 7 July, 2020; originally announced July 2020.

Comments: 7 pages

arXiv:1908.11335 [pdf, ps, other]

Nearly Tight Bounds for Robust Proper Learning of Halfspaces with a Margin

Authors: Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi

Abstract: We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $α\cdot \mathrm{OPT}_γ + ε$, where $\mathrm{OPT}_γ$ is the optimal $γ$-margin error rate and $α\geq 1$ is the approximation ratio. We give learning algorithms and co… ▽ More We study the problem of {\em properly} learning large margin halfspaces in the agnostic PAC model. In more detail, we study the complexity of properly learning $d$-dimensional halfspaces on the unit ball within misclassification error $α\cdot \mathrm{OPT}_γ + ε$, where $\mathrm{OPT}_γ$ is the optimal $γ$-margin error rate and $α\geq 1$ is the approximation ratio. We give learning algorithms and computational hardness results for this problem, for all values of the approximation ratio $α\geq 1$, that are nearly-matching for a range of parameters. Specifically, for the natural setting that $α$ is any constant bigger than one, we provide an essentially tight complexity characterization. On the positive side, we give an $α= 1.01$-approximate proper learner that uses $O(1/(ε^2γ^2))$ samples (which is optimal) and runs in time $\mathrm{poly}(d/ε) \cdot 2^{\tilde{O}(1/γ^2)}$. On the negative side, we show that {\em any} constant factor approximate proper learner has runtime $\mathrm{poly}(d/ε) \cdot 2^{(1/γ)^{2-o(1)}}$, assuming the Exponential Time Hypothesis. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Showing 1–12 of 12 results for author: Manurangsi, P