-
Gaussian Differential Private Bootstrap by Subsampling
Authors:
Holger Dette,
Carina Graw
Abstract:
Bootstrap is a common tool for quantifying uncertainty in data analysis. However, besides additional computational costs in the application of the bootstrap on massive data, a challenging problem in bootstrap based inference under Differential Privacy consists in the fact that it requires repeated access to the data. As a consequence, bootstrap based differentially private inference requires a sig…
▽ More
Bootstrap is a common tool for quantifying uncertainty in data analysis. However, besides additional computational costs in the application of the bootstrap on massive data, a challenging problem in bootstrap based inference under Differential Privacy consists in the fact that it requires repeated access to the data. As a consequence, bootstrap based differentially private inference requires a significant increase of the privacy budget, which on the other hand comes with a substantial loss in statistical accuracy.
A potential solution to reconcile the conflicting goals of statistical accuracy and privacy is to analyze the data under parametric model assumptions and in the last decade, several parametric bootstrap methods for inference under privacy have been investigated. However, uncertainty quantification by parametric bootstrap is only valid if the the quantities of interest can be identified as the parameters of a statistical model and the imposed model assumptions are (at least approximately) satisfied. An alternative to parametric methods is the empirical bootstrap that is a widely used tool for non-parametric inference and well studied in the non-private regime. However, under privacy, less insight is available. In this paper, we propose a private empirical $m$ out of $n$ bootstrap and validate its consistency and privacy guarantees under Gaussian Differential Privacy. Compared to the the private $n$ out of $n$ bootstrap, our approach has several advantages. First, it comes with less computational costs, in particular for massive data. Second, the proposed procedure needs less additional noise in the bootstrap iterations, which leads to an improved statistical accuracy while asymptotically guaranteeing the same level of privacy. Third, we demonstrate much better finite sample properties compared to the currently available procedures.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
SILENT: A New Lens on Statistics in Software Timing Side Channels
Authors:
Martin Dunsche,
Patrick Bastian,
Marcel Maehren,
Nurullah Erinola,
Robert Merget,
Nicolai Bissantz,
Holger Dette,
Jörg Schwenk
Abstract:
Cryptographic research takes software timing side channels seriously. Approaches to mitigate them include constant-time coding and techniques to enforce such practices. However, recent attacks like Meltdown [42], Spectre [37], and Hertzbleed [70] have challenged our understanding of what it means for code to execute in constant time on modern CPUs. To ensure that assumptions on the underlying hard…
▽ More
Cryptographic research takes software timing side channels seriously. Approaches to mitigate them include constant-time coding and techniques to enforce such practices. However, recent attacks like Meltdown [42], Spectre [37], and Hertzbleed [70] have challenged our understanding of what it means for code to execute in constant time on modern CPUs. To ensure that assumptions on the underlying hardware are correct and to create a complete feedback loop, developers should also perform \emph{timing measurements} as a final validation step to ensure the absence of exploitable side channels. Unfortunately, as highlighted by a recent study by Jancar et al. [30], developers often avoid measurements due to the perceived unreliability of the statistical analysis and its guarantees.
In this work, we combat the view that statistical techniques only provide weak guarantees by introducing a new algorithm for the analysis of timing measurements with strong, formal statistical guarantees, giving developers a reliable analysis tool. Specifically, our algorithm (1) is non-parametric, making minimal assumptions about the underlying distribution and thus overcoming limitations of classical tests like the t-test, (2) handles unknown data dependencies in measurements, (3) can estimate in advance how many samples are needed to detect a leak of a given size, and (4) allows the definition of a negligible leak threshold $Δ$, ensuring that acceptable non-exploitable leaks do not trigger false positives, without compromising statistical soundness. We demonstrate the necessity, effectiveness, and benefits of our approach on both synthetic benchmarks and real-world applications.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
General-Purpose $f$-DP Estimation and Auditing in a Black-Box Setting
Authors:
Önder Askin,
Holger Dette,
Martin Dunsche,
Tim Kutta,
Yun Lu,
Yu Wei,
Vassilis Zikas
Abstract:
In this paper we propose new methods to statistically assess $f$-Differential Privacy ($f$-DP), a recent refinement of differential privacy (DP) that remedies certain weaknesses of standard DP (including tightness under algorithmic composition). A challenge when deploying differentially private mechanisms is that DP is hard to validate, especially in the black-box setting. This has led to numerous…
▽ More
In this paper we propose new methods to statistically assess $f$-Differential Privacy ($f$-DP), a recent refinement of differential privacy (DP) that remedies certain weaknesses of standard DP (including tightness under algorithmic composition). A challenge when deploying differentially private mechanisms is that DP is hard to validate, especially in the black-box setting. This has led to numerous empirical methods for auditing standard DP, while $f$-DP remains less explored. We introduce new black-box methods for $f$-DP that, unlike existing approaches for this privacy notion, do not require prior knowledge of the investigated algorithm. Our procedure yields a complete estimate of the $f$-DP trade-off curve, with theoretical guarantees of convergence. Additionally, we propose an efficient auditing method that empirically detects $f$-DP violations with statistical certainty, merging techniques from non-parametric estimation and optimal classification theory. Through experiments on a range of DP mechanisms, we demonstrate the effectiveness of our estimation and auditing procedures.
△ Less
Submitted 13 June, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent
Authors:
Holger Dette,
Carina Graw
Abstract:
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in the last years in which the focus is mainly on convergence rates and privacy guarantees. While in the non private case, uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors, these procedures cannot be transferre…
▽ More
Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in the last years in which the focus is mainly on convergence rates and privacy guarantees. While in the non private case, uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors, these procedures cannot be transferred to differential privacy due to multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ for non-private SGD.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
Multivariate Mean Comparison under Differential Privacy
Authors:
Martin Dunsche,
Tim Kutta,
Holger Dette
Abstract:
The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, they usually do not protect individuals' privacy. This knowledge can create incentives for participants in a study to conceal their true data (especially for outliers), which might result in a distorted analysis. In this paper we address this pr…
▽ More
The comparison of multivariate population means is a central task of statistical inference. While statistical theory provides a variety of analysis tools, they usually do not protect individuals' privacy. This knowledge can create incentives for participants in a study to conceal their true data (especially for outliers), which might result in a distorted analysis. In this paper we address this problem by developing a hypothesis test for multivariate mean comparisons that guarantees differential privacy to users. The test statistic is based on the popular Hotelling's $t^2$-statistic, which has a natural interpretation in terms of the Mahalanobis distance. In order to control the type-1-error, we present a bootstrap algorithm under differential privacy that provably yields a reliable test decision. In an empirical study we demonstrate the applicability of this approach.
△ Less
Submitted 15 October, 2021;
originally announced October 2021.
-
Statistical Quantification of Differential Privacy: A Local Approach
Authors:
Önder Askin,
Tim Kutta,
Holger Dette
Abstract:
In this work, we introduce a new approach for statistical quantification of differential privacy in a black box setting. We present estimators and confidence intervals for the optimal privacy parameter of a randomized algorithm $A$, as well as other key variables (such as the "data-centric privacy level"). Our estimators are based on a local characterization of privacy and in contrast to the relat…
▽ More
In this work, we introduce a new approach for statistical quantification of differential privacy in a black box setting. We present estimators and confidence intervals for the optimal privacy parameter of a randomized algorithm $A$, as well as other key variables (such as the "data-centric privacy level"). Our estimators are based on a local characterization of privacy and in contrast to the related literature avoid the process of "event selection" - a major obstacle to privacy validation. This makes our methods easy to implement and user-friendly. We show fast convergence rates of the estimators and asymptotic validity of the confidence intervals. An experimental study of various algorithms confirms the efficacy of our approach.
△ Less
Submitted 2 May, 2022; v1 submitted 21 August, 2021;
originally announced August 2021.