To Shuffle or not to Shuffle: Auditing DP-SGD with Shuffling

Annamalai, Meenatchi Sundaram Muthu Selva; Balle, Borja; Hayes, Jamie; De Cristofaro, Emiliano

Computer Science > Cryptography and Security

arXiv:2411.10614 (cs)

[Submitted on 15 Nov 2024 (v1), last revised 12 Apr 2025 (this version, v2)]

Abstract: The Differentially Private Stochastic Gradient Descent (DP-SGD) algorithm allows the training of machine learning (ML) models with formal Differential Privacy (DP) guarantees. Since DP-SGD processes training data in batches, it employs Poisson sub-sampling to select each batch at every step. However, it has become common practice to replace sub-sampling with shuffling owing to better compatibility and lower computational overhead. At the same time, we do not know how to compute tight theoretical guarantees for shuffling; thus, DP guarantees of models privately trained with shuffling are often reported as though Poisson sub-sampling were used.
This prompts the need to verify whether gaps exist between the theoretical DP guarantees reported by state-of-the-art models and their actual leakage. To do so, we introduce a novel DP auditing procedure to analyze DP-SGD with shuffling and show that DP models trained with this approach have considerably overestimated privacy guarantees (up to 4 times). In the process, we assess the impact on privacy leakage of several parameters, including batch size, privacy budget, and threat model. Finally, we study two common variations of the shuffling procedure that result in even further privacy leakage (up to 10 times). Overall, our work attests to the risk of using shuffling instead of Poisson sub-sampling vis-à-vis privacy leakage from DP-SGD.
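
To make the distinction in the abstract concrete, the following is a minimal sketch of the two batch-selection schemes, not the authors' implementation. The function names `poisson_batches` and `shuffled_batches`, the dataset size, and the batch size are illustrative assumptions; the shuffling variant here drops any incomplete final batch, which is also an assumption rather than something the abstract specifies.

```python
import random


def poisson_batches(n, q, steps, rng):
    """Poisson sub-sampling: at every step, each of the n examples is
    included independently with probability q, so batch sizes vary."""
    return [[i for i in range(n) if rng.random() < q] for _ in range(steps)]


def shuffled_batches(n, batch_size, epochs, rng):
    """Shuffling: permute the dataset once per epoch, then cut the
    permutation into consecutive fixed-size batches.
    Assumption: an incomplete final batch is dropped."""
    batches = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        for start in range(0, n - batch_size + 1, batch_size):
            batches.append(perm[start:start + batch_size])
    return batches


rng = random.Random(0)
n, batch_size = 1000, 100  # hypothetical sizes for illustration

poisson = poisson_batches(n, q=batch_size / n, steps=10, rng=rng)
shuffled = shuffled_batches(n, batch_size, epochs=1, rng=rng)

# Under shuffling every example appears exactly once per epoch;
# under Poisson sub-sampling an example may appear several times or never.
print([len(b) for b in poisson])
print([len(b) for b in shuffled])
```

With shuffling, each example is guaranteed to appear exactly once per epoch and every batch has the same size, whereas Poisson sub-sampling yields batches of random size with independent inclusion at each step. The standard DP-SGD privacy analysis relies on that independence, which is why reporting Poisson-based guarantees for a shuffled pipeline can misstate the actual leakage.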

Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2411.10614 [cs.CR]
	(or arXiv:2411.10614v2 [cs.CR] for this version)
	https://doi.org/10.48550/arXiv.2411.10614

Submission history

From: Emiliano De Cristofaro
[v1] Fri, 15 Nov 2024 22:34:28 UTC (14,677 KB)
[v2] Sat, 12 Apr 2025 15:32:37 UTC (14,679 KB)