Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model

Lin, Zinan; Baltrusaitis, Tadas; Wang, Wenyu; Yekhanin, Sergey

Computer Science > Machine Learning

arXiv:2502.05505 (cs)

[Submitted on 8 Feb 2025 (v1), last revised 20 May 2025 (this version, v3)]

Abstract: Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently, Private Evolution (PE) has emerged as a promising method for generating DP synthetic data. Unlike training-based approaches, PE only requires access to inference APIs from foundation models, enabling it to harness the power of state-of-the-art (SoTA) models. However, a suitable foundation model for a specific private data domain is not always available. In this paper, we discover that the PE framework is sufficiently general to allow APIs beyond foundation models. In particular, we demonstrate that many SoTA data synthesizers that do not rely on neural networks, such as computer-graphics-based image generators (which we refer to as simulators), can be effectively integrated into PE. This insight significantly broadens PE's applicability and unlocks the potential of powerful simulators for DP data synthesis. We explore this approach, named Sim-PE, in the context of image synthesis. Across four diverse simulators, Sim-PE performs well, improving the downstream classification accuracy of PE by up to 3x, reducing FID by up to 80%, and offering much greater efficiency. We also show that simulators and foundation models can be easily leveraged together within PE to achieve further improvements. The code is open-sourced in the Private Evolution Python library: this https URL.
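The abstract's key claim is that PE touches its generator only through APIs, so a non-neural simulator can stand in for a foundation model. The following is a minimal sketch of that idea, not the authors' implementation and not the Private Evolution library's API. It assumes PE's usual loop: a random API proposes initial samples, a DP nearest-neighbor histogram scores them against the private data, and a variation API perturbs the favored samples. Here the "APIs" wrap a hypothetical toy simulator over parameters (angle, radius) that emits 2-D points instead of images; every function name is illustrative. The noise scale `sigma` is not calibrated to any (ε, δ) budget; a real deployment would need privacy accounting across all iterations.

```python
import math
import random

# Hypothetical toy "simulator": maps a parameter pair (angle, radius) to a 2-D point.
# A real simulator (e.g. a graphics engine) would render an image from its parameters.
def simulator(params):
    angle, radius = params
    return (radius * math.cos(angle), radius * math.sin(angle))

# Hypothetical random API: draw simulator parameters uniformly at random.
def random_api(n, rng):
    return [(rng.uniform(0.0, 2 * math.pi), rng.uniform(0.0, 5.0)) for _ in range(n)]

# Hypothetical variation API: perturb existing parameters with Gaussian noise.
def variation_api(params, step, rng):
    angle, radius = params
    return (angle + rng.gauss(0.0, step), max(0.0, radius + rng.gauss(0.0, step)))

def dp_nn_histogram(private, synthetic, sigma, threshold, rng):
    """Noisy nearest-neighbor vote counts via the Gaussian mechanism."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    counts = [0.0] * len(synthetic)
    for p in private:
        # Each private point votes once for its closest synthetic sample,
        # so adding or removing one private point changes one count by 1.
        nearest = min(range(len(synthetic)), key=lambda j: math.dist(p, synthetic[j]))
        counts[nearest] += 1.0
    noisy = [c + rng.gauss(0.0, sigma) for c in counts]
    # Zero out small noisy counts; with threshold >= 0 every kept weight is positive.
    return [c if c > threshold else 0.0 for c in noisy]

def private_evolution(private, n_syn, iterations, sigma, threshold, step, seed=0):
    rng = random.Random(seed)
    params = random_api(n_syn, rng)
    for _ in range(iterations):
        synthetic = [simulator(q) for q in params]
        weights = dp_nn_histogram(private, synthetic, sigma, threshold, rng)
        if sum(weights) == 0.0:
            weights = [1.0] * n_syn  # no vote survived: resample uniformly
        # Keep the parameters the private data "voted" for, then vary them.
        parents = rng.choices(params, weights=weights, k=n_syn)
        params = [variation_api(q, step, rng) for q in parents]
    return [simulator(q) for q in params]

# Usage: toy private data clustered around (3, 0).
data_rng = random.Random(1)
private = [(3.0 + data_rng.gauss(0.0, 0.2), data_rng.gauss(0.0, 0.2)) for _ in range(200)]
synthetic = private_evolution(private, n_syn=100, iterations=20,
                              sigma=1.0, threshold=2.0, step=0.1)
```

In this sketch the private data influence the output only through the noisy vote counts, and everything afterwards is post-processing. Swapping the toy simulator for a different generator, or for a foundation model, changes only `simulator`, `random_api` and `variation_api`.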

Comments:	Published in: (1) ICLR 2025 Workshop on Data Problems, (2) ICLR 2025 Workshop on Synthetic Data
Subjects:	Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:2502.05505 [cs.LG]
	(or arXiv:2502.05505v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.05505

Submission history

From: Zinan Lin
[v1] Sat, 8 Feb 2025 09:50:30 UTC (411 KB)
[v2] Sat, 17 May 2025 16:34:54 UTC (950 KB)
[v3] Tue, 20 May 2025 04:05:24 UTC (950 KB)