Search | arXiv e-print repository

Privacy Amplification Through Synthetic Data: Insights from Linear Regression

Authors: Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

Abstract: Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear… ▽ More Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear regression. First, we establish negative results showing that if an adversary controls the seed of the generative model, a single synthetic data point can leak as much information as releasing the model itself. Conversely, we show that when synthetic data is generated from random inputs, releasing a limited number of synthetic data points amplifies privacy beyond the model's inherent guarantees. We believe our findings in linear regression can serve as a foundation for deriving more general bounds in the future. △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: 26 pages, ICML 2025

arXiv:2402.15415 [pdf, other]

The Impact of LoRA on the Emergence of Clusters in Transformers

Authors: Hugo Koubbi, Matthieu Boussard, Louis Hernandez

Abstract: In this paper, we employ the mathematical framework on Transformers developed by \citet{sander2022sinkformers,geshkovski2023emergence,geshkovski2023mathematical} to explore how variations in attention parameters and initial token values impact the structural dynamics of token clusters. Our analysis demonstrates that while the clusters within a modified attention matrix dynamics can exhibit signifi… ▽ More In this paper, we employ the mathematical framework on Transformers developed by \citet{sander2022sinkformers,geshkovski2023emergence,geshkovski2023mathematical} to explore how variations in attention parameters and initial token values impact the structural dynamics of token clusters. Our analysis demonstrates that while the clusters within a modified attention matrix dynamics can exhibit significant divergence from the original over extended periods, they maintain close similarities over shorter intervals, depending on the parameter differences. This work contributes to the fine-tuning field through practical applications to the LoRA algorithm \cite{hu2021lora,peft}, enhancing our understanding of the behavior of LoRA-enhanced Transformer models. △ Less

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2312.13985 [pdf, other]

Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration

Authors: Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard

Abstract: Pufferfish privacy is a flexible generalization of differential privacy that allows to model arbitrary secrets and adversary's prior knowledge about the data. Unfortunately, designing general and tractable Pufferfish mechanisms that do not compromise utility is challenging. Furthermore, this framework does not provide the composition guarantees needed for a direct use in iterative machine learning… ▽ More Pufferfish privacy is a flexible generalization of differential privacy that allows to model arbitrary secrets and adversary's prior knowledge about the data. Unfortunately, designing general and tractable Pufferfish mechanisms that do not compromise utility is challenging. Furthermore, this framework does not provide the composition guarantees needed for a direct use in iterative machine learning algorithms. To mitigate these issues, we introduce a Rényi divergence-based variant of Pufferfish and show that it allows us to extend the applicability of the Pufferfish framework. We first generalize the Wasserstein mechanism to cover a wide range of noise distributions and introduce several ways to improve its utility. We also derive stronger guarantees against out-of-distribution adversaries. Finally, as an alternative to composition, we prove privacy amplification results for contractive noisy iterations and showcase the first use of Pufferfish in private convex optimization. A common ingredient underlying our results is the use and extension of shift reduction lemmas. △ Less

Submitted 10 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.07139 [pdf, other]

Practical considerations on using private sampling for synthetic data

Authors: Clément Pierquin, Bastien Zimmermann, Matthieu Boussard

Abstract: Artificial intelligence and data access are already mainstream. One of the main challenges when designing an artificial intelligence or disclosing content from a database is preserving the privacy of individuals who participate in the process. Differential privacy for synthetic data generation has received much attention due to the ability of preserving privacy while freely using the synthetic dat… ▽ More Artificial intelligence and data access are already mainstream. One of the main challenges when designing an artificial intelligence or disclosing content from a database is preserving the privacy of individuals who participate in the process. Differential privacy for synthetic data generation has received much attention due to the ability of preserving privacy while freely using the synthetic data. Private sampling is the first noise-free method to construct differentially private synthetic data with rigorous bounds for privacy and accuracy. However, this synthetic data generation method comes with constraints which seem unrealistic and not applicable for real-world datasets. In this paper, we provide an implementation of the private sampling algorithm and discuss the realism of its constraints in practical cases. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Showing 1–4 of 4 results for author: Boussard, M