Noise-Aware Statistical Inference with Differentially Private Synthetic Data

Räisä, Ossi; Jälkö, Joonas; Kaski, Samuel; Honkela, Antti

arXiv:2205.14485 (stat)

[Submitted on 28 May 2022 (v1), last revised 24 Feb 2023 (this version, v3)]

Abstract: While generation of synthetic data under differential privacy (DP) has received a lot of attention in the data privacy community, analysis of synthetic data has received much less. Existing work has shown that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities. For example, confidence intervals become too narrow, which we demonstrate with a simple experiment. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation (MI) and synthetic data generation using noise-aware (NA) Bayesian modeling into a pipeline, NA+MI, that allows computing accurate uncertainty estimates for population-level quantities from DP synthetic data. To implement NA+MI for discrete data generation using the values of marginal queries, we develop a novel noise-aware synthetic data generation algorithm, NAPSU-MQ, using the principle of maximum entropy. Our experiments demonstrate that the pipeline is able to produce accurate confidence intervals from DP synthetic data. The intervals become wider with tighter privacy to accurately capture the additional uncertainty stemming from DP noise.
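
The MI step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which combining rules the pipeline uses, so the code below assumes the rules for fully synthetic data from Raghunathan et al. (2003), where the combined variance is (1 + 1/m) b - v̄. The function name `combine_synthetic_estimates` and its return layout are hypothetical and are not taken from the paper or its code.

```python
from statistics import mean, variance


def combine_synthetic_estimates(point_estimates, variance_estimates):
    """Combine estimates computed separately on m synthetic datasets.

    Assumption: combining rules for fully synthetic data
    (Raghunathan et al., 2003). The abstract does not confirm
    that NA+MI uses exactly these rules.
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need at least two datasets with matching estimates")

    q_bar = mean(point_estimates)      # combined point estimate
    b = variance(point_estimates)      # between-dataset (sample) variance
    v_bar = mean(variance_estimates)   # average within-dataset variance

    # Combined variance estimate. It can come out negative for small m,
    # in which case some adjustment is needed before building an interval.
    t = (1 + 1 / m) * b - v_bar
    return q_bar, b, v_bar, t


# Hypothetical per-dataset estimates of one population-level quantity.
q_bar, b, v_bar, t = combine_synthetic_estimates(
    [1.0, 2.0, 3.0], [0.1, 0.1, 0.1]
)
```

The between-dataset variance b is what lets the combined estimate reflect uncertainty from the generation process, which analysing a single synthetic dataset as if it were real would ignore.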

Comments:	24 pages, 14 figures
Subjects:	Machine Learning (stat.ML); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2205.14485 [stat.ML]
	(or arXiv:2205.14485v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2205.14485

Submission history

From: Ossi Räisä
[v1] Sat, 28 May 2022 16:59:46 UTC (1,007 KB)
[v2] Thu, 17 Nov 2022 12:45:38 UTC (1,574 KB)
[v3] Fri, 24 Feb 2023 17:58:53 UTC (673 KB)
