Showing 1–4 of 4 results for author: Mahiou, S

  1. arXiv:2506.00322  [pdf, ps, other]

    cs.CR cs.AI cs.LG

    dpmm: Differentially Private Marginal Models, a Library for Synthetic Tabular Data Generation

    Authors: Sofiane Mahiou, Amir Dizche, Reza Nazari, Xinmin Wu, Ralph Abbey, Jorge Silva, Georgi Ganev

    Abstract: We propose dpmm, an open-source library for synthetic data generation with Differentially Private (DP) guarantees. It includes three popular marginal models -- PrivBayes, MST, and AIM -- that achieve superior utility and offer richer functionality compared to alternative implementations. Additionally, we adopt best practices to provide end-to-end DP guarantees and address well-known DP-related vul…

    Submitted 30 May, 2025; originally announced June 2025.

    Comments: Accepted to the Theory and Practice of Differential Privacy Workshop (TPDP 2025)

  2. arXiv:2504.08254  [pdf, other]

    cs.CR cs.LG

    Understanding the Impact of Data Domain Extraction on Synthetic Data Privacy

    Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro

    Abstract: Privacy attacks, particularly membership inference attacks (MIAs), are widely used to assess the privacy of generative models for tabular synthetic data, including those with Differential Privacy (DP) guarantees. These attacks often exploit outliers, which are especially vulnerable due to their position at the boundaries of the data domain (e.g., at the minimum and maximum values). However, the ro…

    Submitted 13 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Accepted to the Synthetic Data x Data Access Problem workshop (SynthData), part of ICLR 2025

  3. arXiv:2504.06923  [pdf, other]

    cs.CR cs.LG

    The Importance of Being Discrete: Measuring the Impact of Discretization in End-to-End Differentially Private Synthetic Data

    Authors: Georgi Ganev, Meenatchi Sundaram Muthu Selva Annamalai, Sofiane Mahiou, Emiliano De Cristofaro

    Abstract: Differentially Private (DP) generative marginal models are often used in the wild to release synthetic tabular datasets in lieu of sensitive data while providing formal privacy guarantees. These models approximate low-dimensional marginals or query workloads; crucially, they require the training data to be pre-discretized, i.e., continuous values need to first be partitioned into bins. However, as…

    Submitted 13 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  4. arXiv:2207.05810  [pdf, other]

    cs.LG cs.CR

    dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation

    Authors: Sofiane Mahiou, Kai Xu, Georgi Ganev

    Abstract: We propose a general, flexible, and scalable framework dpart, an open source Python library for differentially private synthetic data generation. Central to the approach is autoregressive modelling -- breaking the joint data distribution to a sequence of lower-dimensional conditional distributions, captured by various methods such as machine learning models (logistic/linear regression, decision tr…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted at the Theory and Practice of Differential Privacy (TPDP) 2022, part of ICML 2022