Showing 1–6 of 6 results for author: Raab, G M

  1. arXiv:2503.14211  [pdf]

    stat.AP

    Four checks for low-fidelity synthetic data: recommendations for disclosure control and quality evaluation

    Authors: Gillian M Raab, Sophie McCall, Liam Cavin

    Abstract: Confidential administrative data is usually only available to researchers within a trusted research environment (TRE). Recently, some UK groups have proposed that low-fidelity synthetic data (LFSD) is available to researchers outside the TRE to allow code-testing and data discovery. There is a need for transparency so that those who access LFSD know how it has been created and what to expect from…

    Submitted 18 March, 2025; originally announced March 2025.

  2. arXiv:2409.04257  [pdf, other]

    stat.AP

    Privacy risk from synthetic data: practical proposals

    Authors: Gillian M Raab

    Abstract: This paper proposes and compares measures of identity and attribute disclosure risk for synthetic data. Data custodians can use the methods proposed here to inform the decision as to whether to release synthetic versions of confidential data. Different measures are evaluated on two data sets. Insight into the measures is obtained by examining the details of the records identified as posing a discl…

    Submitted 16 May, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

  3. arXiv:2406.16826  [pdf, other]

    stat.AP

    Practical privacy metrics for synthetic data

    Authors: Gillian M Raab, Beata Nowok, Chris Dibben

    Abstract: This paper explains how the synthpop package for R has been extended to include functions to calculate measures of identity and attribute disclosure risk for synthetic data that measure risks for the records used to create the synthetic data. The basic function, disclosure, calculates identity disclosure for a set of quasi-identifiers (keys) and attribute disclosure for one variable specified as a…

    Submitted 14 May, 2025; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 12 pages, 2 figures, plus 7 more pages with references and appendices. Also appears as a vignette for the synthpop package for R

  4. arXiv:2206.01362  [pdf, other]

    stat.AP

    Utility and Disclosure Risk for Differentially Private Synthetic Categorical Data

    Authors: Gillian M Raab

    Abstract: This paper introduces two methods of creating differentially private (DP) synthetic data that are now incorporated into the synthpop package for R. Both are suitable for synthesising categorical data, or numeric data grouped into categories. Ten data sets with varying characteristics were used to evaluate the methods. Measures of disclosiveness and of utility were defined and cal…

    Submitted 26 June, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  5. arXiv:2109.12717  [pdf, other]

    stat.CO

    Assessing, visualizing and improving the utility of synthetic data

    Authors: Gillian M Raab, Beata Nowok, Chris Dibben

    Abstract: The synthpop package for R https://www.synthpop.org.uk provides tools to allow data custodians to create synthetic versions of confidential microdata that can be distributed with fewer restrictions than the original. The synthesis can be customized to ensure that relationships evident in the real data are reproduced in the synthetic data. A number of measures have been proposed to assess this aspe…

    Submitted 13 November, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

    Comments: Main text and references: 13 pages. Four appendices on pages 14-19. Four figures

  6. arXiv:1712.04078  [pdf, other]

    stat.AP

    Guidelines for Producing Useful Synthetic Data

    Authors: Gillian M. Raab, Beata Nowok, Chris Dibben

    Abstract: We report on our experiences of helping staff of the Scottish Longitudinal Study to create synthetic extracts that can be released to users. In particular, we focus on how the synthesis process can be tailored to produce synthetic extracts that will provide users with similar results to those that would be obtained from the original data. We make recommendations for synthesis methods and illustrat…

    Submitted 11 December, 2017; originally announced December 2017.