
Showing 1–6 of 6 results for author: Francis, B

Searching in archive stat.
  1. arXiv:2408.02513

    stat.ME

    The appeal of the gamma family distribution to protect the confidentiality of contingency tables

    Authors: James Jackson, Robin Mitra, Brian Francis, Iain Dove

    Abstract: Administrative databases, such as the English School Census (ESC), are rich sources of information that are potentially useful for researchers. For such data sources to be made available, however, strict guarantees of privacy would be required. To achieve this, synthetic data methods can be used. Such methods, when protecting the confidentiality of tabular data (contingency tables), often utilise…

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2407.00417

    cs.CR stat.ME

    Obtaining $(ε,δ)$-differential privacy guarantees when using a Poisson mechanism to synthesize contingency tables

    Authors: James Jackson, Robin Mitra, Brian Francis, Iain Dove

    Abstract: We show that differential privacy type guarantees can be obtained when using a Poisson synthesis mechanism to protect counts in contingency tables. Specifically, we show how to obtain $(ε, δ)$-probabilistic differential privacy guarantees via the Poisson distribution's cumulative distribution function. We demonstrate this empirically with the synthesis of an administrative-type confidential databa…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2205.05993

    stat.ME

    On integrating the number of synthetic data sets $m$ into the 'a priori' synthesis approach

    Authors: James Edward Jackson, Robin Mitra, Brian Joseph Francis, Iain Dove

    Abstract: Until recently, multiple synthetic data sets were always released to analysts, to allow valid inferences to be obtained. However, under certain conditions - including when saturated count models are used to synthesize categorical data - single imputation ($m=1$) is sufficient. Nevertheless, increasing $m$ causes utility to improve, but at the expense of higher risk, an example of the risk-utility…

    Submitted 12 May, 2022; originally announced May 2022.

  4. arXiv:2107.08062

    stat.ME

    Using saturated count models for user-friendly synthesis of categorical data

    Authors: James Edward Jackson, Robin Mitra, Brian Joseph Francis, Iain Dove

    Abstract: Over the past three decades, synthetic data methods for statistical disclosure control have continually evolved, but mainly within the domain of survey data sets. There are certain characteristics of administrative databases, such as their size, which present challenges from a synthesis perspective and require special attention. This paper, through the fitting of saturated count models, presents a…

    Submitted 12 May, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 37 pages, 6 figures

  5. arXiv:1805.12052

    cond-mat.stat-mech nlin.CD physics.data-an stat.OT

    Unwinding the model manifold: choosing similarity measures to remove local minima in sloppy dynamical systems

    Authors: Benjamin L. Francis, Mark K. Transtrum

    Abstract: In this paper, we consider the problem of parameter sensitivity in models of complex dynamical systems through the lens of information geometry. We calculate the sensitivity of model behavior to variations in parameters. In most cases, models are sloppy, that is, they exhibit an exponential hierarchy of parameter sensitivities. We propose a parameter classification scheme based on how the sensitivities…

    Submitted 22 February, 2019; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: 16 pages, 14 figures, supplementary material merged with main article

    Journal ref: Phys. Rev. E 100, 012206 (2019)

  6. Modeling heterogeneity in ranked responses by nonparametric maximum likelihood: How do Europeans get their scientific knowledge?

    Authors: Brian Francis, Regina Dittrich, Reinhold Hatzinger

    Abstract: This paper is motivated by a Eurobarometer survey on science knowledge. As part of the survey, respondents were asked to rank sources of science information in order of importance. The official statistical analysis of these data, however, failed to use the complete ranking information. We instead propose a method which treats ranked data as a set of paired comparisons which places the problem in the…

    Submitted 7 January, 2011; originally announced January 2011.

    Comments: Published at http://dx.doi.org/10.1214/10-AOAS366 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS366

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 4, 2181-2202