Search | arXiv e-print repository

doi 10.1162/99608f92.db29c137

Toward a Principled Framework for Disclosure Avoidance

Authors: Michael B Hawes, Evan M Brassell, Anthony Caruso, Ryan Cumings-Menon, Jason Devine, Cassandra Dorius, David Evans, Kenneth Haase, Michele C Hedrick, Alexandra Krause, Philip Leclerc, James Livsey, Rolando A Rodriguez, Luke T Rogers, Matthew Spence, Victoria Velkoff, Michael Walsh, James Whitehorne, Sallie Ann Keller

Abstract: Responsible disclosure limitation is an iterative exercise in risk assessment and mitigation. From time to time, as disclosure risks grow and evolve and as data users' needs change, agencies must consider redesigning the disclosure avoidance system(s) they use. Discussions about candidate systems often conflate inherent features of those systems with implementation decisions independent of those systems. For example, a system's ability to calibrate the strength of protection to suit the underlying disclosure risk of the data (e.g., by varying suppression thresholds) is a worthwhile feature regardless of the independent decision about how much protection is actually necessary. Having a principled discussion of candidate disclosure avoidance systems requires a framework for distinguishing these inherent features of the systems from the implementation decisions that need to be made independent of the system selected. For statistical agencies, this framework must also reflect the applied nature of these systems, acknowledging that candidate systems need to be adaptable to requirements stemming from the legal, scientific, resource, and stakeholder environments within which they would be operating. This paper proposes such a framework. No approach will be perfectly adaptable to every potential system requirement. Because the selection of some methodologies over others may constrain the resulting systems' efficiency and flexibility to adapt to particular statistical product specifications, data user needs, or disclosure risks, agencies may approach these choices in an iterative fashion, adapting system requirements, product specifications, and implementation parameters as necessary to ensure the resulting quality of the statistical product.

Submitted 22 August, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Journal ref: Harvard Data Science Review, Special Issue 6 (2025)

arXiv:2312.11283

doi 10.1162/99608f92.4a1ebf70

A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census

Authors: John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Michael B. Hawes, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, Lars Vilhuber

Abstract: We show that individual, confidential microdata records from the 2010 U.S. Census of Population and Housing can be accurately reconstructed from the published tabular summaries. Ninety-seven million person records (every resident in 70% of all census blocks) are exactly reconstructed with provable certainty using only public information. We further show that a hypothetical attacker using our methods can reidentify with 95% accuracy population-unique individuals who are perfectly reconstructed and not in the modal race and ethnicity category in their census block (3.4 million persons), a result that is only possible because their confidential records were used in the published tabulations. Finally, we show that the methods used for the 2020 Census, based on a differential privacy framework, provide better protection against this type of attack, with better published data accuracy, than feasible alternatives.

Submitted 28 July, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: This is the accepted Harvard Data Science Review (2025) paper. The accepted supplemental text is here: https://arxiv.boxedpaper.com/abs/2312.11283v2
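Toy illustration (hypothetical data). The reconstruction idea in the abstract above can be sketched at toy scale: enumerate every possible set of person records for a small block and keep only those whose tabulations match the published tables. This is not the paper's attack, which operates on the full 2010 Census; the three-attribute schema, the choice of published two-way tables, and the three-person block are assumptions made for illustration, and brute-force enumeration stands in for whatever solver a real attack would use.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical toy attribute domains (not the Census schema).
SEXES = ("F", "M")
AGES = ("0-17", "18-64", "65+")
RACES = ("A", "B")
DOMAIN = list(product(SEXES, AGES, RACES))  # all 12 possible person records


def tabulate(records):
    """Hypothetical 'published' tables: three two-way counts for one block."""
    return {
        "sex_x_age": Counter((s, a) for s, a, _ in records),
        "sex_x_race": Counter((s, r) for s, _, r in records),
        "age_x_race": Counter((a, r) for _, a, r in records),
    }


def reconstruct(published, block_size):
    """Return every multiset of records whose tabulations match the published tables."""
    return [
        list(candidate)
        for candidate in combinations_with_replacement(DOMAIN, block_size)
        if tabulate(candidate) == published
    ]


confidential = [("F", "18-64", "A"), ("M", "65+", "B"), ("F", "0-17", "A")]
solutions = reconstruct(tabulate(confidential), len(confidential))
print(len(solutions))                      # 1
print(solutions[0] == sorted(confidential))  # True
```

Only one candidate block survives, and it is the confidential one. Publishing fewer or coarser tables would generally leave several consistent candidates; it is the overlap between tables that pins the records down.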

arXiv:2310.09398

doi 10.1073/pnas.2220558120

An In-Depth Examination of Requirements for Disclosure Risk Assessment

Authors: Ron S. Jarmin, John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Nathan Goldschlag, Michael B. Hawes, Sallie Ann Keller, Daniel Kifer, Philip Leclerc, Jerome P. Reiter, Rolando A. Rodríguez, Ian Schmutte, Victoria A. Velkoff, Pavel Zhuravlev

Abstract: The use of formal privacy to protect the confidentiality of responses in the 2020 Decennial Census of Population and Housing has triggered renewed interest and debate over how to measure the disclosure risks and societal benefits of the published data products. Following long-established precedent in economics and statistics, we argue that any proposal for quantifying disclosure risk should be based on pre-specified, objective criteria. Such criteria should be used to compare methodologies to identify those with the most desirable properties. We illustrate this approach, using simple desiderata, to evaluate the absolute disclosure risk framework, the counterfactual framework underlying differential privacy, and prior-to-posterior comparisons. We conclude that satisfying all the desiderata is impossible, but counterfactual comparisons satisfy the most while absolute disclosure risk satisfies the fewest. Furthermore, we explain that many of the criticisms levied against differential privacy would be levied against any technology that is not equivalent to direct, unrestricted access to confidential data. Thus, more research is needed, but in the near-term, the counterfactual approach appears best-suited for privacy-utility analysis.

Submitted 13 October, 2023; originally announced October 2023.

Comments: 47 pages, 1 table

Journal ref: PNAS, October 13, 2023, Vol. 120, No. 43

arXiv:2303.00845

21st Century Statistical Disclosure Limitation: Motivations and Challenges

Authors: John M Abowd, Michael B Hawes

Abstract: This chapter examines the motivations and imperatives for modernizing how statistical agencies approach statistical disclosure limitation for official data product releases. It discusses the implications for agencies' broader data governance and decision-making, and it identifies challenges that agencies will likely face along the way. In conclusion, the chapter proposes some principles and best practices that we believe can help guide agencies in navigating the transformation of their confidentiality programs.

Submitted 1 March, 2023; originally announced March 2023.

Comments: Forthcoming CRC Handbook of Formally Private and Synthetic Data Approaches for Statistical Disclosure Control

arXiv:2206.03524

doi 10.1146/annurev-statistics-010422-034226

Confidentiality Protection in the 2020 US Census of Population and Housing

Authors: John M Abowd, Michael B Hawes

Abstract: In an era where external data and computational capabilities far exceed statistical agencies' own resources and capabilities, they face the renewed challenge of protecting the confidentiality of underlying microdata when publishing statistics in very granular form and ensuring that these granular data are used for statistical purposes only. Conventional statistical disclosure limitation methods are too fragile to address this new challenge. This article discusses the deployment of a differential privacy framework for the 2020 US Census that was customized to protect confidentiality, particularly the most detailed geographic and demographic categories, and deliver controlled accuracy across the full geographic hierarchy.

Submitted 27 December, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: Version 2 corrects a few transcription errors in Tables 2, 3 and 5. Version 3 adds final journal copy edits to the preprint.

Journal ref: Annual Review of Statistics and Its Application 2023 10:1

arXiv:1702.03950

doi 10.1109/TAP.2017.2655013

Bayesian Compressive Sensing Approaches for Direction of Arrival Estimation with Mutual Coupling Effects

Authors: Matthew Hawes, Lyudmila Mihaylova, François Septier, Simon Godsill

Abstract: The problem of estimating the dynamic direction of arrival of far field signals impinging on a uniform linear array, with mutual coupling effects, is addressed. This work proposes two novel approaches able to provide accurate solutions, including at the endfire regions of the array. Firstly, a Bayesian compressive sensing Kalman filter is developed, which accounts for the predicted estimated signals rather than using the traditional sparse prior. The posterior probability density function of the received source signals and the expression for the related marginal likelihood function are derived theoretically. Next, a Gibbs sampling based approach with indicator variables in the sparsity prior is developed. This allows sparsity to be explicitly enforced in different ways, including when an angle is too far from the previous estimate. The proposed approaches are validated and evaluated over different test scenarios and compared to the traditional relevance vector machine based method. An improved accuracy in terms of average root mean square error values is achieved (up to 73.39% for the modified relevance vector machine based approach and 86.36% for the Gibbs sampling based approach). The proposed approaches prove to be particularly useful for direction of arrival estimation when the angle of arrival moves into the endfire region of the array.

Submitted 13 February, 2017; originally announced February 2017.

Comments: This paper is published in IEEE Transactions on Antennas and Propagation. If citing this work, please use the information for the published version.

arXiv:1702.00248

doi 10.1109/TSP.2017.2655479

Location and Orientation Optimisation for Spatially Stretched Tripole Arrays Based on Compressive Sensing

Authors: Matthew Hawes, Lyudmila Mihaylova, Wei Liu

Abstract: The design of sparse spatially stretched tripole arrays is an important but also challenging task and this paper proposes for the very first time efficient solutions to this problem. Unlike for the design of traditional sparse antenna arrays, the developed approaches optimise both the dipole locations and orientations. The novelty of the paper consists in formulating these optimisation problems into a form that can be solved by the proposed compressive sensing and Bayesian compressive sensing based approaches. The performance of the developed approaches is validated and it is shown that accurate approximation of a reference response can be achieved with a 67% reduction in the number of dipoles required as compared to an equivalent uniform spatially stretched tripole array, leading to a significant reduction in the cost associated with the resulting arrays.

Submitted 1 February, 2017; originally announced February 2017.

Comments: This is an extended version of a paper published in IEEE Transactions on Signal Processing. If citing this work, please use the information for the published version.

arXiv:1603.08817

Compressive Sensing Based Design of Sparse Tripole Arrays

Authors: Matthew Hawes, Wei Liu, Lyudmila Mihaylova

Abstract: This paper considers the problem of designing sparse linear tripole arrays. In such arrays, at each antenna location there are three orthogonal dipoles, allowing full measurement of both the horizontal and vertical components of the received waveform. We formulate this problem from the viewpoint of Compressive Sensing (CS). However, unlike for isotropic array elements (single antenna), we now have three complex-valued weight coefficients associated with each potential location (due to the three dipoles), which have to be simultaneously minimised. If this is not done, we may only set the weight coefficients of individual dipoles to be zero valued, rather than complete tripoles, meaning some dipoles may remain at each location. Therefore, the contributions of this paper are to formulate the design of sparse tripole arrays as an optimisation problem and to obtain a solution based on the minimisation of a modified $l_1$ norm or a series of iteratively solved reweighted minimisations, which ensure a truly sparse solution. Design examples are provided to verify the effectiveness of the proposed methods and show that a good approximation of a reference pattern can be achieved using fewer tripoles than a Uniform Linear Array (ULA) of equivalent length.

Submitted 29 March, 2016; originally announced March 2016.

Journal ref: Sensors 2015, 15, 31056-31068
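The requirement that a tripole's three weights be minimised together can be illustrated with a mixed $l_1$/$l_2$ (group) norm, assumed here as a stand-in for the paper's "modified $l_1$ norm": the $l_2$ norm of each location's three dipole weights, summed over locations. The sketch compares it with an ordinary $l_1$ norm on two hypothetical weight sets, and includes one common reweighting rule, 1/(norm + eps), as an assumed example of the reweighted variant; the function names and the eps value are illustrative, not taken from the paper.

```python
import numpy as np


def plain_l1(w):
    # Ordinary l1 norm: every dipole coefficient is penalised independently.
    return float(np.sum(np.abs(w)))


def group_l1(w):
    # Mixed l1/l2 norm: l2 norm of each location's 3 dipole weights, summed over locations.
    return float(np.sum(np.linalg.norm(w, axis=1)))


def num_tripoles(w, tol=1e-9):
    # A location needs a physical tripole if any of its three dipole weights is non-zero.
    return int(np.count_nonzero(np.linalg.norm(w, axis=1) > tol))


def reweight(w, eps=1e-3):
    # Assumed reweighting rule: locations with small weights get large penalties next round.
    return 1.0 / (np.linalg.norm(w, axis=1) + eps)


# Two hypothetical weight sets: 4 candidate locations (rows) x 3 dipoles (columns).
spread = np.zeros((4, 3), dtype=complex)
spread[0, 0] = spread[1, 1] = spread[2, 2] = 1.0  # one dipole at each of 3 locations
packed = np.zeros((4, 3), dtype=complex)
packed[0, :] = 1.0                                # all three dipoles at 1 location

print(plain_l1(spread), plain_l1(packed))          # 3.0 3.0
print(group_l1(spread), group_l1(packed))          # 3.0 and about 1.732
print(num_tripoles(spread), num_tripoles(packed))  # 3 1
w_next = reweight(packed)  # empty locations receive the largest penalty in the next iteration
```

Both weight sets have the same ordinary $l_1$ norm, but the group norm is smaller when the non-zero weights share a location, so minimising it favours discarding whole tripoles rather than individual dipoles.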

arXiv:1509.06290

A Bayesian Compressed Sensing Kalman Filter for Direction of Arrival Estimation

Authors: Matthew Hawes, Lyudmila Mihaylova, Francois Septier, Simon Godsill

Abstract: In this paper, we look to address the problem of estimating the dynamic direction of arrival (DOA) of a narrowband signal impinging on a sensor array from the far field. The initial estimate is made using a Bayesian compressive sensing (BCS) framework and then tracked using a Bayesian compressed sensing Kalman filter (BCSKF). The BCS framework splits the angular region into N potential DOAs and enforces a belief that only a few of the DOAs will have a non-zero valued signal present. A BCSKF can then be used to track the change in the DOA using the same framework. There can be an issue when the DOA approaches the endfire of the array. In this angular region current methods can struggle to accurately estimate and track changes in the DOAs. To tackle this problem, we propose changing the traditional sparse belief associated with BCS to a belief that the estimated signals will match the predicted signals given a known DOA change. This is done by modelling the difference between the expected sparse received signals and the estimated sparse received signals as a Gaussian distribution. Example test scenarios are provided and comparisons made with the traditional BCS based estimation method. They show that an improvement in estimation accuracy is possible without a significant increase in computational complexity.

Submitted 21 September, 2015; originally announced September 2015.

Comments: Fusion 2015 paper
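The central step in the abstract above, replacing the sparse belief with a Gaussian on the difference between the estimated and predicted sparse signals, can be sketched as a single Gaussian MAP update on a DOA grid. This is a simplified illustration, not the authors' BCSKF: it assumes a half-wavelength-spaced uniform linear array, a 1-degree grid, 8 sensors, and fixed noise and prior variances (all hypothetical choices), and it omits the Kalman prediction of the DOA itself and any hyperparameter learning.

```python
import numpy as np


def ula_dictionary(num_sensors, grid_deg):
    # Steering vectors of a half-wavelength-spaced ULA (assumed geometry),
    # one column per candidate DOA on the grid.
    m = np.arange(num_sensors)[:, None]
    theta = np.deg2rad(np.asarray(grid_deg, dtype=float))[None, :]
    return np.exp(-1j * np.pi * m * np.sin(theta))


def prediction_prior_map(y, A, x_pred, noise_var, prior_var):
    # MAP estimate for y = A x + noise, with the sparse prior replaced by
    # x ~ CN(x_pred, prior_var * I), i.e. a Gaussian on (x - x_pred). Solves
    # (A^H A / noise_var + I / prior_var) x = A^H y / noise_var + x_pred / prior_var.
    n = A.shape[1]
    lhs = A.conj().T @ A / noise_var + np.eye(n) / prior_var
    rhs = A.conj().T @ y / noise_var + x_pred / prior_var
    return np.linalg.solve(lhs, rhs)


grid = np.arange(-90, 91)    # hypothetical 1-degree grid, 181 candidate DOAs
A = ula_dictionary(8, grid)  # hypothetical 8-sensor array

x_prev = np.zeros(grid.size, dtype=complex)
x_prev[grid == 78] = 1.0     # previous estimate: one source close to endfire, at 78 degrees
x_pred = np.roll(x_prev, 2)  # known DOA change of +2 degrees shifts the support by 2 bins

y = A @ x_pred               # noiseless snapshot from the source, now at 80 degrees
x_hat = prediction_prior_map(y, A, x_pred, noise_var=1e-2, prior_var=1e-1)
print(grid[np.argmax(np.abs(x_hat))])  # 80
```

With noiseless data and a correct prediction the update returns the predicted sparse vector; in practice `noise_var` and `prior_var` set how far the estimate trusts the measurement versus the prediction.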

arXiv:1403.4879

A Compressive Sensing Based Approach to Sparse Wideband Array Design

Authors: Matthew B. Hawes, Wei Liu

Abstract: Sparse wideband sensor array design for sensor location optimisation is highly nonlinear and it is traditionally solved by genetic algorithms, simulated annealing or other similar optimization methods. However, this is an extremely time-consuming process and more efficient solutions are needed. In this work, this problem is studied from the viewpoint of compressive sensing and a formulation based on a modified $l_1$ norm is derived. As there are multiple coefficients associated with each sensor, the key is to make sure that these coefficients are simultaneously minimized in order to discard the corresponding sensor locations. Design examples are provided to verify the effectiveness of the proposed methods.

Submitted 19 March, 2014; originally announced March 2014.

Comments: 4 pages and 3 figures

Showing 1–10 of 10 results for author: Hawes, M