-
A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census
Authors:
John M. Abowd,
Tamara Adams,
Robert Ashmead,
David Darais,
Sourya Dey,
Simson L. Garfinkel,
Nathan Goldschlag,
Michael B. Hawes,
Daniel Kifer,
Philip Leclerc,
Ethan Lew,
Scott Moore,
Rolando A. Rodríguez,
Ramy N. Tadros,
Lars Vilhuber
Abstract:
We show that individual, confidential microdata records from the 2010 U.S. Census of Population and Housing can be accurately reconstructed from the published tabular summaries. Ninety-seven million person records (every resident in 70% of all census blocks) are exactly reconstructed with provable certainty using only public information. We further show that a hypothetical attacker using our methods can reidentify with 95% accuracy population unique individuals who are perfectly reconstructed and not in the modal race and ethnicity category in their census block (3.4 million persons)--a result that is only possible because their confidential records were used in the published tabulations. Finally, we show that the methods used for the 2020 Census, based on a differential privacy framework, provide better protection against this type of attack, with better published data accuracy, than feasible alternatives.
Submitted 28 July, 2025; v1 submitted 18 December, 2023;
originally announced December 2023.
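The idea of reconstructing records from published tables can be illustrated with a toy sketch. It is not the authors' method: the two attributes, the three-person block, the table values, and the function name `consistent_blocks` are all hypothetical, and brute-force enumeration stands in for whatever solver a real attack would need. The sketch lists every set of person records that agrees with the published counts; when only one remains, the block is reconstructed exactly.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical toy schema: each person record is a (sex, age group) pair.
SEXES = ("F", "M")
AGES = ("child", "adult")
RECORD_TYPES = list(product(SEXES, AGES))


def consistent_blocks(total, by_sex, by_age, by_cell=None):
    """Return every multiset of records that matches the published counts."""
    by_cell = by_cell or {}
    solutions = []
    # Enumerate all unordered collections of `total` records.
    for block in combinations_with_replacement(RECORD_TYPES, total):
        sex_counts = Counter(sex for sex, _ in block)
        age_counts = Counter(age for _, age in block)
        cell_counts = Counter(block)
        if (all(sex_counts[s] == n for s, n in by_sex.items())
                and all(age_counts[a] == n for a, n in by_age.items())
                and all(cell_counts[c] == n for c, n in by_cell.items())):
            solutions.append(block)
    return solutions


# Two one-way tables leave two candidate blocks.
marginals_only = consistent_blocks(
    3, {"F": 1, "M": 2}, {"child": 1, "adult": 2})

# One additional published cell count leaves a single candidate.
with_cross_tab = consistent_blocks(
    3, {"F": 1, "M": 2}, {"child": 1, "adult": 2},
    by_cell={("F", "adult"): 0})

print(len(marginals_only), len(with_cross_tab))
```

In this toy block, each extra published table narrows the set of consistent candidates, which is the general mechanism the abstract relies on.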
-
Randomness Concerns When Deploying Differential Privacy
Authors:
Simson L. Garfinkel,
Philip Leclerc
Abstract:
The U.S. Census Bureau is using differential privacy (DP) to protect confidential respondent data collected for the 2020 Decennial Census of Population & Housing. The Census Bureau's DP system is implemented in the Disclosure Avoidance System (DAS) and requires a source of random numbers. We estimate that the 2020 Census will require roughly 90TB of random bytes to protect the person and household tables. Although there are critical differences between cryptography and DP, they have similar requirements for randomness. We review the history of random number generation on deterministic computers, including von Neumann's "middle-square" method, Mersenne Twister (MT19937) (previously the default NumPy random number generator, which we conclude is unacceptable for use in production privacy-preserving systems), and the Linux /dev/urandom device. We also review hardware random number generator schemes, including the use of so-called "Lava Lamps" and the Intel Secure Key RDRAND instruction. We finally present our plan for generating random bits in the Amazon Web Services (AWS) environment using AES-CTR-DRBG seeded by mixing bits from /dev/urandom and the Intel Secure Key RDSEED instruction, a compromise of our desire to rely on a trusted hardware implementation, the unease of our external reviewers in trusting a hardware-only implementation, and the need to generate so many random bits.
Submitted 6 September, 2020;
originally announced September 2020.
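The concern about Mersenne Twister can be made concrete with a minimal sketch. It is an illustration only, not the AES-CTR-DRBG design described in the abstract: the function name `laplace_noise`, the seed value, and the inverse-CDF sampler are assumptions, and floating-point Laplace sampling is used here purely for brevity. Python's `random.Random` is MT19937, so anyone who knows the seed can regenerate the noise exactly, whereas `random.SystemRandom` draws from the operating system's random source.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample by inverse CDF from the given RNG."""
    u = rng.random() - 0.5              # uniform on [-0.5, 0.5)
    while abs(u) == 0.5:                # avoid log(0) at the endpoint
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


# Two Mersenne Twister generators with the same (hypothetical) seed
# produce identical "noise", so the noise is reproducible by an attacker
# who learns or guesses the seed.
mt_a = random.Random(2020)
mt_b = random.Random(2020)
noise_a = laplace_noise(1.0, mt_a)
noise_b = laplace_noise(1.0, mt_b)

# SystemRandom uses os.urandom and cannot be seeded or replayed.
os_noise = laplace_noise(1.0, random.SystemRandom())

print(noise_a == noise_b)
```

The same sampler works with either generator because `SystemRandom` subclasses `random.Random`; only the source of the underlying bits changes.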
-
A File System For Write-Once Media
Authors:
Simson L. Garfinkel,
J. Spencer Love
Abstract:
A file system standard for use with write-once media such as digital compact disks is proposed. The file system is designed to work with any operating system and a variety of physical media. Although the implementation is simple, it provides a full-featured and high-performance alternative to conventional file systems on traditional, multiple-write media such as magnetic disks.
Submitted 30 March, 2020;
originally announced April 2020.
-
Issues Encountered Deploying Differential Privacy
Authors:
Simson L. Garfinkel,
John M. Abowd,
Sarah Powazek
Abstract:
When differential privacy was created more than a decade ago, the motivating example was statistics published by an official statistics agency. In attempting to transition differential privacy from the academy to practice, the U.S. Census Bureau has encountered many challenges unanticipated by differential privacy's creators. These challenges include obtaining qualified personnel and a suitable computing environment, the difficulty of accounting for all uses of the confidential data, the lack of release mechanisms that align with the needs of data users, the expectation on the part of data users that they will have access to micro-data, the difficulty in setting the value of the privacy-loss parameter, $ε$ (epsilon), and the lack of tools and trained individuals to verify the correctness of differential privacy implementations.
Submitted 6 September, 2018;
originally announced September 2018.