
Fast Conservative Monte Carlo Confidence Intervals

Authors: Amanda K. Glazer, Philip B. Stark

Abstract: Extant "fast" algorithms for Monte Carlo confidence sets are limited to univariate shift parameters for the one-sample and two-sample problems using the sample mean as the test statistic; moreover, some do not converge reliably and most do not produce conservative confidence sets. We outline general methods for constructing confidence sets for real-valued and multidimensional parameters by inverting Monte Carlo tests using any test statistic and a broad range of randomization schemes. The method exploits two facts that, to our knowledge, had not been combined: (i) there are Monte Carlo tests that are conservative despite relying on simulation, and (ii) since the coverage probability of confidence sets depends only on the significance level of the test of the true null, every null can be tested using the same Monte Carlo sample. The Monte Carlo sample can be arbitrarily small, although the highest nontrivial attainable confidence level generally increases as the number $N$ of Monte Carlo replicates increases. We present open-source Python and R implementations of new algorithms to compute conservative confidence sets for real-valued parameters from Monte Carlo tests, for test statistics and randomization schemes that yield $P$-values that are monotone or weakly unimodal in the parameter, with the data and Monte Carlo sample held fixed. In this case, the new method finds conservative confidence sets for real-valued parameters in $O(n)$ time, where $n$ is the number of data. The values of some test statistics for different simulations and parameter values have a simple relationship that makes more savings possible.
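
The two facts named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or their published implementation: it uses a plain grid search rather than their $O(n)$ method, and the function names, the sign-flip randomization, and the grid are assumptions chosen for illustration. It shows (i) a Monte Carlo P-value that counts the observed statistic among the $N+1$ values, a standard conservative construction, and (ii) one fixed set of Monte Carlo sign vectors reused to test every candidate null.

```python
import numpy as np

def mc_p_value(x, theta0, signs):
    """Conservative Monte Carlo P-value for H0: x is symmetric about theta0.

    signs is an (N, n) array of +/-1 values. It is drawn once and reused
    for every theta0 (hypothetical helper, not from the paper's software).
    """
    d = x - theta0
    t_obs = abs(d.mean())
    # Statistic recomputed under each of the N random sign flips.
    t_sim = np.abs((signs * d).mean(axis=1))
    # Adding 1 to numerator and denominator counts the observed value
    # itself, so the P-value can never fall below 1 / (N + 1).
    return (1 + np.sum(t_sim >= t_obs)) / (len(signs) + 1)

def mc_confidence_interval(x, alpha, signs, grid):
    """Invert the test over a grid: keep every theta0 that is not rejected."""
    kept = [t for t in grid if mc_p_value(x, t, signs) > alpha]
    return (min(kept), max(kept)) if kept else None

# Illustrative data and Monte Carlo sample (all values are assumptions).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=30)
signs = rng.choice([-1, 1], size=(999, x.size))
interval = mc_confidence_interval(x, 0.05, signs, np.linspace(0.0, 4.0, 401))
```

Because the smallest possible P-value in this sketch is $1/(N+1)$, the test can reject only when $\alpha \ge 1/(N+1)$, which is one way to see why the highest nontrivial confidence level grows with $N$.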

Submitted 25 February, 2025; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2309.09081 [pdf, other]

Stylish Risk-Limiting Audits in Practice

Authors: Amanda K. Glazer, Jacob V. Spertus, Philip B. Stark

Abstract: Risk-limiting audits (RLAs) can use information about which ballot cards contain which contests (card-style data, CSD) to ensure that each contest receives adequate scrutiny, without examining more cards than necessary. RLAs using CSD in this way can be substantially more efficient than RLAs that sample indiscriminately from all cast cards. We describe an open-source Python implementation of RLAs using CSD for the Hart InterCivic Verity voting system and the Dominion Democracy Suite(R) voting system. The software is demonstrated using all 181 contests in the 2020 general election and all 214 contests in the 2022 general election in Orange County, CA, USA, the fifth-largest election jurisdiction in the U.S., with over 1.8 million active voters. (Orange County uses the Hart Verity system.) To audit the 181 contests in 2020 to a risk limit of 5% without using CSD would have required a complete hand tally of all 3,094,308 cast ballot cards. With CSD, the estimated sample size is about 20,100 cards, 0.65% of the cards cast--including one tied contest that required a complete hand count. To audit the 214 contests in 2022 to a risk limit of 5% without using CSD would have required a complete hand tally of all 1,989,416 cast cards. With CSD, the estimated sample size is about 62,250 ballots, 3.1% of cards cast--including three contests with margins below 0.1% and 9 with margins below 0.5%.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2205.01061 [pdf]

Robust inference for matching under rolling enrollment

Authors: Amanda K. Glazer, Samuel D. Pimentel

Abstract: Matching in observational studies faces complications when units enroll in treatment on a rolling basis. While each treated unit has a specific time of entry into the study, control units each have many possible comparison, or "pseudo-treatment," times. The recent GroupMatch framework (Pimentel et al., 2020) solves this problem by searching over all possible pseudo-treatment times for each control and selecting those permitting the closest matches based on covariate histories. However, valid methods of inference have been described only for special cases of the general GroupMatch design, and these rely on strong assumptions. We provide three important innovations to address these problems. First, we introduce a new design, GroupMatch with instance replacement, that allows additional flexibility in control selection and proves more amenable to analysis. Second, we propose a block bootstrap approach for inference in GroupMatch with instance replacement and demonstrate that it accounts properly for complex correlations across matched sets. Third, we develop a permutation-based falsification test to detect possible violations of the important timepoint agnosticism assumption underpinning GroupMatch, which requires homogeneity of potential outcome means across time. Via simulation and a case study of the impact of short-term injuries on batting performance in major league baseball, we demonstrate the effectiveness of our methods for data analysis in practice.

Submitted 13 July, 2024; v1 submitted 2 May, 2022; originally announced May 2022.

Journal ref: Journal of Causal Inference. 11 (2023)

arXiv:2012.03371 [pdf, other]

More style, less work: card-style data decrease risk-limiting audit sample sizes

Authors: Amanda K. Glazer, Jacob V. Spertus, Philip B. Stark

Abstract: U.S. elections rely heavily on computers such as voter registration databases, electronic pollbooks, voting machines, scanners, tabulators, and results reporting websites. These introduce digital threats to election outcomes. Risk-limiting audits (RLAs) mitigate threats to some of these systems by manually inspecting random samples of ballot cards. RLAs have a large chance of correcting wrong outcomes (by conducting a full manual tabulation of a trustworthy record of the votes), but can save labor when reported outcomes are correct. This efficiency is eroded when sampling cannot be targeted to ballot cards that contain the contest(s) under audit. If the sample is drawn from all cast cards, RLA sample sizes scale like the reciprocal of the fraction of ballot cards that contain the contest(s) under audit. That fraction shrinks as the number of cards per ballot grows (i.e., when elections contain more contests) and as the fraction of ballots that contain the contest decreases (i.e., when a smaller percentage of voters are eligible to vote in the contest). States that conduct RLAs of contests on multi-card ballots or of small contests can dramatically reduce sample sizes by using information about which ballot cards contain which contests -- by keeping track of card-style data (CSD). For instance, CSD reduces the expected number of draws needed to audit a single countywide contest on a 4-card ballot by 75%. Similarly, CSD reduces the expected number of draws by 95% or more for an audit of two contests with the same margin on a 4-card ballot if one contest is on every ballot and the other is on 10% of ballots. In realistic examples, the savings can be several orders of magnitude.
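
The 75% and 95% figures follow from the reciprocal scaling stated in the abstract, and a short arithmetic sketch makes that visible. This is a simplified model, not the paper's calculation: it assumes every voter casts all four cards, that each contest needs the same number of inspected cards (the hypothetical value `n_needed = 100`), and that with CSD the two contests are sampled separately, which gives the worst case of twice that number. The function names are illustrative.

```python
def draws_without_csd(n_needed, ballot_fraction, cards_per_ballot):
    """Expected draws from all cast cards to see n_needed cards with the contest.

    The contest sits on one card of each ballot that contains it, so the
    fraction of cards containing it is ballot_fraction / cards_per_ballot,
    and the number of draws scales like the reciprocal of that fraction.
    """
    card_fraction = ballot_fraction / cards_per_ballot
    return n_needed / card_fraction

def reduction(draws_without, draws_with):
    """Fractional saving in draws from using card-style data."""
    return 1 - draws_with / draws_without

n_needed = 100  # hypothetical cards required per contest at the given margin

# One countywide contest on a 4-card ballot: CSD samples only cards with it.
single = reduction(draws_without_csd(n_needed, 1.0, 4), n_needed)

# Two contests with equal margins: one on every ballot, one on 10% of ballots.
# Without CSD the rarer contest drives the sample size.
without = max(draws_without_csd(n_needed, 1.0, 4),
              draws_without_csd(n_needed, 0.10, 4))
# With CSD, at most n_needed cards are drawn for each contest (assumed bound).
double = reduction(without, 2 * n_needed)

print(f"single contest: {single:.0%}, two contests: {double:.0%}")
```

Under these assumptions the second case is a lower bound on the saving, consistent with the abstract's "95% or more": any card that counts toward both contests reduces the CSD sample further.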

Submitted 6 December, 2020; originally announced December 2020.

Comments: 19 pages, 9 figures. In submission at Digital Threats: Research and Practice

Showing 1–4 of 4 results for author: Glazer, A K