
Nonprobability follow-up sample analysis: an application to SARS-CoV-2 infection prevalence estimation

Authors: Yan Li, Laura Yee, Sally Hunsberger, Matthew J. Memoli, Kaitlyn Sadtler, Barry I. Graubard

Abstract: Public health policy makers are faced with making crucial decisions rapidly during infectious disease outbreaks such as that caused by SARS-CoV-2. Ideally, rapidly deployed representative health surveys could provide needed data for such decisions. Under the constraints of a limited timeframe and resources, it may be infeasible to implement random-based (probability) sampling that yields a population-representative survey sample with high response rates. As an alternative, a volunteer (nonprobability) sample is often collected using outreach methods such as social media and web surveys. Compared to a probability sample, a nonprobability sample is subject to selection bias. In addition, when participants are followed longitudinally, nonresponse often occurs at later follow-up timepoints. As a result, estimates of cross-sectional parameters at later timepoints will be subject to selection bias and nonresponse bias. In this paper, we create kernel-weighted pseudoweights (KW) for the baseline survey participants and construct nonresponse-adjusted KW (kwNR) for respondents at each follow-up visit to estimate the population mean at the follow-up visits. We develop Taylor linearization variance estimation that accounts for variability due to estimating both pseudoweights and the nonresponse adjustments. Simulations are conducted to evaluate the proposed kwNR-weighted estimates. We investigate covariate effects on each of the following: baseline sample participation propensity, follow-up response propensity, and the mean of the outcome. We apply the proposed kwNR-weighted methods to the SARS-CoV-2 antibody seropositivity longitudinal study, which begins with a baseline survey early in the pandemic and collects data at six- and twelve-month post-baseline follow-ups.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 12 pages

arXiv:2006.10533 [pdf]

Endpoints for randomized controlled clinical trials for COVID-19 treatments

Authors: Lori E Dodd, Dean Follmann, Jing Wang, Franz Koenig, Lisa L Korn, Christian Schoergenhofer, Michael Proschan, Sally Hunsberger, Tyler Bonnett, Mat Makowski, Drifa Belhadi, Yeming Wang, Bin Cao, France Mentre, Thomas Jaki

Abstract: Introduction: Endpoint choice for randomized controlled trials of treatments for COVID-19 is complex. A new disease brings many uncertainties, but trials must start rapidly. COVID-19 is heterogeneous, ranging from mild disease that improves within days to critical disease that can last weeks and can end in death. While improvement in mortality would provide unquestionable evidence about clinical significance of a treatment, sample sizes for a study evaluating mortality are large and may be impractical. Furthermore, patient states in between "cure" and "death" represent meaningful distinctions. Clinical severity scores have been proposed as an alternative. However, the appropriate summary measure for severity scores has been the subject of debate, particularly in relation to the uncertainty about the time-course of COVID-19. Outcomes measured at fixed time-points may risk missing the time of clinical benefit. An endpoint such as time-to-improvement (or recovery) avoids the timing problem. However, some have argued that power losses will result from reducing the ordinal scale to a binary state of "recovered" vs "not recovered." Methods: We evaluate statistical power for possible trial endpoints for COVID-19 treatment trials using simulation models and data from two recent COVID-19 treatment trials. Results: Power for fixed time-point methods depends heavily on the time selected for evaluation. Time-to-improvement (or recovery) analyses do not specify a time-point. Time-to-event approaches have reasonable statistical power, even when compared to a fixed time-point method evaluated at the optimal time. Discussion: Time-to-event analysis methods have advantages in the COVID-19 setting, unless the optimal time for evaluating treatment effect is known in advance. Even when the optimal time is known, a time-to-event approach may increase power for interim analyses.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:1904.05416 [pdf, ps, other]

doi 10.1214/21-SS131

Practical Valid Inferences for the Two-Sample Binomial Problem

Authors: Michael P. Fay, Sally A. Hunsberger

Abstract: Our interest is whether two binomial parameters differ, which parameter is larger, and by how much. This apparently simple problem was addressed by Fisher in the 1930s, and has been the subject of many review papers since then. Yet there continues to be new work on this issue and no consensus solution. Previous reviews have focused primarily on testing and the properties of validity and power, or primarily on confidence intervals, their coverage, and expected length. Here we evaluate both. For example, we consider whether a p-value and its matching confidence interval are compatible, meaning that the p-value rejects at level $α$ if and only if the $1-α$ confidence interval excludes all null parameter values. For focus, we only examine non-asymptotic inferences, so that most of the p-values and confidence intervals are valid (i.e., exact) by construction. Within this focus, we review different methods emphasizing many of the properties and interpretational aspects we desire from applied frequentist inference: validity, accuracy, good power, equivariance, compatibility, coherence, and parameterization and direction of effect. We show that no one method can meet all the desirable properties and give recommendations based on which properties are given more importance.

Submitted 24 March, 2021; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: 41 pages, 8 figures. To appear in Statistics Surveys. v2 has changes based on reviewer comments. Main differences are the old v1 Sections 8 (Noninferiority and Equivalence Hypotheses) and 12 (Connection to Causal Inferences) were deleted for length. There was no issue with the correctness of those sections. There are other minor changes and additions in v2, with the main changes in Section 7

MSC Class: 62F03; 62F25

Journal ref: Statistics Surveys (2021) 15: 72-110

Showing 1–3 of 3 results for author: Hunsberger, S