Search | arXiv e-print repository

Exposure measurement error correction in longitudinal studies with discrete outcomes

Authors: Ce Yang, Ning Zhang, Jiaxuan Li, Unnati V. Mehta, Jaime E. Hart, Donna Spiegelman, Molin Wang

Abstract: Environmental epidemiologists are often interested in estimating the effect of time-varying functions of the exposure history on health outcomes. However, the individual exposure measurements that constitute the history upon which an exposure history function is constructed are usually subject to measurement errors. To obtain unbiased estimates of the effects of such mismeasured functions in longitudinal studies with discrete outcomes, a method applicable to the main study/validation study design is developed. Various estimation procedures are explored. Simulation studies were conducted to compare its performance with that of standard analysis, and we found that the proposed method performed well in terms of reducing finite-sample bias and bringing coverage probability closer to the nominal level. As an illustrative example, we applied the new method to a study of long-term exposure to PM2.5 in relation to the occurrence of anxiety disorders in the Nurses' Health Study II. Failing to correct the error-prone exposure can lead to an underestimation of the chronic exposure effect of PM2.5.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 22 pages, has Supplementary
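The correction described above belongs to the regression calibration family. As a rough illustration only, the sketch below shows a generic, single-time-point plug-in version for a binary outcome: a linear calibration model for E[X | W] is fitted in a validation sample, and the predicted true exposure replaces the surrogate in a logistic outcome model. It is not the authors' estimator and does not model exposure-history functions; the helper names (`fit_logistic`, `regression_calibration`, `simulate`) and all simulation settings (`beta_true`, sample sizes, error variance) are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; X already contains an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def regression_calibration(w_main, y_main, x_val, w_val):
    # Step 1 (validation study): linear calibration model for E[X | W].
    A = np.column_stack([np.ones_like(w_val), w_val])
    gamma, *_ = np.linalg.lstsq(A, x_val, rcond=None)
    # Step 2 (main study): replace W by its calibrated value, then fit the outcome model.
    x_hat = gamma[0] + gamma[1] * w_main
    return fit_logistic(np.column_stack([np.ones_like(x_hat), x_hat]), y_main)[1]

rng = np.random.default_rng(0)
beta_true = 0.5   # hypothetical log-odds ratio per unit of true exposure

def simulate(n):
    x = rng.normal(0.0, 1.0, n)              # true exposure
    w = x + rng.normal(0.0, 1.0, n)          # surrogate with classical additive error
    p = 1.0 / (1.0 + np.exp(-(-2.0 + beta_true * x)))
    return x, w, rng.binomial(1, p)

_, w_main, y_main = simulate(20000)          # main study: W and Y only are used
x_val, w_val, _ = simulate(1000)             # validation study: X and W

naive = fit_logistic(np.column_stack([np.ones_like(w_main), w_main]), y_main)[1]
corrected = regression_calibration(w_main, y_main, x_val, w_val)
print(f"naive: {naive:.3f}  calibrated: {corrected:.3f}  true: {beta_true}")
```

With error variance equal to the exposure variance, the naive slope is attenuated toward roughly half the true value, and the calibrated slope moves back toward it; regression calibration is only approximately unbiased for logistic models.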

arXiv:2505.05771 [pdf, ps, other]

Statistical methods for cost-effectiveness analysis of left-truncated censored survival data with treatment delays

Authors: Polyna Khudyakov, Li Xu, Ce Yang, Donna Spiegelman, Molin Wang

Abstract: The incremental cost-effectiveness ratio (ICER) and incremental net benefit (INB) are widely used for cost-effectiveness analysis. We develop methods for estimation and inference for the ICER and INB which use the semiparametric stratified Cox proportional hazards model, allowing for adjustment for risk factors. Because patients in public health settings often begin treatment some time after they become eligible, we account for delays in treatment initiation. Excellent finite-sample properties of the proposed estimator are demonstrated in an extensive simulation study under different delay scenarios. We apply the proposed method to evaluate the cost-effectiveness of switching treatments among AIDS patients in Tanzania.

Submitted 9 May, 2025; originally announced May 2025.

Comments: 24 pages, 4 figures, has Supplementary
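For readers unfamiliar with the two summary measures, the standard definitions are ICER = ΔC / ΔE and INB = λ·ΔE − ΔC, where ΔC and ΔE are the incremental cost and effect and λ is the willingness to pay per unit of effect. The sketch below only evaluates these definitions from already-estimated quantities; in the paper, the effects come from a stratified Cox model with treatment delays, which is not reproduced here. The function names and example numbers are hypothetical.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_e = effect_new - effect_old
    if delta_e == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return (cost_new - cost_old) / delta_e

def inb(cost_new, cost_old, effect_new, effect_old, willingness_to_pay):
    """Incremental net benefit at threshold lambda: lambda * dE - dC."""
    return willingness_to_pay * (effect_new - effect_old) - (cost_new - cost_old)

# Hypothetical inputs: mean cost and mean survival time (years) under two treatments.
print(icer(1500, 1000, 4.0, 3.5))        # 500 extra cost / 0.5 extra years
print(inb(1500, 1000, 4.0, 3.5, 2000))   # positive: cost-effective at lambda = 2000
```

The INB is often preferred for inference because it is linear in ΔC and ΔE, whereas the ICER is a ratio that becomes unstable when ΔE is near zero.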

arXiv:2503.22933 [pdf, other]

Improving Transportability of Regression Calibration Under the Main/External Validation Study Design

Authors: Zexiang Li, Donna Spiegelman, Molin Wang, Zuoheng Wang, Xin Zhou

Abstract: In epidemiology, obtaining accurate individual exposure measurements can be costly and challenging. Thus, these measurements are often subject to error. Regression calibration with a validation study is widely employed as a study design and analysis method to correct for measurement error in the main study due to its broad applicability and simple implementation. However, relying on an external validation study to assess the measurement error process carries the risk of introducing bias into the analysis. Specifically, if the parameters of the regression calibration model estimated from the external validation study are not transportable to the main study, the subsequently estimated parameter describing the exposure-disease association will be biased. In this work, we improve the regression calibration method for linear regression models using an external validation study. Unlike the original approach, our proposed method ensures that the regression calibration model is transportable by estimating the parameters in the measurement error generating process using the external validation study and obtaining the remaining parameter values in the regression calibration model directly from the main study. This guarantees that parameter values in the regression calibration model will be applicable to the main study. We derived the theoretical properties of our proposed method. The simulation results show that the proposed method effectively reduces bias and maintains nominal confidence interval coverage. 
We applied this method to data from the Health Professionals Follow-Up Study (main study) and the Men's Lifestyle Validation Study (external validation study) to assess the effects of dietary intake on body weight.

Submitted 28 March, 2025; originally announced March 2025.
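The transportability idea can be illustrated under a deliberately simple assumption that is not taken from the paper: classical additive error W = X + U with U independent of X, so the attenuation factor is λ = var(X)/var(W) = 1 − var(U)/var(W). The sketch contrasts borrowing the whole calibration slope from an external study with borrowing only the error variance var(U) and computing var(W) in the main study. All names and settings (`simulate`, `sd_x`, `beta_true`) are hypothetical, and this is not the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, sd_x):
    x = rng.normal(0.0, sd_x, n)
    w = x + rng.normal(0.0, 1.0, n)   # classical error with var(U) = 1 in both studies
    return x, w

beta_true = 2.0
# The true-exposure variance differs between studies (4.0 in the main study, 1.0 externally).
x_main, w_main = simulate(50000, sd_x=2.0)
y_main = 1.0 + beta_true * x_main + rng.normal(0.0, 1.0, x_main.size)
x_ext, w_ext = simulate(2000, sd_x=1.0)

naive = np.polyfit(w_main, y_main, 1)[0]          # slope of Y on W in the main study

# Borrow the full calibration slope (X regressed on W) from the external study.
lam_external = np.polyfit(w_ext, x_ext, 1)[0]
corrected_standard = naive / lam_external

# Borrow only the error-process parameter var(U); take var(W) from the main study.
var_u = np.var(w_ext - x_ext, ddof=1)
lam_main = 1.0 - var_u / np.var(w_main, ddof=1)
corrected_transport = naive / lam_main

print(f"borrowed slope: {corrected_standard:.2f}  transportable: {corrected_transport:.2f}")
```

Because the error process is the same in both studies but the exposure distribution is not, only the second correction recovers a slope near 2 in this setup.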

arXiv:2502.10170 [pdf, other]

Identifying Key Influencers using an Egocentric Network-based Randomized Design

Authors: Zhibing He, Junhan Fang, Ashley Buchanan, Donna Spiegelman, Laura Forastiere

Abstract: Behavioral health interventions, such as trainings or incentives, are implemented in settings where individuals are interconnected, and the intervention assigned to some individuals may also affect others within their network. Evaluating such interventions requires assessing both the effect of the intervention on those who receive it and the spillover effect on those connected to the treated individuals. With behavioral interventions, spillover effects can be heterogeneous in that certain individuals, due to their social connectedness and individual characteristics, are more likely to respond to the intervention and influence their peers' behaviors. Targeting these individuals can enhance the effectiveness of interventions in the population. In this paper, we focus on an Egocentric Network-based Randomized Trial (ENRT) design, wherein a set of index participants is recruited from the population and randomly assigned to the treatment group, while outcome data are concurrently collected on their nominated network members, who remain untreated. In such a design, spillover effects on network members may vary depending on the characteristics of the index participant. Here, we develop a testing method, the Multiple Comparison with Best (MCB), to identify subgroups of index participants whose treatment exhibits the largest spillover effect on their network members. Power and sample size calculations are then provided to design ENRTs that can detect key influencers. 
The proposed methods are demonstrated in a study of a network-based peer HIV prevention education program, providing insights into strategies for selecting peer educators in peer education interventions.

Submitted 14 February, 2025; originally announced February 2025.

arXiv:2411.08929 [pdf]

Power and Sample Size Calculations for Cluster Randomized Hybrid Type 2 Effectiveness-Implementation Studies

Authors: Melody A. Owen, Geoffrey M. Curran, Justin D. Smith, Yacob Tedla, Chao Cheng, Donna Spiegelman

Abstract: Hybrid studies allow investigators to simultaneously study an intervention effectiveness outcome and an implementation research outcome. In particular, type 2 hybrid studies support research that places equal importance on both outcomes rather than focusing on one and secondarily on the other (i.e., type 1 and type 3 studies). Hybrid type 2 studies introduce the statistical issue of multiple testing, complicated by the fact that they are typically also cluster-randomized trials. Standard statistical methods do not apply in this scenario. Here, we describe the design methodologies available for validly powering hybrid type 2 studies and producing reliable sample size calculations in a cluster-randomized design with a focus on binary outcomes. Through a literature search, 18 publications were identified that included methods relevant to the design of hybrid type 2 studies. Five methods were identified, two of which did not account for clustering but are extended in this article to do so, namely the combined outcomes approach and the single 1-degree-of-freedom combined test. Procedures for powering hybrid type 2 studies using these five methods are described and illustrated using input parameters inspired by a study from the Community Intervention to Reduce CardiovascuLar Disease in Chicago (CIRCL-Chicago) Implementation Research Center. In this illustrative example, the intervention effectiveness outcome was controlled blood pressure, and the implementation outcome was reach. 
The conjunctive test resulted in higher power than the popular p-value adjustment methods, and the newly extended combined outcomes approach and single 1-DF test were found to be the most powerful among all of the tests.

Submitted 12 November, 2024; originally announced November 2024.

Comments: 21 pages, 5 tables, 1 figure, 4 appendices
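The comparison between the conjunctive test and a p-value adjustment can be sketched with textbook ingredients: a two-proportion z-test whose variance is inflated by the design effect 1 + (m − 1)ρ, and the goal of showing an effect on both outcomes. The sketch assumes, purely for simplicity, that the two test statistics are independent, so the probability that both tests reject is the product of their powers; the paper's methods need not make this assumption. All input values, function names, and the cluster counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def cluster_power(p0, p1, clusters_per_arm, cluster_size, icc, alpha):
    """Approximate two-sided power for a difference in proportions in a cluster-randomized trial."""
    design_effect = 1.0 + (cluster_size - 1) * icc
    n_per_arm = clusters_per_arm * cluster_size
    se = sqrt(design_effect * (p0 * (1 - p0) + p1 * (1 - p1)) / n_per_arm)
    return Z.cdf(abs(p1 - p0) / se - Z.inv_cdf(1 - alpha / 2))

alpha = 0.05
k, m = 15, 40                                   # hypothetical clusters per arm and cluster size
effectiveness = dict(p0=0.50, p1=0.60, icc=0.05)   # e.g. controlled blood pressure
implementation = dict(p0=0.30, p1=0.45, icc=0.10)  # e.g. reach

def both_outcomes_power(level):
    pw_eff = cluster_power(effectiveness["p0"], effectiveness["p1"], k, m, effectiveness["icc"], level)
    pw_imp = cluster_power(implementation["p0"], implementation["p1"], k, m, implementation["icc"], level)
    return pw_eff * pw_imp                      # independence assumption, for illustration only

conjunctive = both_outcomes_power(alpha)        # intersection-union test: each outcome at full alpha
bonferroni = both_outcomes_power(alpha / 2)     # each outcome tested at alpha / 2
print(f"conjunctive: {conjunctive:.3f}  Bonferroni: {bonferroni:.3f}")
```

Because the conjunctive test requires both outcomes to be significant, it can test each at the full α, which is why it tends to be more powerful than splitting α across the two tests.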

arXiv:2410.09278 [pdf, ps, other]

Measurement Error Correction for Spatially Defined Environmental Exposures in Survival Analysis

Authors: Lin Ge, Ce Yang, David Zucker, Jiaxuan Li, Donna Spiegelman, Molin Wang

Abstract: Environmental exposures are often defined using buffer zones around geocoded home addresses, but these static boundaries can miss dynamic daily activity patterns, leading to biased results. This paper presents a novel measurement error correction method for spatially defined environmental exposures within a survival analysis framework using the Cox proportional hazards model. The method corrects high-dimensional surrogate exposures from geocoded residential data at multiple buffer radii by applying principal component analysis for dimension reduction and leveraging external GPS-tracked validation datasets containing true exposure measurements. The paper also derives the asymptotic properties and variances of the proposed estimators. Extensive simulations evaluate the performance of the proposed estimators and demonstrate their ability to improve the accuracy of estimated exposure effects. An illustrative application assesses the impact of greenness exposure on depression incidence in the Nurses' Health Study (NHS). The results demonstrate that correcting for measurement error significantly enhances the accuracy of exposure effect estimates. This method offers a critical advancement for accurately assessing the health impacts of environmental exposures, outperforming traditional static buffer approaches.

Submitted 11 October, 2024; originally announced October 2024.
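Only the dimension-reduction step lends itself to a short, self-contained illustration. The sketch below computes principal component scores from a matrix of surrogate exposures with one column per buffer radius, using an SVD of the centered data. The buffer radii, the simulated NDVI-like values, and the function name are hypothetical, and the subsequent calibration against GPS-based validation data and the Cox model are not shown.

```python
import numpy as np

def principal_components(exposures, n_components):
    """Scores and explained-variance ratios of the leading principal components.

    exposures: (n_subjects, n_buffers) array, one column per buffer radius.
    """
    centered = exposures - exposures.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

rng = np.random.default_rng(2)
radii = np.array([250, 500, 1000, 1250, 2000])   # hypothetical buffer radii (metres)
n = 1000
shared = rng.normal(0.5, 0.1, n)                  # common neighbourhood greenness signal
# Illustrative assumption: larger buffers add more buffer-specific variation.
ndvi = shared[:, None] + rng.normal(0.0, 0.02, (n, radii.size)) * (radii / 1000)

scores, explained = principal_components(ndvi, n_components=2)
print(scores.shape, np.round(explained, 3))
```

Because exposures at neighbouring radii are highly correlated, a small number of components typically captures most of the variation, which is what makes the reduced surrogate tractable for calibration.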

arXiv:2410.07135 [pdf]

Causal Inference with Double/Debiased Machine Learning for Evaluating the Health Effects of Multiple Mismeasured Pollutants

Authors: Gang Xu, Xin Zhou, Molin Wang, Boya Zhang, Wenhao Jiang, Francine Laden, Helen H. Suh, Adam A. Szpiro, Donna Spiegelman, Zuoheng Wang

Abstract: One way to quantify exposure to air pollution and its constituents in epidemiologic studies is to use an individual's nearest monitor. This strategy results in potential inaccuracy in the actual personal exposure, introducing bias in estimating the health effects of air pollution and its constituents, especially when evaluating the causal effects of correlated multi-pollutant constituents measured with correlated error. This paper addresses estimation and inference for the causal effect of one constituent in the presence of other PM2.5 constituents, accounting for measurement error and correlations. We used a linear regression calibration model, fitted with generalized estimating equations in an external validation study, and extended a double/debiased machine learning (DML) approach to correct for measurement error and estimate the effect of interest in the main study. We demonstrated that the DML estimator with regression calibration is consistent and derived its asymptotic variance. Simulations showed that the proposed estimator reduced bias and attained nominal coverage probability across most simulation settings. We applied this method to assess the causal effects of PM2.5 constituents on cognitive function in the Nurses' Health Study and identified two PM2.5 constituents, Br and Mn, that showed a negative causal effect on cognitive function after measurement error correction.

Submitted 21 September, 2024; originally announced October 2024.
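The DML component can be illustrated with the generic cross-fitted partialling-out estimator for a partially linear model Y = θD + g(X) + ε: nuisance regressions E[Y | X] and E[D | X] are fitted on other folds, and θ is estimated by regressing residual on residual. In this sketch the exposure D is measured without error, a cubic polynomial stands in for a machine learning learner, and the regression calibration step and multiple pollutants from the paper are omitted. All function names and data-generating values are hypothetical.

```python
import numpy as np

def poly_features(x, degree=3):
    # Polynomial basis with intercept; a stand-in for a flexible ML learner.
    return np.column_stack([x**k for k in range(degree + 1)])

def fit_predict(x_train, y_train, x_test):
    coef, *_ = np.linalg.lstsq(poly_features(x_train), y_train, rcond=None)
    return poly_features(x_test) @ coef

def dml_plr(y, d, x, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta * D + g(X) + e."""
    idx = np.random.default_rng(seed).permutation(y.size)
    y_res, d_res = np.empty_like(y), np.empty_like(d)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        # Nuisance functions are fitted on the other folds only (cross-fitting).
        y_res[test] = y[test] - fit_predict(x[train], y[train], x[test])
        d_res[test] = d[test] - fit_predict(x[train], d[train], x[test])
    # Residual-on-residual regression yields the final estimate.
    return np.sum(d_res * y_res) / np.sum(d_res**2)

rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(-2.0, 2.0, n)                       # confounder
d = x**2 + 0.5 * x + rng.normal(0.0, 1.0, n)        # exposure depends on the confounder
y = 1.5 * d + 2.0 * x**2 + x + rng.normal(0.0, 1.0, n)

theta_hat = dml_plr(y, d, x)
naive = np.polyfit(d, y, 1)[0]                      # ignores confounding by x
print(f"DML: {theta_hat:.3f}  naive: {naive:.3f}  true: 1.5")
```

Cross-fitting keeps the nuisance fits out of the sample used for the final estimate, which is what allows flexible learners without overfitting bias in the estimate of θ.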

arXiv:2401.00461 [pdf, other]

A Penalized Functional Linear Cox Regression Model for Spatially-defined Environmental Exposure with an Estimated Buffer Distance

Authors: Jooyoung Lee, Zhibing He, Charlotte Roscoe, Peter James, Li Xu, Donna Spiegelman, David Zucker, Molin Wang

Abstract: In environmental health research, it is of interest to understand the effect of the neighborhood environment on health. Researchers have shown a protective association between green space around a person's residential address and depression outcomes. In measuring exposure to green space, distance buffers are often used. However, buffer distances differ across studies. Typically, the buffer distance is determined by researchers a priori. It is unclear how to identify an appropriate buffer distance for exposure assessment. To address the geographic uncertainty problem in exposure assessment, we present a domain selection algorithm based on the penalized functional linear Cox regression model. The theoretical properties of our proposed method are studied, and simulation studies are conducted to evaluate its finite-sample performance. The proposed method is illustrated in a study of associations of green space exposure with depression and/or antidepressant use in the Nurses' Health Study.

Submitted 31 December, 2023; originally announced January 2024.

Comments: 27 pages, 5 figures

arXiv:2312.07829 [pdf, ps, other]

Causal Covariate Selection for the Imputation-based Regression Calibration Method for Exposure Measurement Error Bias Correction

Authors: Wenze Tang, Donna Spiegelman, Yujie Wu, Molin Wang

Abstract: In this paper, we identify the criteria for the selection of the minimal and most efficient covariate adjustment sets for the regression calibration method developed by Carroll, Ruppert and Stefanski (CRS, 1992), used to correct bias due to continuous exposure measurement error. We utilize directed acyclic graphs to illustrate how subject matter knowledge can aid in the selection of such adjustment sets. Valid measurement error correction requires the collection of data on any (1) common causes of true exposure and outcome and (2) common causes of measurement error and outcome, in both the main study and validation study. For the CRS regression calibration method to be valid, researchers need to minimally adjust for covariate set (1) in both the measurement error model (MEM) and the outcome model and adjust for covariate set (2) at least in the MEM. In practice, we recommend including the minimal covariate adjustment set in both the MEM and the outcome model. In contrast with the regression calibration method developed by Rosner, Spiegelman and Willett, under the CRS method it is valid and more efficient to adjust for correlates of the true exposure or of measurement error that are not risk factors in the MEM only. We applied the proposed covariate selection approach to the Health Professionals Follow-up Study, examining the effect of fiber intake on cardiovascular disease incidence. In this study, we demonstrated potential issues with a data-driven approach to building the MEM that is agnostic to the structural assumptions. 
We extend the originally proposed estimators to settings where effect modification by a covariate is allowed. Finally, we caution against the use of the regression calibration method to calibrate the true nutrition intake using biomarkers.

Submitted 14 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 11 pages, 4 tables, 3 figures. arXiv admin note: text overlap with arXiv:2212.00795

arXiv:2310.02151 [pdf, ps, other]

Estimation and inference for causal spillover effects in egocentric-network randomized trials in the presence of network membership misclassification

Authors: Ariel Chao, Donna Spiegelman, Ashley Buchanan, Laura Forastiere

Abstract: To leverage peer influence and promote behavior change in the population, behavioral interventions often rely on peer-based strategies. A common study design that assesses such strategies is the egocentric-network randomized trial (ENRT), in which those receiving the intervention are encouraged to disseminate information to their peers. The Average Spillover Effect (ASpE) measures the impact of the intervention on participants who do not receive it, but whose outcomes may be affected by others who do. The assessment of the ASpE relies on assumptions about, and correct measurement of, interference sets within which individuals may influence one another's outcomes. It can be challenging to properly specify interference sets, such as networks in ENRTs, and when these sets are mismeasured, intervention effects estimated by existing methods will be biased. In HIV prevention studies where social networks play an important role in disease transmission, correcting ASpE estimates for bias due to network misclassification is critical for accurately evaluating the full impact of interventions. We combined measurement error and causal inference methods to bias-correct the ASpE estimate for network misclassification in ENRTs, when surrogate networks are recorded in place of true ones, and validation data that relate the misclassified to the true networks are available. We investigated finite sample properties of our methods in an extensive simulation study, and illustrated our methods in the HIV Prevention Trials Network (HPTN) 037 study.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2308.00791 [pdf, other]

Design of egocentric network-based studies to estimate causal effects under interference

Authors: Junhan Fang, Donna Spiegelman, Ashley Buchanan, Laura Forastiere

Abstract: Many public health interventions are conducted in settings where individuals are connected to one another and the intervention assigned to randomly selected individuals may spill over to other individuals they are connected to. In these spillover settings, the effects of such interventions can be quantified in several ways. The average individual effect measures the intervention effect among those directly treated, while the spillover effect measures the effect among those connected to those directly treated. In addition, the overall effect measures the average intervention effect across the study population, over those directly treated along with those to whom the intervention spills over but who are not directly treated. Here, we develop methods for study design with the aim of estimating individual, spillover, and overall effects. In particular, we consider an egocentric network-based randomized design in which a set of index participants is recruited from the population and randomly assigned to treatment, while data are also collected from their untreated network members. We use the potential outcomes framework to define two clustered regression modeling approaches and clarify the underlying assumptions required to identify and estimate causal effects. We then develop sample size formulas for detecting individual, spillover, and overall effects. 
We investigate how the intra-class correlation coefficient and the probability of treatment allocation affect the required number of egocentric networks for a fixed number of network members per egocentric network, and vice versa.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 30 pages for main text including figures and tables, 5 figures and 3 tables
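The roles of the intra-class correlation and the allocation probability can be seen in a generic cluster-design formula for a continuous outcome: the number of egocentric networks needed is K = (z₁₋α/₂ + z₁₋β)² σ² [1 + (m − 1)ρ] / (m p(1 − p) Δ²), where m is the number of members measured per network, ρ the ICC, and p the probability that an index participant is treated. This is a textbook approximation offered as a sketch, not the paper's formulas, which are derived from its two clustered regression models; the function name and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_networks(delta, sigma, members_per_network, icc, p_treat=0.5, alpha=0.05, power=0.8):
    """Approximate number of egocentric networks needed to detect a mean difference delta."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (members_per_network - 1) * icc
    k = z**2 * sigma**2 * design_effect / (members_per_network * p_treat * (1 - p_treat) * delta**2)
    return ceil(k)

# Hypothetical spillover effect of 0.3 SD, 5 network members per index participant.
for icc in (0.0, 0.05, 0.1):
    print(icc, n_networks(0.3, 1.0, 5, icc))
```

Larger ICCs reduce the information contributed by each additional network member, and allocation probabilities away from 0.5 inflate the required number of networks.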

arXiv:2307.06552 [pdf, ps, other]

Learn-As-you-GO (LAGO) Trials: Optimizing Treatments and Preventing Trial Failure Through Ongoing Learning

Authors: Ante Bing, Donna Spiegelman, Daniel Nevo, Judith J. Lok

Abstract: It is well known that changing the intervention package while a trial is ongoing does not lead to valid inference using standard statistical methods. However, it is often necessary to adapt, tailor, or tweak a complex intervention package in public health implementation trials, especially when the intervention package does not have the desired effect. This article presents conditions under which the resulting analyses remain valid even when the intervention package is adapted while a trial is ongoing. Our results on such Learn-As-you-GO (LAGO) studies extend the theory of LAGO for binary outcomes following a logistic regression model (Nevo, Lok and Spiegelman, 2021) to LAGO for continuous outcomes under a flexible conditional mean model. We derive point and interval estimators of the intervention effects and ensure the validity of hypothesis tests for an overall intervention effect. We develop a confidence set for the optimal intervention package, which achieves a pre-specified mean outcome while minimizing cost, and confidence bands for the mean outcome under all intervention package compositions. This work will be useful for the design and analysis of large-scale intervention trials where the intervention package is adapted, tailored, or tweaked while the trial is ongoing.

Submitted 13 July, 2023; originally announced July 2023.

Comments: 65 pages, 15 tables

MSC Class: 62K99; 62L99 (Primary) 62F05; 62F10; 62F12; 62J12 (Secondary)

arXiv:2304.04868 [pdf, other]

Correcting for bias due to mismeasured exposure in mediation analysis with a survival outcome

Authors: Chao Cheng, Donna Spiegelman, Fan Li

Abstract: Mediation analysis is widely used in health science research to evaluate the extent to which an intermediate variable explains an observed exposure-outcome relationship. However, the validity of analysis can be compromised when the exposure is measured with error. Motivated by the Health Professionals Follow-up Study (HPFS), we investigate the impact of exposure measurement error on assessing mediation with a survival outcome, based on the Cox proportional hazards outcome model. When the outcome is rare and there is no exposure-mediator interaction, we show that the uncorrected estimators of the natural indirect and direct effects can be biased in either direction, but the uncorrected estimator of the mediation proportion is approximately unbiased as long as the measurement error is not large or the mediator-exposure association is not strong. We develop ordinary regression calibration and risk set regression calibration approaches to correct the exposure measurement error-induced bias when estimating mediation effects and allowing for an exposure-mediator interaction in the Cox outcome model. The proposed approaches require a validation study to characterize the measurement error process. We apply the proposed approaches to the HPFS (1986-2016) to evaluate the extent to which reduced body mass index mediates the protective effect of vigorous physical activity on the risk of cardiovascular diseases, and compare the finite-sample properties of the proposed estimators via simulations.

Submitted 4 March, 2025; v1 submitted 10 April, 2023; originally announced April 2023.
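Under the rare-outcome, no-interaction conditions mentioned above, the widely used product-method expressions are log NDE = θ₁(a − a*) and log NIE = θ₂β₁(a − a*), where θ₁ and θ₂ are the Cox coefficients for exposure and mediator and β₁ is the exposure coefficient in a linear mediator model. The sketch evaluates these expressions and a mediation proportion defined on the log hazard ratio scale (one of several definitions in use). It assumes correctly measured exposure; the measurement error corrections from the paper are not shown, and the coefficient values are hypothetical.

```python
import math

def mediation_rare_outcome(theta1, theta2, beta1, a=1.0, a_star=0.0):
    """Natural direct/indirect/total effects (hazard-ratio scale) and mediation proportion.

    Assumes a rare outcome, Cox model log h = ... + theta1*A + theta2*M, a linear mediator
    model E[M | A] = beta0 + beta1*A, and no exposure-mediator interaction.
    """
    log_nde = theta1 * (a - a_star)
    log_nie = theta2 * beta1 * (a - a_star)
    proportion = log_nie / (log_nde + log_nie)   # mediation proportion on the log-HR scale
    return math.exp(log_nde), math.exp(log_nie), math.exp(log_nde + log_nie), proportion

# Hypothetical: exposure lowers the mediator (beta1 < 0), and the mediator raises risk (theta2 > 0).
nde, nie, te, mp = mediation_rare_outcome(theta1=-0.10, theta2=0.05, beta1=-1.0)
print(f"NDE={nde:.3f}  NIE={nie:.3f}  TE={te:.3f}  MP={mp:.3f}")
```

On the hazard ratio scale the total effect factorizes as TE = NDE × NIE, which is why the proportion is computed from the log-scale components.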

arXiv:2212.00795 [pdf, ps, other]

Causal Selection of Covariates in Regression Calibration for Mismeasured Continuous Exposure

Authors: Wenze Tang, Donna Spiegelman, Xiaomei Liao, Molin Wang

Abstract: Regression calibration as developed by Rosner, Spiegelman and Willett is used to correct the bias in effect estimates due to measurement error in continuous exposures. The method involves two models: a measurement error model (MEM) relating the mismeasured exposure to the true exposure and an outcome model relating the mismeasured exposure to the outcome. However, no comprehensive guidance exists for determining which covariates should be included in each model. In this paper, we investigate the selection of the minimal and most efficient covariate adjustment sets under a causal inference framework. We show that in order to correct for the measurement error, researchers must adjust for, in both the MEM and the outcome model, any common causes (1) of true exposure and the outcome and (2) of measurement error and the outcome. When such variable(s) are only available in the main study, researchers should still adjust for them in the outcome model to reduce bias, provided that these covariates are at most weakly associated with measurement error. We also show that adjusting in the outcome model for so-called prognostic variables that are independent of true exposure and measurement error may increase efficiency, while adjusting for any covariates that are associated only with true exposure generally results in efficiency loss in realistic settings. We apply the proposed covariate selection approach to the Health Professionals Follow-up Study dataset to study the effect of fiber intake on cardiovascular disease. 
Finally, we extend the originally proposed estimators to a non-parametric setting where effect modification by covariates is allowed.

Submitted 18 September, 2023; v1 submitted 1 October, 2022; originally announced December 2022.

Comments: 11 pages, 3 figures

arXiv:2209.12129 [pdf, other]

The Design of Observational Longitudinal Studies

Authors: Xavier Basagana, Donna Spiegelman

Abstract: This paper considers the design of observational longitudinal studies with a continuous response and a binary time-invariant exposure, where, typically, the exposure is unbalanced, the mean response in the two groups differs at baseline and the measurement times might not be the same for all participants. We consider group differences that are constant and those that increase linearly with time. We study power, number of study participants (N) and number of repeated measures (r), and provide formulas for each quantity when the other two are fixed, for compound symmetry, damped exponential, and random intercepts and slopes covariance structures. When both N and r can be chosen by the investigator, we study the optimal combination for maximizing power subject to a cost constraint and minimizing cost for fixed power. Intuitive parameterizations are used for all quantities. All calculations are implemented in freely available software.

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2207.03597 [pdf, other]

Nonparametric Estimation of the Potential Impact Fraction and Population Attributable Fraction with Individual-Level and Aggregated Data

Authors: Colleen E. Chan, Rodrigo Zepeda-Tello, Dalia Camacho-García-Formentí, Frederick Cudhea, Rafael Meza, Eliane Rodrigues, Donna Spiegelman, Tonatiuh Barrientos-Gutierrez, Xin Zhou

Abstract: The estimation of the potential impact fraction (including the population attributable fraction) with continuous exposure data frequently relies on strong distributional assumptions. However, these assumptions are often violated if the underlying exposure distribution is unknown or if the same distribution is assumed across time or space. Nonparametric methods to estimate the potential impact fraction are available for cohort data, but no alternatives exist for cross-sectional data. In this article, we discuss the impact of distributional assumptions in the estimation of the potential impact fraction, showing that under an infinite set of possibilities, distributional violations lead to biased estimates. We propose nonparametric methods to estimate the potential impact fraction for aggregated (mean and standard deviation) or individual data (e.g. observations from a cross-sectional population survey), and develop simulation scenarios to compare their performance against standard parametric procedures. We illustrate our methodology on an application of sugar-sweetened beverage consumption on incidence of type 2 diabetes. We also present an R package pifpaf to implement these methods.
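For individual-level data, the potential impact fraction can be written as PIF = 1 - E[RR(g(X))] / E[RR(X)], where RR is the relative risk function and g maps each observed exposure to its counterfactual value; replacing the expectations with (optionally weighted) sample means gives a nonparametric plug-in estimate. The sketch below assumes this form and covers only the individual-level case. It is not the pifpaf API, and the function names are hypothetical.

```python
from typing import Callable, Optional, Sequence


def _weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def pif_empirical(exposure: Sequence[float],
                  rr: Callable[[float], float],
                  counterfactual: Callable[[float], float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Plug-in PIF: 1 - E[RR(g(X))] / E[RR(X)] with sample means for expectations."""
    if weights is None:
        weights = [1.0] * len(exposure)  # unweighted sample; survey weights can be passed
    observed = _weighted_mean([rr(x) for x in exposure], weights)
    shifted = _weighted_mean([rr(counterfactual(x)) for x in exposure], weights)
    return (observed - shifted) / observed


def paf_empirical(exposure, rr, min_risk_exposure=0.0, weights=None):
    """PAF as the PIF that moves everyone to the minimum-risk exposure (assumes RR = 1 there)."""
    return pif_empirical(exposure, rr, lambda _x: min_risk_exposure, weights)


# Toy illustration with made-up numbers: RR = 1 + x, half the sample exposed at x = 1
sample = [0.0, 0.0, 1.0, 1.0]
paf = paf_empirical(sample, lambda x: 1.0 + x)                   # everyone moved to x = 0
pif_half = pif_empirical(sample, lambda x: 1.0 + x, lambda x: x / 2)  # exposure halved
print(paf, pif_half)
```

In the toy example the mean observed RR is 1.5, so the two calls evaluate to 1/3 and 1/6. No distributional form for the exposure is assumed anywhere, which is the point of the nonparametric approach.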

Submitted 24 January, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2111.00138 [pdf]

The Missing Covariate Indicator Method is Nearly Valid Almost Always

Authors: Mingyang Song, Xin Zhou, Mathew Pazaris, Donna Spiegelman

Abstract: Background: Although the missing covariate indicator method (MCIM) has been shown to be biased under extreme conditions, the degree and determinants of bias have not been formally assessed. Methods: We derived the formula for the relative bias in the MCIM and systematically investigated conditions under which bias arises. Results: We found that the extent of bias is independent of both the disease rate and the exposure-outcome association, but is a function of 5 parameters: exposure and covariate prevalences, covariate missingness proportion, and associations of covariate with exposure and outcome. The MCIM was unbiased when the missing covariate is a risk factor for the outcome but not a confounder. The average median relative bias was zero across each of the parameters over a wide range of values considered. When missingness was no greater than 50%, less than 5% of the scenarios considered had relative bias greater than 10%. In several analyses of the Harvard cohort studies, the MCIM produced materially the same results as the multiple imputation method. Conclusion: The MCIM is nearly valid almost always in settings typically encountered in epidemiology and its continued use is recommended, unless the covariate is missing in an extreme proportion or acts as a strong confounder.
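As a reminder of what the MCIM does mechanically: each missing covariate value is replaced by a constant and a 0/1 missingness indicator is added to the model. The sketch below assumes a linear outcome model fitted with NumPy on simulated data (all names hypothetical). It also checks a convenient property: because the indicator is in the model, the exposure coefficient does not depend on the fill constant, since changing the constant only adds a multiple of the indicator column.

```python
import numpy as np


def mcim_design(exposure, covariate, fill_value=0.0):
    """Design matrix [1, exposure, filled covariate, missing indicator] for the MCIM.

    Missing covariate values (NaN) are set to `fill_value` and flagged by an indicator.
    """
    exposure = np.asarray(exposure, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    missing = np.isnan(covariate)
    filled = np.where(missing, fill_value, covariate)
    return np.column_stack([np.ones_like(exposure), exposure, filled, missing.astype(float)])


def exposure_coefficient(design, y):
    """Least-squares coefficient of the exposure column (column 1)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Simulated example: the covariate is a confounder and is ~30% missing at random
rng = np.random.default_rng(42)
n = 1_000
covariate = rng.normal(size=n)
exposure = (rng.random(n) < 1 / (1 + np.exp(-0.5 * covariate))).astype(float)
y = 0.5 * exposure + 0.8 * covariate + rng.normal(size=n)
covariate[rng.random(n) < 0.3] = np.nan

b_fill0 = exposure_coefficient(mcim_design(exposure, covariate, fill_value=0.0), y)
b_fill5 = exposure_coefficient(mcim_design(exposure, covariate, fill_value=5.0), y)
print(round(b_fill0, 4), round(b_fill5, 4))
```

The same design matrix can be passed to a logistic or Cox fit; the invariance to the fill constant holds for any model with a linear predictor and an intercept.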

Submitted 29 October, 2021; originally announced November 2021.

arXiv:2108.08417 [pdf, other]

Estimating the natural indirect effect and the mediation proportion via the product method

Authors: Chao Cheng, Donna Spiegelman, Fan Li

Abstract: The natural indirect effect (NIE) and mediation proportion (MP) are two measures of primary interest in mediation analysis. The standard approach for estimating NIE and MP is through the product method, which involves a model for the outcome conditional on the mediator and exposure and another model describing the exposure-mediator relationship. The purpose of this article is to comprehensively develop and investigate the finite-sample performance of NIE and MP estimators via the product method. With four common data types, we propose closed-form interval estimators via the theory of estimating equations and the multivariate delta method, and evaluate their empirical performance relative to the bootstrap approach. In addition, we have observed that the rare outcome assumption is frequently invoked to approximate the NIE and MP with a binary outcome, although this approximation may lead to non-negligible bias when the outcome is common. We therefore introduce the exact expressions for NIE and MP with a binary outcome without the rare outcome assumption and compare their performance with the approximate estimators. Based upon these theoretical developments and empirical studies, we offer several practical recommendations to inform practice. An R package mediateP is developed to implement the methods for point and variance estimation discussed in this paper.
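For the simplest of the data types (continuous mediator M, continuous outcome Y, exposure A, linear models without interaction), the product method gives NIE = a1 * b2 and MP = a1 * b2 / (b1 + a1 * b2), where a1 is the exposure coefficient in the mediator model and b1, b2 are the exposure and mediator coefficients in the outcome model. The sketch below computes these with multivariate delta-method standard errors, assuming the two models' estimates are independent. It uses NumPy and simulated data; it is not the mediateP implementation, and the helper names are hypothetical.

```python
import numpy as np


def ols_with_cov(design, y):
    """OLS coefficients and their classical (homoscedastic) covariance matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)
    return coef, sigma2 * np.linalg.inv(design.T @ design)


def product_method(a, m, y):
    """NIE and MP via the product method with delta-method standard errors."""
    a, m, y = (np.asarray(v, dtype=float) for v in (a, m, y))
    ones = np.ones_like(a)
    alpha, cov_alpha = ols_with_cov(np.column_stack([ones, a]), m)   # mediator model
    beta, cov_beta = ols_with_cov(np.column_stack([ones, a, m]), y)  # outcome model
    a1, b1, b2 = alpha[1], beta[1], beta[2]
    nie = a1 * b2
    te = b1 + nie
    mp = nie / te
    # Covariance of (a1, b1, b2); estimates from the two models treated as independent
    cov = np.zeros((3, 3))
    cov[0, 0] = cov_alpha[1, 1]
    cov[1:, 1:] = cov_beta[1:3, 1:3]
    grad_nie = np.array([b2, 0.0, a1])
    grad_mp = np.array([b1 * b2, -a1 * b2, a1 * b1]) / te ** 2
    return {"nie": nie, "se_nie": float(np.sqrt(grad_nie @ cov @ grad_nie)),
            "mp": mp, "se_mp": float(np.sqrt(grad_mp @ cov @ grad_mp))}


# Simulated data: true NIE = 0.5 * 0.8 = 0.4, true MP = 0.4 / 0.7
rng = np.random.default_rng(1)
n = 5_000
a = rng.binomial(1, 0.5, size=n).astype(float)
m = 0.5 * a + rng.normal(size=n)
y = 0.3 * a + 0.8 * m + rng.normal(size=n)
res = product_method(a, m, y)
print({k: round(v, 3) for k, v in res.items()})
```

Binary outcomes need the nonlinear (exact or rare-outcome) expressions discussed above, so this linear case should be read only as an illustration of the delta-method mechanics.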

Submitted 18 August, 2021; originally announced August 2021.

arXiv:2011.06031 [pdf, other]

swdpwr: A SAS Macro and An R Package for Power Calculation in Stepped Wedge Cluster Randomized Trials

Authors: Jiachen Chen, Xin Zhou, Fan Li, Donna Spiegelman

Abstract: Background and objective: The stepped wedge cluster randomized trial is a study design increasingly used for public health intervention evaluations. Most previous literature focuses on power calculations for this particular type of cluster randomized trial for continuous outcomes, along with an approximation to this approach for binary outcomes. Although not accurate for binary outcomes, it has been widely used. To improve the approximation for binary outcomes, two new methods for stepped wedge designs (SWDs) of binary outcomes have recently been published. However, these new methods have not been implemented in publicly available software. The objective of this paper is to present power calculation software for SWDs in various settings for both continuous and binary outcomes. Methods: We have developed a SAS macro %swdpwr and an R package swdpwr for power calculation in SWDs. Different scenarios including cross-sectional and cohort designs, binary and continuous outcomes, marginal and conditional models, three link functions, with and without time effects are accommodated in this software. Results: swdpwr provides an efficient tool to support investigators in the design and analysis of stepped wedge cluster randomized trials. swdpwr addresses the implementation gap between newly proposed methodology and its application to obtain more accurate power calculations in SWDs.
Conclusions: This user-friendly software makes the new methods more accessible and incorporates as many variations as are currently available, which other related packages do not support. swdpwr is implemented under two platforms, SAS and R, satisfying the needs of investigators from various backgrounds.
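For orientation, the classic continuous-outcome calculation that this software builds on is the Hussey and Hughes (2007) closed-form variance for a cross-sectional stepped wedge design with fixed period effects and a random cluster intercept. The sketch below implements that formula with NumPy under those assumptions (identity link, m individuals per cluster-period, individual variance sigma2_e, between-cluster variance tau2). It is not the swdpwr code or API, the function names are hypothetical, and binary-outcome methods are not covered.

```python
import numpy as np
from statistics import NormalDist


def stepped_wedge_design(clusters_per_step, steps):
    """Treatment matrix X (clusters x periods): period 0 all control, one group crosses per step."""
    periods = steps + 1
    rows = []
    for s in range(1, steps + 1):
        row = [1.0 if t >= s else 0.0 for t in range(periods)]
        rows.extend([row] * clusters_per_step)
    return np.array(rows)


def hh_variance(x, sigma2_e, tau2, m):
    """Hussey & Hughes variance of the treatment effect estimate (continuous outcome)."""
    i, t = x.shape
    sigma2 = sigma2_e / m                # variance of a cluster-period mean
    u = x.sum()
    w = (x.sum(axis=0) ** 2).sum()       # squared period (column) totals
    v = (x.sum(axis=1) ** 2).sum()       # squared cluster (row) totals
    num = i * sigma2 * (sigma2 + t * tau2)
    den = (i * u - w) * sigma2 + (u ** 2 + i * t * u - t * w - i * v) * tau2
    return num / den


def hh_power(x, effect, sigma2_e, tau2, m, alpha=0.05):
    """Approximate power of a two-sided Wald test for the treatment effect."""
    z = NormalDist()
    se = float(np.sqrt(hh_variance(x, sigma2_e, tau2, m)))
    return z.cdf(abs(effect) / se - z.inv_cdf(1 - alpha / 2))


# Illustrative inputs: 12 clusters, 4 steps, 20 individuals per cluster-period
x = stepped_wedge_design(clusters_per_step=3, steps=4)
print(hh_variance(x, 1.0, 0.05, 20), hh_power(x, 0.2, 1.0, 0.05, 20))
```

With these inputs the variance works out to 0.18 / 18 = 0.01. The same number can be obtained by brute-force generalized least squares on the cluster-period means, which is a useful check when changing the design matrix.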

Submitted 6 July, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

Comments: 29 pages, 4 figures

arXiv:1902.01377 [pdf, other]

Interpretation of the individual effect under treatment spillover

Authors: Forrest W. Crawford, Olga Morozova, Ashley L. Buchanan, Donna Spiegelman

Abstract: Some interventions may include important spillover or dissemination effects between study participants. For example, vaccines, cash transfers, and education programs may exert a causal effect on participants beyond those to whom individual treatment is assigned. In a recent paper, Buchanan et al. provide a causal definition of the "individual effect" of an intervention in networks of people who inject drugs. In this short note, we discuss the interpretation of the individual effect when a spillover or dissemination effect exists.

Submitted 4 February, 2019; originally announced February 2019.

arXiv:1808.08229 [pdf, ps, other]

Cox Model with Covariate Measurement Error and Unknown Changepoint

Authors: Sarit Agami, David M. Zucker, Donna Spiegelman

Abstract: The standard Cox model in survival analysis assumes that the covariate effect is constant across the entire covariate domain. However, in many applications, there is interest in considering the possibility that the covariate of main interest is subject to a threshold effect: a change in the slope at a certain point within the covariate domain. Often, the value of this threshold is unknown and needs to be estimated. In addition, the covariate of interest is often not measured exactly but is subject to some degree of measurement error. In this paper, we discuss estimation of the model parameters under an independent additive error model where the covariate of interest is measured with error and the potential threshold value in this covariate is unknown. As in earlier work which discussed the case of a known threshold, we study the performance of several bias correction methods: two versions of regression calibration (RC1 and RC2), two versions of fitting a model for the induced relative risk (RR1 and RR2), the maximum pseudo-partial likelihood estimator (MPPLE) and simulation-extrapolation (SIMEX). These correction methods are compared with the naive estimator. We develop the relevant theory, present a simulation study comparing the several correction methods, and illustrate the use of the bias correction methods in data from the Nurses Health Study (NHS) concerning the relationship between chronic air pollution exposure to particulate matter of diameter 10 $\mu$m or less (PM$_{10}$).
The simulation results suggest that the best overall choice of bias correction method is either the RR2 method or the MPPLE method.
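The SIMEX idea mentioned above can be sketched generically: with a known error standard deviation sigma_u, add extra noise of variance lambda * sigma_u^2 to the error-prone covariate for a grid of lambda values, average the naive estimates over replicates, fit a quadratic in lambda, and extrapolate to lambda = -1 (no measurement error). To keep the sketch short, the naive estimator here is an ordinary least-squares slope standing in for a Cox fit with a threshold term; that substitution, the quadratic extrapolant and all names are assumptions for illustration.

```python
import numpy as np


def ols_slope(w, y):
    """Stand-in naive estimator: least-squares slope of y on w (not a Cox fit)."""
    return np.polyfit(w, y, 1)[0]


def simex(w, y, sigma_u, estimator=ols_slope, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0),
          n_rep=50, rng=None):
    """Generic SIMEX with a quadratic extrapolant; sigma_u is the known error SD."""
    rng = np.random.default_rng() if rng is None else rng
    means = []
    for lam in lambdas:
        # Simulation step: inflate the error variance to (1 + lam) * sigma_u^2
        reps = [estimator(w + rng.normal(scale=np.sqrt(lam) * sigma_u, size=w.shape), y)
                for _ in range(n_rep)]
        means.append(np.mean(reps))
    # Extrapolation step: quadratic in lambda, evaluated at lambda = -1
    coefs = np.polyfit(lambdas, means, 2)
    return float(np.polyval(coefs, -1.0))


# Simulated data: true slope 1, error SD 0.5, so the naive slope is attenuated to about 0.8
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
w = x + rng.normal(scale=0.5, size=n)
y = x + rng.normal(scale=0.5, size=n)
naive_slope = ols_slope(w, y)
simex_slope = simex(w, y, sigma_u=0.5, rng=rng)
print(round(naive_slope, 3), round(simex_slope, 3))
```

The quadratic extrapolant typically removes most, but not all, of the attenuation (here it should move the estimate from about 0.8 to somewhat below 1), which is one reason SIMEX is compared against other corrections.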

Submitted 24 August, 2018; originally announced August 2018.

Comments: 26 pages. arXiv admin note: text overlap with arXiv:1808.07662

arXiv:1808.07662 [pdf, other]

Estimation in the Cox Survival Regression Model with Covariate Measurement Error and a Changepoint

Authors: Sarit Agami, David M. Zucker, Donna Spiegelman

Abstract: The Cox regression model is a popular model for analyzing the relationship between a covariate and a survival endpoint. The standard Cox model assumes a constant covariate effect across the entire covariate domain. However, in many epidemiological and other applications, the covariate of main interest is subject to a threshold effect: a change in the slope at a certain point within the covariate domain. Often, the covariate of interest is subject to some degree of measurement error. In this paper, we study measurement error correction in the case where the threshold is known. Several bias correction methods are examined: two versions of regression calibration (RC1 and RC2, the latter of which is new), two methods based on the induced relative risk under a rare event assumption (RR1 and RR2, the latter of which is new), a maximum pseudo-partial likelihood estimator (MPPLE), and simulation-extrapolation (SIMEX). We develop the theory, present simulations comparing the methods, and illustrate their use on data concerning the relationship between chronic air pollution exposure to particulate matter PM10 and fatal myocardial infarction (Nurses Health Study (NHS)), and on data concerning the effect of a subject's long-term underlying systolic blood pressure level on the risk of cardiovascular disease death (Framingham Heart Study (FHS)). The simulations indicate that the best methods are RR2 and MPPLE.
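With a known threshold tau, the log hazard is linear in x and in the hinge term (x - tau)_+, so a calibration-type correction needs E[X | W] and, more carefully, E[(X - tau)_+ | W]. The sketch below assumes classical additive normal error (W = X + U, X ~ N(mu_x, var_x), U ~ N(0, var_u), var_u > 0), for which X | W is normal with mean mu_x + lam (w - mu_x) and variance lam * var_u, lam = var_x / (var_x + var_u). It shows two ways of building the covariates; these are illustrative and are not claimed to match the paper's RC1 or RC2 definitions, and all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()


def hinge(x, tau):
    """Threshold term (x - tau)_+ giving a change in slope at tau."""
    return max(x - tau, 0.0)


def calibrate(w, mu_x, var_x, var_u):
    """Mean and variance of X given W = w under the normal classical error model."""
    lam = var_x / (var_x + var_u)
    return mu_x + lam * (w - mu_x), lam * var_u


def expected_hinge(w, tau, mu_x, var_x, var_u):
    """E[(X - tau)_+ | W = w]: the mean of a normal variable truncated-and-shifted at tau."""
    m, v = calibrate(w, mu_x, var_x, var_u)
    s = sqrt(v)
    d = (m - tau) / s
    return (m - tau) * _Z.cdf(d) + s * _Z.pdf(d)


# Illustrative values: observed w = 2, threshold tau = 1, var_x = var_u = 1, mu_x = 0
w_obs, tau = 2.0, 1.0
m, v = calibrate(w_obs, mu_x=0.0, var_x=1.0, var_u=1.0)
naive_terms = (w_obs, hinge(w_obs, tau))
plug_in_terms = (m, hinge(m, tau))
cond_exp_terms = (m, expected_hinge(w_obs, tau, 0.0, 1.0, 1.0))
print(naive_terms, plug_in_terms, cond_exp_terms)
```

Either pair of calibrated terms would then replace the unobserved (x, (x - tau)_+) in a standard Cox fit. Because the hinge is convex, plugging E[X | W] into it always gives a value no larger than E[(X - tau)_+ | W]; here the plug-in term is 0 while the conditional expectation is about 0.28.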

Submitted 30 August, 2019; v1 submitted 23 August, 2018; originally announced August 2018.

arXiv:1808.06310 [pdf, ps, other]

Analysis of "Learn-As-You-Go" (LAGO) Studies

Authors: Daniel Nevo, Judith J. Lok, Donna Spiegelman

Abstract: In learn-as-you-go (LAGO) adaptive studies, the intervention is a complex package consisting of multiple components, and is adapted in stages during the study based on past outcome data. This design formalizes standard practice, and desires for practice, in public health intervention studies. An effective intervention package is sought, while minimizing intervention package cost. When analyzing data from a learn-as-you-go study, the interventions in later stages depend upon the outcomes in the previous stages, violating standard statistical theory. We develop methods for estimating the intervention effects in a LAGO study. We prove consistency and asymptotic normality using a novel coupling argument, ensuring the validity of the test for the hypothesis of no overall intervention effect. We develop a confidence set for the optimal intervention package and confidence bands for the success probabilities under alternative package compositions. We illustrate our methods in the BetterBirth Study, which aimed to improve maternal and neonatal outcomes among 157,689 births in Uttar Pradesh, India through a complex, multi-component intervention package.

Submitted 23 January, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

Showing 1–23 of 23 results for author: Spiegelman, D