-
Relationship between Collider Bias and Interactions on the Log-Additive Scale
Authors:
Apostolos Gkatzionis,
Shaun R. Seaman,
Rachael A. Hughes,
Kate Tilling
Abstract:
Collider bias occurs when conditioning on a common effect (collider) of two variables $X, Y$. In this manuscript, we quantify the collider bias in the estimated association between exposure $X$ and outcome $Y$ induced by selecting on one value of a binary collider $S$ of the exposure and the outcome. In the case of logistic regression, it is known that the magnitude of the collider bias in the exposure-outcome regression coefficient is proportional to the strength of interaction $\delta_3$ between $X$ and $Y$ in a log-additive model for the collider: $\mathbb{P} (S = 1 | X, Y) = \exp \left\{ \delta_0 + \delta_1 X + \delta_2 Y + \delta_3 X Y \right\}$. We show that this result also holds under a linear or Poisson regression model for the exposure-outcome association. We then illustrate by simulation that even if a log-additive model with interactions is not the true model for the collider, the interaction term in such a model is still informative about the magnitude of collider bias. Finally, we discuss the implications of these findings for methods that attempt to adjust for collider bias, such as inverse probability weighting, which is often implemented without including interactions between variables in the weighting model.
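For intuition, the logistic case mentioned in the abstract can be sketched in a few lines. This is an illustrative derivation, not the manuscript's own: it assumes a binary outcome $Y$ and a population model $\operatorname{logit} \mathbb{P}(Y = 1 | X) = \beta_0 + \beta_1 X$, where $\beta_0, \beta_1$ are symbols introduced here and do not appear in the abstract. Under those assumptions, Bayes' rule shows that selecting on $S = 1$ shifts the coefficient of $X$ by the interaction $\delta_3$.

```latex
\begin{align*}
\frac{\mathbb{P}(Y = 1 \mid X, S = 1)}{\mathbb{P}(Y = 0 \mid X, S = 1)}
  &= \frac{\mathbb{P}(Y = 1 \mid X)}{\mathbb{P}(Y = 0 \mid X)}
     \cdot \frac{\mathbb{P}(S = 1 \mid X, Y = 1)}{\mathbb{P}(S = 1 \mid X, Y = 0)} \\
  &= \exp\{\beta_0 + \beta_1 X\}
     \cdot \frac{\exp\{\delta_0 + \delta_1 X + \delta_2 + \delta_3 X\}}{\exp\{\delta_0 + \delta_1 X\}} \\
  &= \exp\{(\beta_0 + \delta_2) + (\beta_1 + \delta_3) X\}.
\end{align*}
```

In this sketch the main effects $\delta_0$ and $\delta_1$ cancel, $\delta_2$ is absorbed into the intercept, and only $\delta_3$ enters the coefficient of $X$, so the bias vanishes when there is no interaction on the log-additive scale.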
Submitted 7 August, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
Using Instruments for Selection to Adjust for Selection Bias in Mendelian Randomization
Authors:
Apostolos Gkatzionis,
Eric J. Tchetgen Tchetgen,
Jon Heron,
Kate Northstone,
Kate Tilling
Abstract:
Selection bias is a common concern in epidemiologic studies. In the literature, selection bias is often viewed as a missing data problem. Popular approaches to adjust for bias due to missing data, such as inverse probability weighting, rely on the assumption that data are missing at random and can yield biased results if this assumption is violated. In observational studies with outcome data missing not at random, Heckman's sample selection model can be used to adjust for bias due to missing data. In this paper, we review Heckman's method and a similar approach proposed by Tchetgen Tchetgen and Wirth (2017). We then discuss how to apply these methods to Mendelian randomization analyses using individual-level data, with missing data for either the exposure or outcome or both. We explore whether genetic variants associated with participation can be used as instruments for selection. We then describe how to obtain missingness-adjusted Wald ratio, two-stage least squares and inverse variance weighted estimates. The two methods are evaluated and compared in simulations, with results suggesting that they can both mitigate selection bias but may yield parameter estimates with large standard errors in some settings. In an illustrative real-data application, we investigate the effects of body mass index on smoking using data from the Avon Longitudinal Study of Parents and Children.
Submitted 13 April, 2024; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Statistical Methods for cis-Mendelian Randomization with Two-sample Summary-level Data
Authors:
Apostolos Gkatzionis,
Stephen Burgess,
Paul J. Newcombe
Abstract:
Mendelian randomization is the use of genetic variants to assess the existence of a causal relationship between a risk factor and an outcome of interest. Here, we focus on two-sample summary-data Mendelian randomization analyses with many correlated variants from a single gene region, and particularly on cis-Mendelian randomization studies which use protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and estimation in cis-Mendelian randomization with summary-level data, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis and Bayesian variable selection. In a simulation study, we show that the various methods have a comparable performance in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inferences than simple pruning approaches, which are often used in practice. We conclude by examining two case studies, assessing the effects of LDL-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions respectively.
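The numerical issue described in the abstract can be illustrated with a minimal sketch. It assumes the standard generalized-least-squares form of the inverse-variance weighted estimator for correlated variants, which is not necessarily any of the methods reviewed in the paper. The function names, the exchangeable correlation structure, and all numbers below are hypothetical and chosen only for illustration.

```python
import numpy as np

def exchangeable_corr(n_variants, r):
    """Toy correlation matrix: every pair of variants has correlation r (assumption)."""
    return (1.0 - r) * np.eye(n_variants) + r * np.ones((n_variants, n_variants))

def correlated_ivw(beta_x, beta_y, se_y, rho):
    """GLS-style IVW estimate for correlated variants.

    beta_x, beta_y: variant-exposure and variant-outcome association estimates.
    se_y: standard errors of beta_y.
    rho: genetic correlation matrix between variants.
    """
    # Covariance of the variant-outcome estimates: Omega_ij = se_i * se_j * rho_ij.
    omega = np.outer(se_y, se_y) * rho
    # Solve a linear system instead of forming the explicit inverse.
    omega_inv_bx = np.linalg.solve(omega, beta_x)
    denom = beta_x @ omega_inv_bx
    estimate = (beta_y @ omega_inv_bx) / denom
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error

# Hypothetical summary statistics for 10 variants with a true ratio of 0.5.
beta_x = np.linspace(0.1, 0.3, 10)
beta_y = 0.5 * beta_x
se_y = np.full(10, 0.02)

weak_ld = exchangeable_corr(10, 0.10)
strong_ld = exchangeable_corr(10, 0.99)

# The condition number grows sharply as variants become nearly collinear.
cond_weak = np.linalg.cond(weak_ld)
cond_strong = np.linalg.cond(strong_ld)

estimate, std_error = correlated_ivw(beta_x, beta_y, se_y, weak_ld)
```

With noise-free inputs the estimate recovers the ratio whatever the correlation matrix. In real data, a large condition number means small errors in the estimated correlations or associations can be amplified when solving the linear system, which is the instability that motivates pruning or dimension reduction.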
Submitted 15 September, 2022; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Contextualizing selection bias in Mendelian randomization: how bad is it likely to be?
Authors:
Apostolos Gkatzionis,
Stephen Burgess
Abstract:
Selection bias affects Mendelian randomization investigations when selection into the study sample depends on a collider between the genetic variant and confounders of the risk factor-outcome association. However, the relative importance of selection bias for Mendelian randomization compared to other potential biases is unclear. We performed an extensive simulation study to assess the impact of selection bias on a typical Mendelian randomization investigation. Selection bias had a severe impact on bias and Type 1 error rates in our simulation study, but only when selection effects were large. For moderate effects of the risk factor on selection, bias was generally small and Type 1 error rate inflation was not considerable. The magnitude of bias was also affected by the strength of confounder-risk factor and confounder-outcome associations, the structure of the causal diagram and selection frequency. The use of inverse probability weighting ameliorated bias when the selection model was correctly specified, but increased bias when selection bias was moderate and the model was misspecified. Finally, we investigated whether selection bias may explain a recently reported finding that lipoprotein(a) is not a causal risk factor for cardiovascular mortality in individuals with previous coronary heart disease.
Submitted 11 March, 2018;
originally announced March 2018.