-
Causal Graph Aided Causal Discovery in an Observational Aneurysmal Subarachnoid Hemorrhage Study
Authors:
Carlo Berzuini,
Davide Luciani,
Hiren C. Patel
Abstract:
Causal inference methods for observational data are increasingly recognized as a valuable complement to randomized clinical trials (RCTs). They can, under strong assumptions, emulate RCTs or help refine their focus. Our approach to causal inference uses causal directed acyclic graphs (DAGs). We are motivated by a concern that many observational studies in medicine begin without a clear definition of their objectives, without awareness of their scientific potential, and without tools to identify the necessary in itinere adjustments. We present and illustrate methods that provide "midway insights" during a study's course, identify meaningful causal questions within the study's reach, and point to the database enhancements needed for these questions to be meaningfully tackled. The approach hinges on the concepts of identification and positivity. Concepts are illustrated through an analysis of data from patients with aneurysmal Subarachnoid Hemorrhage (aSAH), collected halfway through a study, focusing in particular on the consequences of external ventricular drain (EVD) in strata of the aSAH population. In addition, we propose a method for multicenter studies to monitor the impact of changes in practice at the level of an individual center, by leveraging principles of instrumental variable (IV) inference.
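Positivity requires that, within every stratum of interest, each treatment level (for example EVD vs no EVD) occurs with non-zero probability. A minimal sketch of an empirical check, assuming the data arrive as hypothetical (stratum, treated) pairs; the function name, stratum labels and threshold are illustrative and not the authors' procedure. An empty arm only flags a possible violation: it may be structural or just due to small counts.

```python
from collections import defaultdict


def positivity_report(records, min_count=1):
    """Tabulate untreated/treated counts per stratum and flag positivity problems.

    records: iterable of (stratum, treated) pairs, with treated in {0, 1}.
    Returns {stratum: (n_untreated, n_treated, ok)}, where ok is False when
    either arm has fewer than min_count patients in that stratum.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_untreated, n_treated]
    for stratum, treated in records:
        counts[stratum][int(treated)] += 1
    return {s: (c[0], c[1], min(c) >= min_count) for s, c in counts.items()}


# Hypothetical toy data: (stratum label, received EVD?)
data = [("low_grade", 1), ("low_grade", 0), ("high_grade", 1), ("high_grade", 1)]
for stratum, (n0, n1, ok) in positivity_report(data).items():
    print(stratum, n0, n1, "ok" if ok else "positivity violated")
```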
Submitted 12 August, 2024;
originally announced August 2024.
-
Bayesian Mendelian randomization testing of interval causal null hypotheses: ternary decision rules and loss function calibration
Authors:
Linyi Zou,
Teresa Fazia,
Hui Guo,
Carlo Berzuini
Abstract:
Our approach to Mendelian Randomization (MR) analysis is designed to increase the reproducibility of causal effect "discoveries" by: (i) using a Bayesian approach to inference; (ii) replacing the point null hypothesis with a region of practical equivalence consisting of values of negligible magnitude for the effect of interest, while exploiting the ability of Bayesian analysis to quantify the evidence of the effect falling inside/outside the region; (iii) rejecting the usual binary decision logic in favour of a ternary logic where the hypothesis test may result in either an acceptance or a rejection of the null, while also accommodating an "uncertain" outcome. We present an approach to calibrating the proposed method via a loss function, which we use to compare our approach with a frequentist one. We illustrate the method with the aid of a study of the causal effect of obesity on the risk of juvenile myocardial infarction.
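The ternary logic can be sketched on posterior draws of the causal effect: compute the posterior probability that the effect lies inside the region of practical equivalence (ROPE), then accept, reject, or declare the result uncertain. This is a minimal illustration assuming a fixed ROPE and a fixed probability threshold, both hypothetical; in the paper these choices are calibrated via a loss function, which is not reproduced here.

```python
def ternary_decision(draws, rope=(-0.05, 0.05), level=0.95):
    """Three-way decision on a causal effect from posterior draws.

    rope:  hypothetical region of practical equivalence (negligible effects).
    level: hypothetical posterior-probability threshold for a firm decision.
    Returns (decision, posterior probability that the effect is inside the ROPE).
    """
    lo, hi = rope
    p_inside = sum(lo <= d <= hi for d in draws) / len(draws)
    if p_inside >= level:
        return "accept null", p_inside
    if 1.0 - p_inside >= level:
        return "reject null", p_inside
    return "uncertain", p_inside


print(ternary_decision([0.01, -0.02, 0.03, 0.0]))       # all draws negligible
print(ternary_decision([0.0] * 50 + [0.4] * 50))        # evidence split
```

Unlike a binary test, the middle branch lets the analysis report that the data are not yet informative enough either way.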
Submitted 10 August, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Discovery methods for systematic analysis of causal molecular networks in modern omics datasets
Authors:
Jack Kelly,
Carlo Berzuini,
Bernard Keavney,
Maciej Tomaszewski,
Hui Guo
Abstract:
With the increasing availability and size of multi-omics datasets, investigating the causal relationships between molecular phenotypes has become an important aspect of exploring the underlying biology and genetics. This paper aims to introduce and review the available methods for building large-scale causal molecular networks that have been developed in the past decade. Existing methods have their own strengths and limitations, so there is no single best approach; the choice instead rests with the researcher. This review also discusses some of the current limitations to the biological interpretation of these networks, and important factors to consider for future studies on molecular networks.
Submitted 28 January, 2022;
originally announced January 2022.
-
Bayesian Mendelian randomization with study heterogeneity and data partitioning for large studies
Authors:
Linyi Zou,
Hui Guo,
Carlo Berzuini
Abstract:
Background: Mendelian randomization (MR) is a useful approach to causal inference from observational studies when randomised controlled trials are not feasible. However, heterogeneity between the two association studies required in MR is often overlooked. When dealing with large studies, recently developed Bayesian MR methods are limited by their computational cost. Methods: We addressed study heterogeneity by proposing a random-effects Bayesian MR model with multiple exposures and outcomes. For large studies, we adopted a subset posterior aggregation method to tackle the computational problem. In particular, we divided the data into subsets and combined the causal effect estimates obtained from the individual subsets. The performance of our method was evaluated in a number of simulations in which part of the exposure data was missing. Results: Random-effects Bayesian MR outperformed conventional inverse-variance weighted estimation, whether the true causal effects were zero or non-zero. Partitioning large studies had little impact on the variability of the estimated causal effects, whereas it notably affected the unbiasedness of the estimates when instruments were weak and the rate of missing data was high. Our simulation results indicate that data partitioning is a good way of improving computational efficiency, at the cost of a small loss of unbiasedness, as long as the subsets are reasonably large. Conclusions: We have further advanced Bayesian MR by including random effects to explicitly account for study heterogeneity. We also adopted a subset posterior aggregation method to address the computational cost of MCMC, which matters especially when dealing with large studies. Our proposed work is likely to pave the way for more general model settings, as the Bayesian approach itself offers great flexibility in model construction.
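One simple way to combine subset results, sketched here under the assumption that each subset posterior for a causal effect is approximately Normal and was computed with the prior raised to the power 1/K (so the prior is not counted K times), is a precision-weighted average in the spirit of consensus Monte Carlo. The abstract does not specify the aggregation rule, so this is an illustrative stand-in, not the authors' method.

```python
def combine_gaussian_subposteriors(means, variances):
    """Precision-weighted combination of K subset posteriors, each summarised
    as Normal(mean, variance). Returns (combined mean, combined variance).

    Assumes roughly Gaussian subset posteriors, each based on a prior^(1/K).
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)  # precisions add across independent subsets
    mean = sum(p * m for p, m in zip(precisions, means)) / total_precision
    return mean, 1.0 / total_precision


# Hypothetical subset summaries for one causal effect
print(combine_gaussian_subposteriors([0.21, 0.18, 0.25], [0.004, 0.005, 0.004]))
```

Each subset can be run in parallel, which is where the computational saving comes from; the approximation degrades when subsets are too small for their posteriors to be near-Gaussian, consistent with the caveat on subset size above.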
Submitted 15 December, 2021;
originally announced December 2021.
-
Overlapping-sample Mendelian randomisation with multiple exposures: A Bayesian approach
Authors:
Linyi Zou,
Hui Guo,
Carlo Berzuini
Abstract:
Background: Mendelian randomization (MR) has been widely applied to causal inference in medical research. It uses genetic variants as instrumental variables (IVs) to investigate a putative causal relationship between an exposure and an outcome. Traditional MR methods have predominantly focused on a two-sample setting in which the IV-exposure association study and the IV-outcome association study are independent. However, it is not uncommon for participants in the two studies to overlap fully (one-sample) or partly (overlapping-sample). Methods: We proposed a method that is applicable to all three sample settings. In essence, we converted a two- or overlapping-sample problem into a one-sample problem in which the data of some or all individuals were incomplete. We assumed that all individuals were drawn from the same population and that unmeasured data were missing at random. The unobserved data were then treated on a par with the model parameters as unknown quantities, and thus could be imputed iteratively, conditioning on the observed data and the estimated parameters, using Markov chain Monte Carlo. We generalised our model to allow for pleiotropy and multiple exposures, and assessed its performance in a number of simulations using four metrics: mean, standard deviation, coverage and power. Results: Higher sample overlap rates and stronger instruments led to estimates with higher precision and power. Pleiotropy had a notably negative impact on the estimates. Nevertheless, overall coverage was high and our model performed well in all sample settings. Conclusions: Our model offers the flexibility of being applicable to any of the sample settings, an important addition to an MR literature that has been restricted to one- or two-sample scenarios. Given the nature of Bayesian inference, it can easily be extended to more complex MR analyses in medical research.
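The conversion of a two- or overlapping-sample problem into an incomplete one-sample problem amounts to a data-layout step: stack everyone into one table and mark the variable a study did not measure as missing. A minimal sketch, assuming a single instrument and hypothetical person identifiers; the MCMC step that subsequently imputes the missing entries is not shown.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    genotype: int
    exposure: Optional[float]  # None = not measured, to be imputed by MCMC
    outcome: Optional[float]   # None = not measured, to be imputed by MCMC


def to_one_sample(exposure_study, outcome_study):
    """Merge an IV-exposure study and an IV-outcome study into one incomplete sample.

    exposure_study: {person_id: (genotype, exposure)}
    outcome_study:  {person_id: (genotype, outcome)}
    Individuals in both studies (the overlap) get a complete record; the rest get
    None for the variable their study did not measure. For overlapping
    individuals the genotype is taken from the exposure study.
    """
    merged = {}
    for pid in set(exposure_study) | set(outcome_study):
        gx = exposure_study.get(pid)
        gy = outcome_study.get(pid)
        genotype = (gx or gy)[0]
        merged[pid] = Record(genotype,
                             gx[1] if gx else None,
                             gy[1] if gy else None)
    return merged


# Hypothetical studies: person "b" belongs to both (overlapping-sample setting)
exp_study = {"a": (1, 0.5), "b": (0, -0.2)}
out_study = {"b": (0, 1.1), "c": (2, 0.3)}
for pid, rec in sorted(to_one_sample(exp_study, out_study).items()):
    print(pid, rec)
```

The same layout covers all three settings: full overlap gives no missing entries, no overlap gives a two-sample design, and anything in between is the overlapping-sample case.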
Submitted 3 November, 2020; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Mendelian Randomization with Incomplete Exposure Data: a Bayesian Approach
Authors:
Teresa Fazia,
Leonardo Egidi,
Burcu Ayoglu,
Ashley Beecham,
Pier Paolo Bitti,
Anna Ticca,
Hui Guo,
Jacob L. McCauley,
Peter Nilsson,
Rosanna Asselta,
Carlo Berzuini,
Luisa Bernardinelli
Abstract:
We expand Mendelian Randomization (MR) methodology to deal with randomly missing data on either the exposure or the outcome variable, and with data from non-independent individuals (e.g., members of a family). Our method rests on the Bayesian MR framework proposed by Berzuini et al (2018), which we apply in a study of multiplex Multiple Sclerosis (MS) Sardinian families to characterise the role of certain plasma proteins in MS causation. The method is robust to the presence of pleiotropic effects in an unknown number of instruments, and is able to incorporate inter-individual kinship information. Introducing missing data allows us to overcome the bias induced by the (reverse) effect of treatment (in MS cases) on protein level. From a substantive point of view, our results confirm the recent suspicion that an increase in circulating IL12A and STAT4 protein levels does not cause an increase in MS risk, as originally believed, suggesting that these two proteins may not be suitable drug targets for MS.
Submitted 14 February, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Bayesian Mendelian Randomization identifies disease causing proteins via pedigree data, partially observed exposures and correlated instruments
Authors:
Teresa Fazia,
Leonardo Egidi,
Burcu Ayoglu,
Ashley Beecham,
Pier Paolo Bitti,
Anna Ticca,
Jacob L. McCauley,
Peter Nilsson,
Carlo Berzuini,
Luisa Bernardinelli
Abstract:
Background
In a study performed on multiplex Multiple Sclerosis (MS) Sardinian families to identify disease causing plasma proteins, application of Mendelian Randomization (MR) methods encounters difficulties due to relatedness of individuals, correlation between finely mapped genotype instrumental variables (IVs) and presence of missing exposures.
Method
We specialize the method of Berzuini et al (2018) to deal with these difficulties. The proposed method allows pedigree structure to enter the specification of the outcome distribution via a kinship matrix, and treats missing exposures as additional parameters to be estimated from the data. It also acknowledges possible correlation between instruments by replacing the originally proposed independence prior for the IV-specific pleiotropic effects with a g-prior. Using correlated (r² < 0.2) IVs, we analysed the data on four candidate MS-causing proteins under both the independence prior and the g-prior.
Results
The 95% credible intervals for the causal effects of proteins IL12A and STAT4 lay entirely below zero (on the strictly negative real semiaxis) in both analyses, suggesting potential causality. Those instruments whose estimated pleiotropic effect exceeded 85% of their total effect on the outcome were found to act in trans. Analysis via frequentist MR gave inconsistent results. Replacing the independence prior with a g-prior led to narrower credible intervals for the causal effect.
Conclusions
Bayesian MR may be a good way to study disease causation at a protein level based on family data and moderately correlated instruments.
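The difference between the two priors can be made concrete. Under Zellner's g-prior the pleiotropic-effect vector has prior covariance g·σ²·(ZᵀZ)⁻¹, where Z is the genotype matrix of the instruments, so the prior inherits the instruments' correlation structure; an independence prior has a diagonal covariance instead. A minimal sketch, assuming centred genotype columns and hypothetical values of g and σ²; the paper's exact parametrisation may differ.

```python
import numpy as np


def g_prior_covariance(Z, g, sigma2=1.0):
    """Prior covariance g * sigma2 * (Zc^T Zc)^{-1} of Zellner's g-prior.

    Z: (n_individuals, n_instruments) genotype matrix; columns are centred here.
    g, sigma2: hypothetical scale settings.
    """
    Z = np.asarray(Z, dtype=float)
    Zc = Z - Z.mean(axis=0)  # centre each instrument (allele counts)
    return g * sigma2 * np.linalg.inv(Zc.T @ Zc)


# Two hypothetical, positively correlated instruments (allele counts 0/1/2)
Z = np.array([[1, 1], [1, 0], [0, 1], [1, 1], [0, 0], [2, 2]])
print(g_prior_covariance(Z, g=10.0))
```

Because the two instruments are positively correlated, the off-diagonal prior covariance comes out negative, whereas an independence prior would set it to zero and ignore the correlation.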
Submitted 23 September, 2019; v1 submitted 2 March, 2019;
originally announced March 2019.
-
Bayesian Mendelian Randomization
Authors:
Carlo Berzuini,
Hui Guo,
Stephen Burgess,
Luisa Bernardinelli
Abstract:
Our Bayesian approach to Mendelian Randomisation uses multiple instruments to assess the putative causal effect of an exposure on an outcome. The approach is robust to violations of the (untestable) Exclusion Restriction condition, and hence it does not require instruments to be independent of the outcome conditional on the exposure and on the confounders of the exposure-outcome relationship. The Bayesian approach offers a rigorous handling of the uncertainty (e.g. about the estimated instrument-exposure associations), freedom from asymptotic approximations of the null distribution and the possibility to elaborate the model in any direction of scientific relevance. We illustrate the last feature with the aid of a study of the metabolic mediators of the disease-inducing effects of obesity, where we elaborate the model to investigate whether the causal effect of interest interacts with a covariate. The proposed model contains a vector of unidentifiable parameters, $β$, whose $j$th element represents the pleiotropic (i.e., not mediated by the exposure) component of the association of instrument $j$ with the outcome. We deal with the incomplete identifiability by assuming that the pleiotropic effect of some instruments is null, or nearly so, formally by imposing on $β$ Carvalho's horseshoe shrinkage prior, in such a way that different components of $β$ are subjected to different degrees of shrinking, adaptively and in accord with the compatibility of each individual instrument with the hypothesis of no pleiotropy. This prior requires a minimal input from the user. We present the results of a simulation study into the performance of the proposed method under different types of pleiotropy and sample sizes. Comparisons with the performance of the weighted median estimator are made. Choice of the prior and inference via Markov chain Monte Carlo are discussed.
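The horseshoe prior described above can be written hierarchically: β_j | λ_j, τ ~ N(0, λ_j²τ²), with instrument-specific (local) scales λ_j ~ Half-Cauchy(0, 1) and a shared (global) scale τ ~ Half-Cauchy. A minimal sketch that only draws from this prior, to show how most components are pulled towards zero while a few escape shrinkage; the global scale value is a hypothetical choice and the MR likelihood is not included.

```python
import numpy as np


def draw_horseshoe_prior(n_instruments, tau_scale=1.0, rng=None):
    """One draw of pleiotropic effects beta from a horseshoe prior.

    beta_j | lambda_j, tau ~ Normal(0, (lambda_j * tau)^2)
    lambda_j ~ Half-Cauchy(0, 1)          local, instrument-specific shrinkage
    tau      ~ Half-Cauchy(0, tau_scale)  global shrinkage (tau_scale hypothetical)
    Returns (beta, kappa), where kappa_j = 1 / (1 + lambda_j^2 tau^2) is the
    shrinkage factor of the normal-means problem with unit noise variance
    (values near 1 mean strong shrinkage towards zero).
    """
    rng = np.random.default_rng() if rng is None else rng
    tau = tau_scale * abs(rng.standard_cauchy())
    lam = np.abs(rng.standard_cauchy(n_instruments))
    beta = rng.normal(0.0, lam * tau)
    kappa = 1.0 / (1.0 + (lam * tau) ** 2)
    return beta, kappa


beta, kappa = draw_horseshoe_prior(10, tau_scale=0.1, rng=np.random.default_rng(7))
print(np.round(beta, 3))
print(np.round(kappa, 2))
```

The heavy Cauchy tails on λ_j are what make the shrinkage adaptive: an instrument with a genuinely large pleiotropic effect can take a large λ_j and escape shrinkage without dragging the others with it.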
Submitted 31 January, 2017; v1 submitted 9 August, 2016;
originally announced August 2016.
-
Stochastic Mechanistic Interaction
Authors:
Carlo Berzuini,
A. Philip Dawid
Abstract:
We propose a fully probabilistic formulation of the notion of mechanistic interaction (interaction in some fundamental mechanistic sense) between the effects of putative (possibly continuous) causal factors A and B on a binary outcome variable Y indicating 'survival' vs 'failure'. We define mechanistic interaction in terms of departure from a generalized 'noisy OR' model, under which the multiplicative causal effect of A (resp., B) on the probability of failure cannot be enhanced by manipulating B (resp., A). We present conditions under which mechanistic interaction in the above sense can be assessed via simple tests on excess risk or superadditivity, in a possibly retrospective regime of observation. These conditions are defined in terms of generalized conditional independence relationships (generalised because they may involve non-stochastic 'regime indicators') that can often be checked on a graphical representation of the problem. Inference about mechanistic interaction between direct, or path-specific, causal effects can be accommodated in the proposed framework. The method is illustrated with the aid of a study in experimental psychology.
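Why superadditivity can signal departure from a noisy-OR model is easy to see in the simplest binary case. If each active cause independently triggers failure, the risks are r_ab = 1 − (1 − p₀)(1 − p_A)^a(1 − p_B)^b, and the additive interaction contrast r₁₁ − r₁₀ − r₀₁ + r₀₀ simplifies to −(1 − p₀)p_A·p_B ≤ 0. A positive (superadditive) contrast is therefore incompatible with this model. The sketch below uses that textbook binary noisy-OR with hypothetical probabilities; it omits the paper's generalisation to continuous factors, the conditions under which the contrast is identifiable, and any statistical test.

```python
def noisy_or_risks(p_leak, p_a, p_b):
    """Failure risks (r00, r10, r01, r11) for two binary factors under a
    textbook noisy-OR: each active cause independently triggers failure."""
    def risk(a, b):
        p_survive = (1 - p_leak) * (1 - p_a) ** a * (1 - p_b) ** b
        return 1 - p_survive
    return risk(0, 0), risk(1, 0), risk(0, 1), risk(1, 1)


def interaction_contrast(r00, r10, r01, r11):
    """Additive interaction contrast r11 - r10 - r01 + r00 (positive = superadditive)."""
    return r11 - r10 - r01 + r00


# Under noisy-OR the contrast equals -(1 - p_leak) * p_a * p_b, here -0.108
print(interaction_contrast(*noisy_or_risks(0.1, 0.3, 0.4)))
# Hypothetical observed risks showing synergy: contrast 0.5 > 0
print(interaction_contrast(0.1, 0.1, 0.1, 0.6))
```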
Submitted 19 November, 2014; v1 submitted 26 November, 2013;
originally announced November 2013.
-
Temporal Reasoning with Probabilities
Authors:
Carlo Berzuini,
Riccardo Bellazzi,
Silvana Quaglini
Abstract:
In this paper we explore representations of temporal knowledge based upon the formalism of Causal Probabilistic Networks (CPNs). Two different 'continuous-time' representations are proposed. In the first, the CPN includes variables representing 'event-occurrence times', possibly on different time scales, and variables representing the 'state' of the system at these times. In the second, the CPN describes the influences between random variables with values in () representing dates, i.e. time-points associated with the occurrence of relevant events. However, structuring a system of inter-related dates as a network where all links commit to a single specific notion of cause and effect is in general far from trivial and leads to severe difficulties. We claim that we should explicitly recognize different kinds of relation between dates, such as 'cause', 'inhibition', 'competition', etc., and propose a method whereby these relations are coherently embedded in a CPN using additional auxiliary nodes corresponding to "instrumental" variables. Also discussed, though not covered in detail, is how the quantitative specifications to be inserted in a temporal CPN can be learned from specific data.
Submitted 27 March, 2013;
originally announced April 2013.
-
Bayesian Networks Applied to Therapy Monitoring
Authors:
Carlo Berzuini,
David J. Spiegelhalter,
Riccardo Bellazzi
Abstract:
We propose a general Bayesian network model for application in a wide class of problems of therapy monitoring. We discuss the use of stochastic simulation as a computational approach to inference on the proposed class of models. As an illustration we present an application to the monitoring of cytotoxic chemotherapy in breast cancer.
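Stochastic simulation in its simplest form is logic sampling: draw every node in topological order from its conditional distribution and keep only the samples that agree with the evidence; the kept fraction estimates the posterior. A minimal sketch on a hypothetical two-node network (therapy response → observed marker), with made-up probabilities; the paper's model class and its simulation scheme are more general and are not reproduced here.

```python
import random


def posterior_by_logic_sampling(n, rng, p_respond=0.3, p_marker=(0.1, 0.8)):
    """Estimate P(respond = 1 | marker = 1) in a network Respond -> Marker.

    p_respond: hypothetical prior probability that the patient responds.
    p_marker:  hypothetical P(marker = 1 | respond = 0) and P(marker = 1 | respond = 1).
    """
    accepted = hits = 0
    for _ in range(n):
        respond = rng.random() < p_respond              # sample the parent node
        marker = rng.random() < p_marker[int(respond)]  # sample the child given the parent
        if marker:                                      # keep samples matching the evidence
            accepted += 1
            hits += respond
    return hits / accepted


est = posterior_by_logic_sampling(200_000, random.Random(42))
exact = 0.3 * 0.8 / (0.3 * 0.8 + 0.7 * 0.1)  # Bayes' rule, about 0.774
print(est, exact)
```

Rejection wastes every sample that contradicts the evidence, which becomes prohibitive with many observations; that is why richer models usually rely on Markov chain schemes such as Gibbs sampling instead.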
Submitted 20 March, 2013;
originally announced March 2013.
-
Direct genetic effects and their estimation from matched case-control data
Authors:
Carlo Berzuini,
Stijn Vansteelandt,
Luisa Foco,
Roberta Pastorino,
Luisa Bernardinelli
Abstract:
In genetic association studies, a single marker is often associated with multiple, correlated phenotypes (e.g., obesity and cardiovascular disease, or nicotine dependence and lung cancer). A pervasive question is then whether that marker has independent effects on all phenotypes. In this article, we address this question by assessing whether there is a direct genetic effect on one phenotype that is not mediated through the other phenotypes. In particular, we investigate how to identify and estimate such direct genetic effects on the basis of (matched) case-control data. We discuss conditions under which such effects are identifiable from the available (matched) case-control data. We find that direct genetic effects are sometimes estimable via standard regression methods, and sometimes via a more general G-estimation method, which has previously been proposed for random samples and unmatched case-control studies (Vansteelandt, 2009) and is here extended to matched case-control studies. The results are used to assess whether the FTO gene is associated with myocardial infarction other than via an effect on obesity.
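For a continuous outcome in a random sample, the G-estimation idea cited above can be sketched in two regression stages: first estimate the effect of the intermediate phenotype M on the outcome Y, adjusting for the gene G and for a measured M–Y confounder C; then subtract that contribution from Y and regress the result on G alone. This is the simple linear, random-sample version only, not the matched case-control extension developed in the paper, and the simulated data and coefficients are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Ordinary least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def sequential_g_direct_effect(g, m, c, y):
    """Two-stage estimate of the controlled direct effect of G on Y, holding M fixed."""
    ones = np.ones(len(y))
    # Stage 1: effect of the mediator M on Y, adjusting for G and for C.
    beta_m = ols(np.column_stack([ones, g, m, c]), y)[2]
    # Remove the mediator's contribution from the outcome.
    y_tilde = y - beta_m * m
    # Stage 2: regress the mediator-free outcome on G alone (G is randomised at meiosis).
    return ols(np.column_stack([ones, g]), y_tilde)[1]


# Hypothetical simulation in which C is itself affected by G
rng = np.random.default_rng(1)
n = 20_000
g = rng.binomial(2, 0.3, n).astype(float)               # allele count
c = 0.4 * g + rng.normal(size=n)                        # M-Y confounder affected by G
m = g + c + rng.normal(size=n)                          # intermediate phenotype
y = 0.5 * g + 0.8 * m + 0.7 * c + rng.normal(size=n)    # outcome
est = sequential_g_direct_effect(g, m, c, y)
print(round(est, 3))  # close to 0.5 + 0.7 * 0.4 = 0.78
```

Here a single regression of Y on G, M and C would return about 0.5 for G, missing the part of the direct effect that runs through C; the two-stage approach avoids conditioning on C in the second stage.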
Submitted 21 December, 2011;
originally announced December 2011.
-
Deep determinism and the assessment of mechanistic interaction between categorical and continuous variables
Authors:
Carlo Berzuini,
A. Philip Dawid
Abstract:
Our aim is to detect mechanistic interaction between the effects of two causal factors on a binary response, as an aid to identifying situations where the effects are mediated by a common mechanism. We propose a formalization of mechanistic interaction which acknowledges asymmetries of the kind "factor A interferes with factor B, but not vice versa". A class of tests for mechanistic interaction is proposed, which works on discrete or continuous causal variables, in any combination. Conditions under which these tests can be applied under a generic regime of data collection, be it interventional or observational, are discussed in terms of conditional independence assumptions within the framework of Augmented Directed Graphs. The scientific relevance of the method and the practicality of the graphical framework are illustrated with the aid of two studies in coronary artery disease. Our analysis relies on the "deep determinism" assumption that there exists some relevant set V, possibly unobserved, of "context variables", such that the response Y is a deterministic function of the values of V and of the causal factors of interest. Caveats regarding this assumption in real studies are discussed.
Submitted 10 December, 2010;
originally announced December 2010.