-
Exploration, Confirmation, and Replication in the Same Observational Study: A Two Team Cross-Screening Approach to Studying the Effect of Unwanted Pregnancy on Mothers' Later Life Outcomes
Authors:
Samrat Roy,
Marina Bogomolov,
Ruth Heller,
Amy M. Claridge,
Tishra Beeson,
Dylan S. Small
Abstract:
The long term consequences of unwanted pregnancies carried to term on mothers have not been much explored. We use data from the Wisconsin Longitudinal Study (WLS) and propose a novel approach, namely two team cross-screening, to study the possible effects of unwanted pregnancies carried to term on various aspects of mothers' later-life mental health, physical health, economic well-being and life satisfaction. Our method, unlike existing approaches to observational studies, enables the investigators to perform exploratory data analysis, confirmatory data analysis and replication in the same study. This is a valuable property when there is only a single data set available with unique strengths to perform exploratory, confirmatory and replication analysis. In two team cross-screening, the investigators split themselves into two teams and the data is split as well according to a meaningful covariate. Each team then performs exploratory data analysis on its part of the data to design an analysis plan for the other part of the data. The complete freedom of the teams in designing the analysis has the potential to generate new unanticipated hypotheses in addition to a prefixed set of hypotheses. Moreover, only the hypotheses that looked promising in the data each team explored are forwarded for analysis (thus alleviating the multiple testing problem). These advantages are demonstrated in our study of the effects of unwanted pregnancies on mothers' later life outcomes.
Submitted 20 May, 2025;
originally announced May 2025.
-
Protocol for an Observational Study on the Effects of Paternal Alcohol Use Disorder on Children's Later Life Outcomes
Authors:
William Bekerman,
Marina Bogomolov,
Ruth Heller,
Matthew Spivey,
Kevin G. Lynch,
David W. Oslin,
Dylan S. Small
Abstract:
The harmful effects of growing up with a parent with an alcohol use disorder have been closely examined in children and adolescents, and are reported to include mental and physical health problems, interpersonal difficulties, and a worsened risk of future substance use disorders. However, few studies have investigated how these impacts evolve into later life adulthood, leaving the ensuing long-term effects of interest. In this article, we provide the protocol for our observational study of the long-term consequences of growing up with a father who had an alcohol use disorder. We will use data from the Wisconsin Longitudinal Study to examine impacts on long-term economic success, interpersonal relationships, physical, and mental health. To reinforce our findings, we will conduct this investigation on two discrete subpopulations of individuals in our study, allowing us to analyze the replicability of our conclusions. We introduce a novel statistical design, called data turnover, to carry out this analysis. Data turnover allows a single group of statisticians and domain experts to work together to assess the strength of evidence gathered across multiple data splits, while incorporating both qualitative and quantitative findings from data exploration. We delineate our analysis plan using this new method and conclude with a brief discussion of some additional considerations for our study.
Submitted 19 December, 2024;
originally announced December 2024.
-
Protocol for an Observational Study on the Effects of Giving Births from Unintended Pregnancies on Later Life Physical and Mental Health
Authors:
Samrat Roy,
Marina Bogomolov,
Ruth Heller,
Amy M. Claridge,
Tishra Beeson,
Dylan S. Small
Abstract:
There has been increasing interest in studying the effect of giving births to unintended pregnancies on later life physical and mental health. In this article, we provide the protocol for our planned observational study on the long-term mental and physical health consequences for mothers who bear children resulting from unintended pregnancies. We aim to use the data from the Wisconsin Longitudinal Study (WLS) and examine the effect of births from unintended pregnancies on a broad range of outcomes, including mental depression, psychological well-being, physical health, alcohol usage, and economic well-being. To strengthen our causal findings, we plan to address our research questions on two subgroups, Catholics and non-Catholics, and discover the "replicable" outcomes for which the effect of unintended pregnancy is negative (or, positive) in both subgroups. Following the idea of non-random cross-screening, the data will be split according to whether the woman is Catholic or not, and then one part of the data will be used to select the hypotheses and design the corresponding tests for the second part of the data. In past use of cross-screening (automatic cross-screening) there was only one team of investigators that dealt with both parts of the data so that the investigators would need to decide on an analysis plan before looking at the data. In this protocol, we describe plans to carry out a novel flexible cross-screening in which there will be two teams of investigators with access only to one part of data and each team will use their part of the data to decide how to plan the analysis for the second team's data. In addition to the above replicability analysis, we also discuss the plan to test the global null hypothesis that is intended to identify the outcomes which are affected by unintended pregnancy for at least one of the two subgroups of Catholics and non-Catholics.
Submitted 30 April, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Replicability Across Multiple Studies
Authors:
Marina Bogomolov,
Ruth Heller
Abstract:
Meta-analysis is routinely performed in many scientific disciplines. This analysis is attractive since discoveries are possible even when all the individual studies are underpowered. However, the meta-analytic discoveries may be entirely driven by signal in a single study, and thus non-replicable. Although the great majority of meta-analyses carried out to date do not infer on the replicability of their findings, it is possible to do so. We provide a selective overview of analyses that can be carried out towards establishing replicability of the scientific findings. We describe methods for the setting where a single outcome is examined in multiple studies (as is common in systematic reviews of medical interventions), as well as for the setting where multiple studies each examine multiple features (as in genomics applications). We also discuss some of the current shortcomings and future directions.
Submitted 8 May, 2023; v1 submitted 2 October, 2022;
originally announced October 2022.
-
Filtering the rejection set while preserving false discovery rate control
Authors:
Eugene Katsevich,
Chiara Sabatti,
Marina Bogomolov
Abstract:
Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the p-values are positively dependent. We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.
Submitted 10 April, 2020; v1 submitted 5 September, 2018;
originally announced September 2018.
-
Testing hypotheses on a tree: new error rates and controlling strategies
Authors:
Marina Bogomolov,
Christine B. Peterson,
Yoav Benjamini,
Chiara Sabatti
Abstract:
We introduce a multiple testing procedure (TreeBH) which addresses the challenge of controlling error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses which are organized hierarchically in a tree structure. We describe a fast algorithm for the proposed sequential procedure, and prove that it controls relevant error rates given certain assumptions on the dependence among the p-values. Through simulations, we demonstrate that TreeBH offers the desired guarantees under a range of dependency structures (including one similar to that encountered in genome-wide association studies) and that it has the potential of gaining power over alternative methods. We also introduce a modified version of TreeBH which we prove to control the relevant error rates under any dependency structure.
We conclude with two case studies: we first analyze data collected as part of the Genotype-Tissue Expression (GTEx) project, which aims to characterize the genetic regulation of gene expression across multiple tissues in the human body, and secondly, data examining the relationship between the gut microbiome and colorectal cancer.
Submitted 23 October, 2018; v1 submitted 21 May, 2017;
originally announced May 2017.
-
Many Phenotypes without Many False Discoveries: Error Controlling Strategies for Multi-Traits Association Studies
Authors:
Christine Peterson,
Marina Bogomolov,
Yoav Benjamini,
Chiara Sabatti
Abstract:
The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and False Discovery Rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated to some phenotypes.
We show that applying FDR-controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the average rate of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure which allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants which impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
Submitted 2 April, 2015;
originally announced April 2015.
-
Assessing replicability of findings across two studies of multiple features
Authors:
Marina Bogomolov,
Ruth Heller
Abstract:
Replicability analysis aims to identify the findings that replicated across independent studies that examine the same features. We provide powerful novel replicability analysis procedures for two studies for FWER and for FDR control on the replicability claims. The suggested procedures first select the promising features from each study solely based on that study, and then test for replicability only the features that were selected in both studies. We incorporate the plug-in estimates of the fraction of null hypotheses in one study among the selected hypotheses by the other study. Since the fraction of nulls in one study among the selected features from the other study is typically small, the power gain can be remarkable. We provide theoretical guarantees for the control of the appropriate error rates, as well as simulations that demonstrate the excellent power properties of the suggested procedures. We demonstrate the usefulness of our procedures on real data examples from two application fields: behavioural genetics and microarray studies.
Submitted 2 April, 2015;
originally announced April 2015.
-
Testing for replicability in a follow-up study when the primary study hypotheses are two-sided
Authors:
Ruth Heller,
Marina Bogomolov,
Yoav Benjamini,
Tamar Sofer
Abstract:
When testing for replication of results from a primary study with two-sided hypotheses in a follow-up study, we are usually interested in discovering the features with discoveries in the same direction in the two studies. The direction of testing in the follow-up study for each feature can therefore be decided by the primary study. We prove that in this case the methods suggested in Heller, Bogomolov, and Benjamini (2014) for control over false replicability claims are valid. Specifically, we prove that if we input into the procedures in Heller, Bogomolov, and Benjamini (2014) the one-sided p-values in the directions favoured by the primary study, then we achieve directional control over the desired error measure (family-wise error rate or false discovery rate).
Submitted 8 March, 2015;
originally announced March 2015.
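Illustrative note on the abstract above: the step of using "one-sided p-values in the directions favoured by the primary study" can be sketched for a single feature as follows. This is a minimal sketch, not the authors' software. It assumes each study reports a z-statistic that is standard normal under the null, and the function name and arguments are hypothetical. It covers only the construction of the one-sided p-value, not the replicability procedures of Heller, Bogomolov, and Benjamini (2014) into which such p-values would be fed.

```python
import math


def one_sided_p_in_primary_direction(z_primary: float, z_followup: float) -> float:
    """One-sided follow-up p-value, tested in the direction of the primary-study effect.

    Assumption (not from the abstract): both statistics are z-scores that are
    standard normal under the null hypothesis.
    """
    # Direction favoured by the primary study: +1 for a positive effect, -1 for a negative one.
    direction = 1.0 if z_primary >= 0 else -1.0
    # Upper-tail normal probability P(Z >= direction * z_followup), computed via erfc.
    return 0.5 * math.erfc(direction * z_followup / math.sqrt(2.0))


# Follow-up agrees with the primary direction: small p-value.
p_agree = one_sided_p_in_primary_direction(2.0, 1.96)
# Follow-up points the opposite way: large p-value.
p_disagree = one_sided_p_in_primary_direction(-2.0, 1.96)
```

When the follow-up statistic points in the direction chosen by the primary study, the p-value is small; when it points the other way, the p-value is close to one, so an opposite-direction signal cannot count as a replication.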
-
Deciding whether follow-up studies have replicated findings in a preliminary large-scale "omics" study
Authors:
Ruth Heller,
Marina Bogomolov,
Yoav Benjamini
Abstract:
We propose a formal method to declare that findings from a primary study have been replicated in a follow-up study. Our proposal is appropriate for primary studies that involve large-scale searches for rare true positives (i.e. needles in a haystack). Our proposal assigns an $r$-value to each finding; this is the lowest false discovery rate at which the finding can be called replicated. Examples are given and software is available.
Submitted 10 June, 2014; v1 submitted 2 October, 2013;
originally announced October 2013.
-
Discovering findings that replicate from a primary study of high dimension to a follow-up study
Authors:
Marina Bogomolov,
Ruth Heller
Abstract:
We consider the problem of identifying whether findings replicate from one study of high dimension to another, when the primary study guides the selection of hypotheses to be examined in the follow-up study as well as when there is no division of roles into the primary and the follow-up study. We show that existing meta-analysis methods are not appropriate for this problem, and suggest novel methods instead. We prove that our multiple testing procedures control for appropriate error-rates. The suggested FWER controlling procedure is valid for arbitrary dependence among the test statistics within each study. A more powerful procedure is suggested for FDR control. We prove that this procedure controls the FDR if the test statistics are independent within the primary study, and independent or have dependence of type PRDS in the follow-up study. For arbitrary dependence within the primary study, and either arbitrary dependence or dependence of type PRDS in the follow-up study, simple conservative modifications of the procedure control the FDR. We demonstrate the usefulness of these procedures via simulations and real data examples.
Submitted 24 May, 2013; v1 submitted 1 July, 2012;
originally announced July 2012.