-
Bias in studies of prenatal exposures using real-world data due to pregnancy identification method
Authors:
Chase D. Latour,
Jessie K. Edwards,
Michele Jonsson Funk,
Elizabeth A. Suarez,
Kim Boggess,
Mollie E. Wood
Abstract:
Background: Researchers typically identify pregnancies in healthcare data based on observed outcomes (e.g., delivery). This outcome-based approach misses pregnancies that received prenatal care but whose outcomes were not recorded (e.g., at-home miscarriage), potentially inducing selection bias in effect estimates for prenatal exposures. Alternatively, prenatal encounters can be used to identify pregnancies, including those with unobserved outcomes. However, this prenatal approach requires methods to address missing data. Methods: We simulated 10,000,000 pregnancies and estimated the total effect of initiating treatment on the risk of preeclampsia. We generated data for 36 scenarios in which we varied the effect of treatment on miscarriage and/or preeclampsia; the percentage with missing outcomes (5% or 20%); and the cause of missingness: (1) measured covariates, (2) unobserved miscarriage, and (3) a mix of both. We then created three analytic samples to address missing pregnancy outcomes: observed deliveries, observed deliveries and miscarriages, and all pregnancies. Treatment effects were estimated using non-parametric direct standardization. Results: Risk differences (RDs) and risk ratios (RRs) from the three analytic samples were similarly biased when all missingness was due to unobserved miscarriage (log-transformed RR bias range: -0.12 to 0.33 among observed deliveries; -0.11 to 0.32 among observed deliveries and miscarriages; and -0.11 to 0.32 among all pregnancies). When predictors of missingness were measured, only the all pregnancies approach was unbiased (-0.27 to 0.33; -0.29 to 0.03; and -0.02 to 0.01, respectively). Conclusions: When all missingness was due to miscarriage, the analytic samples returned similar effect estimates. Only among all pregnancies did bias decrease as the proportion of missingness due to measured variables increased.
Submitted 16 April, 2025;
originally announced April 2025.
-
A Bayesian location-scale joint model for time-to-event and multivariate longitudinal data with association based on within-individual variability
Authors:
Marco Palma,
Ruth H Keogh,
Siobhán B Carr,
Rhonda Szczesniak,
David Taylor-Robinson,
Angela M Wood,
Graciela Muniz-Terrera,
Jessica K Barrett
Abstract:
Within-individual variability of health indicators measured over time is becoming commonly used to inform about disease progression. Simple summary statistics (e.g. the standard deviation for each individual) are often used but they are not suited to account for time changes. In addition, when these summary statistics are used as covariates in a regression model for time-to-event outcomes, the estimates of the hazard ratios are subject to regression dilution. To overcome these issues, a joint model is built where the association between the time-to-event outcome and multivariate longitudinal markers is specified in terms of the within-individual variability of the latter. A mixed-effect location-scale model is used to analyse the longitudinal biomarkers, their within-individual variability and their correlation. The time to event is modelled using a proportional hazard regression model, with a flexible specification of the baseline hazard, and the information from the longitudinal biomarkers is shared as a function of the random effects. The model can be used to quantify within-individual variability for the longitudinal markers and their association with the time-to-event outcome. We show through a simulation study the performance of the model in comparison with the standard joint model with constant variance. The model is applied on a dataset of adult women from the UK cystic fibrosis registry, to evaluate the association between lung function, malnutrition and mortality.
Submitted 15 March, 2025;
originally announced March 2025.
-
Healthy Live Births Should be Considered as Competing Events when Estimating the Total Effect of Prenatal Medication Use on Pregnancy Outcomes
Authors:
Chase D. Latour,
Mark Klose,
Jessie K. Edwards,
Zoey Song,
Michele Jonsson Funk,
Mollie E. Wood
Abstract:
Pregnancy loss is recognized as an important competing event in studies of prenatal medication use. However, a healthy live birth also precludes subsequent adverse pregnancy outcomes, yet these events are often censored. Using Monte Carlo simulation, we examine bias that results from failure to account for healthy live birth as a competing event in estimates of the total effect of prenatal medication use on pregnancy outcomes. We simulated data for 12 trials estimating the effect of antihypertensive initiation versus non-initiation on two outcomes: (1) composite fetal death or severe prenatal preeclampsia and (2) small-for-gestational-age (SGA) live birth. We used time-to-event methods to estimate absolute risks, risk differences and risk ratios. For the composite outcome, we conducted two analyses where non-preeclamptic live birth was (1) a censoring event and (2) a competing event. For SGA live birth, we conducted three analyses where fetal death and non-SGA live birth were (1) censoring events, (2) a competing event and censoring event, respectively; and (3) competing events. In all analyses, censoring healthy live births led to inflated absolute risk estimates as well as biased and imprecise treatment effect estimates. Studies of prenatal exposures on pregnancy outcomes should analyze healthy live births as competing risks to estimate unbiased total treatment effects.
Submitted 30 October, 2024;
originally announced October 2024.
-
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions
Authors:
Anuroop Sriram,
Benjamin Kurt Miller,
Ricky T. Q. Chen,
Brandon M. Wood
Abstract:
Material discovery is a critical area of research with the potential to revolutionize various fields, including carbon capture, renewable energy, and electronics. However, the immense scale of the chemical space makes it challenging to explore all possible materials experimentally. In this paper, we introduce FlowLLM, a novel generative model that combines large language models (LLMs) and Riemannian flow matching (RFM) to design novel crystalline materials. FlowLLM first fine-tunes an LLM to learn an effective base distribution of meta-stable crystals in a text representation. After converting to a graph representation, the RFM model takes samples from the LLM and iteratively refines the coordinates and lattice parameters. Our approach significantly outperforms state-of-the-art methods, increasing the generation rate of stable materials by over three times and increasing the rate for stable, unique, and novel crystals by $\sim50\%$ - a huge improvement on a difficult problem. Additionally, the crystals generated by FlowLLM are much closer to their relaxed state when compared with another leading model, significantly reducing post-hoc computational cost.
Submitted 30 October, 2024;
originally announced October 2024.
-
FlowMM: Generating Materials with Riemannian Flow Matching
Authors:
Benjamin Kurt Miller,
Ricky T. Q. Chen,
Anuroop Sriram,
Brandon M Wood
Abstract:
Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area are to (a) predict the stable crystal structure of a known composition of elements and (b) propose novel compositions along with their stable structures. We present FlowMM, a pair of generative models that achieve state-of-the-art performance on both tasks while being more efficient and more flexible than competing methods. We generalize Riemannian Flow Matching to suit the symmetries inherent to crystals: translation, rotation, permutation, and periodic boundary conditions. Our framework enables the freedom to choose the flow base distributions, drastically simplifying the problem of learning crystal structures compared with diffusion models. In addition to standard benchmarks, we validate FlowMM's generated structures with quantum chemistry calculations, demonstrating that it is about 3x more efficient, in terms of integration steps, at finding stable materials compared to previous open methods.
Submitted 7 June, 2024;
originally announced June 2024.
-
Treatments for pregestational chronic conditions during pregnancy: emulating a target trial with a treatment decision design
Authors:
Mollie E. Wood,
Chase D. Latour,
Lucia C. Petito
Abstract:
As a solution to methodologic challenges inherent to estimating causal effects of exposures in early pregnancy, we suggest emulating a target trial using a treatment decision design, wherein time zero is centered around clinical landmarks where treatment decisions may occur, such as the date of preconception counseling or prenatal care initiation. These ideas are illustrated via protocols for two target trials in large administrative databases, antidepressant use for pre-existing depressive disorder and antihypertensive medication use for mild-to-moderate chronic hypertension. Careful consideration of these issues is critical to the identification of the causal effects of early-pregnancy pharmacotherapies on pregnancy outcomes.
Submitted 22 May, 2023;
originally announced May 2023.
-
Patient stratification in multi-arm trials: a two-stage procedure with Bayesian profile regression
Authors:
Yuejia Xu,
Angela M. Wood,
Brian D. M. Tom
Abstract:
Precision medicine is an emerging field that takes into account individual heterogeneity to inform better clinical practice. In clinical trials, the evaluation of treatment effect heterogeneity is an important component, and recently, many statistical methods have been proposed for stratifying patients into different subgroups based on such heterogeneity. However, the majority of existing methods developed for this purpose focus on the case with a dichotomous treatment and are not directly applicable to multi-arm trials. In this paper, we consider the problem of patient stratification in multi-arm trial settings and propose a two-stage procedure within the Bayesian nonparametric framework. Specifically, we first use Bayesian additive regression trees (BART) to predict potential outcomes (treatment responses) under different treatment options for each patient, and then we leverage Bayesian profile regression to cluster patients into subgroups according to their baseline characteristics and predicted potential outcomes. We further embed a variable selection procedure into our proposed framework to identify the patient characteristics that actively "drive" the clustering structure. We conduct simulation studies to examine the performance of our proposed method and demonstrate the method by applying it to a UK-based multi-arm blood donation trial, wherein our method uncovers five clinically meaningful donor subgroups.
Submitted 22 February, 2023;
originally announced February 2023.
-
Sequential Re-estimation Learning of Optimal Individualized Treatment Rules Among Ordinal Treatments with Application to Recommended Intervals Between Blood Donations
Authors:
Yuejia Xu,
Angela M. Wood,
David J. Roberts,
Brian D. M. Tom
Abstract:
Personalized medicine has gained much popularity recently as a way of providing better healthcare by tailoring treatments to suit individuals. Our research, motivated by the UK INTERVAL blood donation trial, focuses on estimating the optimal individualized treatment rule (ITR) in the ordinal treatment-arms setting. Restrictions on minimum lengths between whole blood donations exist to safeguard donor health and quality of blood received. However, the evidence-base for these limits is lacking. Moreover, in England, the blood service is interested in making blood donation both safe and sustainable by integrating multi-marker data from INTERVAL and developing personalized donation strategies. As the three inter-donation interval options in INTERVAL have clear orderings, we propose a sequential re-estimation learning method that effectively incorporates "treatment" orderings when identifying optimal ITRs. Furthermore, we incorporate variable selection into our method for both linear and nonlinear decision rules to handle situations with (noise) covariates irrelevant for decision-making. Simulations demonstrate its superior performance over existing methods that assume multiple nominal treatments by achieving smaller misclassification rates and larger value functions. Application to a much-in-demand donor subgroup shows that the estimated optimal ITR achieves both the highest utilities and largest proportions of donors assigned to the safest inter-donation interval option in INTERVAL.
Submitted 22 February, 2023;
originally announced February 2023.
-
Optimal risk-assessment scheduling for primary prevention of cardiovascular disease
Authors:
Francesca Gasperoni,
Christopher H. Jackson,
Angela M. Wood,
Michael J. Sweeting,
Paul J. Newcombe,
David Stevens,
Jessica K. Barrett
Abstract:
In this work, we introduce a personalised and age-specific Net Benefit function, composed of benefits and costs, to recommend optimal timing of risk assessments for cardiovascular disease prevention. We extend the 2-stage landmarking model to estimate patient-specific CVD risk profiles, adjusting for time-varying covariates. We apply our model to data from the Clinical Practice Research Datalink, comprising primary care electronic health records from the UK. We find that people at lower risk could be recommended an optimal risk-assessment interval of 5 years or more. Time-varying risk-factors are required to discriminate between more frequent schedules for higher-risk people.
Submitted 9 February, 2023;
originally announced February 2023.
-
Structural Forecasting for Short-term Tropical Cyclone Intensity Guidance
Authors:
Trey McNeely,
Pavel Khokhlov,
Niccolo Dalmasso,
Kimberly M. Wood,
Ann B. Lee
Abstract:
Because geostationary satellite (Geo) imagery provides a high temporal resolution window into tropical cyclone (TC) behavior, we investigate the viability of its application to short-term probabilistic forecasts of TC convective structure to subsequently predict TC intensity. Here, we present a prototype model which is trained solely on two inputs: Geo infrared imagery leading up to the synoptic time of interest and intensity estimates up to 6 hours prior to that time. To estimate future TC structure, we compute cloud-top temperature radial profiles from infrared imagery and then simulate the evolution of an ensemble of those profiles over the subsequent 12 hours by applying a Deep Autoregressive Generative Model (PixelSNAIL). To forecast TC intensities at hours 6 and 12, we input operational intensity estimates up to the current time (0 h) and simulated future radial profiles up to +12 h into a "nowcasting" convolutional neural network. We limit our inputs to demonstrate the viability of our approach and to enable quantification of value added by the observed and simulated future radial profiles beyond operational intensity estimates alone. Our prototype model achieves a marginally higher error than the National Hurricane Center's official forecasts despite excluding environmental factors, such as vertical wind shear and sea surface temperature. We also demonstrate that it is possible to reasonably predict short-term evolution of TC convective structure via radial profiles from Geo infrared imagery, resulting in interpretable structural forecasts that may be valuable for TC operational guidance.
Submitted 8 April, 2023; v1 submitted 31 May, 2022;
originally announced June 2022.
-
Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations
Authors:
Anuroop Sriram,
Abhishek Das,
Brandon M. Wood,
Siddharth Goyal,
C. Lawrence Zitnick
Abstract:
Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between triplets or quadruplets of atoms, making it challenging to scale these models. In this paper, we introduce Graph Parallelism, a method to distribute input graphs across multiple GPUs, enabling us to train very large GNNs with hundreds of millions or billions of parameters. We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results.
Submitted 17 March, 2022;
originally announced March 2022.
-
Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery
Authors:
Trey McNeely,
Galen Vincent,
Kimberly M. Wood,
Rafael Izbicki,
Ann B. Lee
Abstract:
Our goal is to quantify whether, and if so how, spatio-temporal patterns in tropical cyclone (TC) satellite imagery signal an upcoming rapid intensity change event. To address this question, we propose a new nonparametric test of association between a time series of images and a series of binary event labels. We ask whether there is a difference in distribution between (dependent but identically distributed) 24-h sequences of images preceding an event versus a non-event. By rewriting the statistical test as a regression problem, we leverage neural networks to infer modes of structural evolution of TC convection that are representative of the lead-up to rapid intensity change events. Dependencies between nearby sequences are handled by a bootstrap procedure that estimates the marginal distribution of the label series. We prove that type I error control is guaranteed as long as the distribution of the label series is well-estimated, which is made easier by the extensive historical data for binary TC event labels. We show empirical evidence that our proposed method identifies archetypes of infrared imagery associated with elevated rapid intensification risk, typically marked by deep or deepening core convection over time. Such results provide a foundation for improved forecasts of rapid intensification.
Submitted 27 June, 2022; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones
Authors:
Trey McNeely,
Galen Vincent,
Rafael Izbicki,
Kimberly M. Wood,
Ann B. Lee
Abstract:
Tropical cyclone (TC) intensity forecasts are issued by human forecasters who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited for prediction problems with complex sequence data, extracting interpretable scientific information with such methods is difficult. Here we leverage powerful AI prediction algorithms and classical statistical inference to identify patterns in the evolution of TC convective structure leading up to the rapid intensification of a storm, hence providing forecasters and scientists with key insight into TC behavior.
Submitted 30 November, 2021; v1 submitted 24 September, 2021;
originally announced September 2021.
-
Spatio-temporal methods for estimating subsurface ocean thermal response to tropical cyclones
Authors:
Addison J. Hu,
Mikael Kuusela,
Ann B. Lee,
Donata Giglio,
Kimberly M. Wood
Abstract:
Tropical cyclones (TCs), driven by heat exchange between the air and sea, pose a substantial risk to many communities around the world. Accurate characterization of the subsurface ocean thermal response to TC passage is crucial for accurate TC intensity forecasts and for an understanding of the role that TCs play in the global climate system. However, that characterization is complicated by the high-noise ocean environment, correlations inherent in spatio-temporal data, relative scarcity of in situ observations, and the entanglement of the TC-induced signal with seasonal signals. We present a general methodological framework that addresses these difficulties, integrating existing techniques in seasonal mean field estimation, Gaussian process modeling, and nonparametric regression into an ANOVA decomposition model. Importantly, we improve upon past work by properly handling seasonality, providing rigorous uncertainty quantification, and treating time as a continuous variable, rather than producing estimates that are binned in time. This ANOVA model is estimated using in situ subsurface temperature profiles from the Argo fleet of autonomous floats through a multi-step procedure, which (1) characterizes the upper ocean seasonal shift during the TC season; (2) models the variability in the temperature observations; (3) fits a thin plate spline using the variability estimates to account for heteroskedasticity and correlation between the observations. This spline fit reveals the ocean thermal response to TC passage. Through this framework, we obtain new scientific insights into the interaction between TCs and the ocean on a global scale, including a three-dimensional characterization of the near-surface and subsurface cooling along the TC storm track and the mixing-induced subsurface warming on the track's right side.
Submitted 14 March, 2024; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Structural Forecasting for Tropical Cyclone Intensity Prediction: Providing Insight with Deep Learning
Authors:
Trey McNeely,
Niccolò Dalmasso,
Kimberly M. Wood,
Ann B. Lee
Abstract:
Tropical cyclone (TC) intensity forecasts are ultimately issued by human forecasters. The human in-the-loop pipeline requires that any forecasting guidance must be easily digestible by TC experts if it is to be adopted at operational centers like the National Hurricane Center. Our proposed framework leverages deep learning to provide forecasters with something neither end-to-end prediction models nor traditional intensity guidance does: a powerful tool for monitoring high-dimensional time series of key physically relevant predictors and the means to understand how the predictors relate to one another and to short-term intensity changes.
Submitted 7 December, 2020; v1 submitted 7 October, 2020;
originally announced October 2020.
-
A physics-informed operator regression framework for extracting data-driven continuum models
Authors:
Ravi G. Patel,
Nathaniel A. Trask,
Mitchell A. Wood,
Eric C. Cyr
Abstract:
The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.
Submitted 24 September, 2020;
originally announced September 2020.
-
On Bayesian Search for the Feasible Space Under Computationally Expensive Constraints
Authors:
Alma Rahat,
Michael Wood
Abstract:
We are often interested in identifying the feasible subset of a decision space under multiple constraints to permit effective design exploration. If determining feasibility required computationally expensive simulations, the cost of exploration would be prohibitive. Bayesian search is data-efficient for such problems: starting from a small dataset, the central concept is to use Bayesian models of constraints with an acquisition function to locate promising solutions that may improve predictions of feasibility when the dataset is augmented. At the end of this sequential active learning approach with a limited number of expensive evaluations, the models can accurately predict the feasibility of any solution obviating the need for full simulations. In this paper, we propose a novel acquisition function that combines the probability that a solution lies at the boundary between feasible and infeasible spaces (representing exploitation) and the entropy in predictions (representing exploration). Experiments confirmed the efficacy of the proposed function.
Submitted 24 June, 2020; v1 submitted 23 April, 2020;
originally announced April 2020.
-
DAG With Omitted Objects Displayed (DAGWOOD): A framework for revealing causal assumptions in DAGs
Authors:
Noah A Haber,
Mollie E Wood,
Sarah Wieten,
Alexander Breskin
Abstract:
Directed acyclic graphs (DAGs) are frequently used in epidemiology as a method to encode causal inference assumptions. We propose the DAGWOOD framework to bring many of those encoded assumptions to the forefront.
DAGWOOD combines a root DAG (the DAG in the proposed analysis) and a set of branch DAGs (alternative hidden assumptions to the root DAG). All branch DAGs share a common ruleset, and must 1) change the root DAG, 2) be a valid DAG, and either 3a) change the minimally sufficient adjustment set or 3b) change the number of frontdoor paths. Branch DAGs comprise a list of assumptions which must be justified as negligible. We define two types of branch DAGs: exclusion branch DAGs add a single- or bidirectional pathway between two nodes in the root DAG (e.g. direct pathways and colliders), while misdirection branch DAGs represent alternative pathways that could be drawn between objects (e.g., creating a collider by reversing the direction of causation for a controlled confounder).
The DAGWOOD framework 1) organizes causal model assumptions, 2) reinforces best DAG practices, 3) provides a framework for evaluation of causal models, and 4) can be used for generating causal models.
Submitted 23 November, 2021; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Unlocking GOES: A Statistical Framework for Quantifying the Evolution of Convective Structure in Tropical Cyclones
Authors:
Trey McNeely,
Ann B. Lee,
Kimberly M. Wood,
Dorit Hammerling
Abstract:
Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, as processes which drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is critical to monitor these storms. However, these complex data can be challenging to analyze in real time, and off-the-shelf machine learning algorithms have limited applicability on this front due to their "black box" structure. This study presents analytic tools that quantify convective structure patterns in infrared satellite imagery for over-ocean TCs, yielding lower-dimensional but rich representations that support analysis and visualization of how these patterns evolve during rapid intensity change. The proposed ORB feature suite targets the global Organization, Radial structure, and Bulk morphology of TCs. By combining ORB and empirical orthogonal functions, we arrive at an interpretable and rich representation of convective structure patterns that serve as inputs to machine learning methods. This study uses the logistic lasso, a penalized generalized linear model, to relate predictors to rapid intensity change. Using ORB alone, binary classifiers identifying the presence (versus absence) of such intensity change events can achieve accuracy comparable to classifiers using environmental predictors alone, with a combined predictor set improving classification accuracy in some settings. More complex nonlinear machine learning methods did not perform better than the linear logistic lasso model for current data.
Submitted 3 August, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models
Authors:
Tra My Pham,
James R Carpenter,
Tim P Morris,
Angela M Wood,
Irene Petersen
Abstract:
Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random (MNAR) mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given dataset, its corresponding population marginal distribution might also be available in an external data source. We show how this information can be readily utilised in the imputation model to calibrate inference to the population, by incorporating an appropriately calculated offset termed the `calibrated-$δ$ adjustment'. We describe the derivation of this offset from the population distribution of the incomplete variable and show how in applications it can be used to closely (and often exactly) match the post-imputation distribution to the population level. Through analytic and simulation studies, we show that our proposed calibrated-$δ$ adjustment MI method can give the same inference as standard MI when data are MAR, and can produce more accurate inference under two general MNAR missingness mechanisms. The method is used to impute missing ethnicity data in a type 2 diabetes prevalence case study using UK primary care electronic health records, where it results in scientifically relevant changes in inference for non-White ethnic groups compared to standard MI. Calibrated-$δ$ adjustment MI represents a pragmatic approach for utilising available population-level information in a sensitivity analysis to explore potential departure from the MAR assumption.
Submitted 4 May, 2018;
originally announced May 2018.
-
How sure are we? Two approaches to statistical inference
Authors:
Michael Wood
Abstract:
Suppose you are told that taking a statin will reduce your risk of a heart attack or stroke by 3% in the next ten years, or that women have better emotional intelligence than men. You may wonder how accurate the 3% is, or how confident we should be about the assertion about women's emotional intelligence, bearing in mind that these conclusions are only based on samples of data. My aim here is to present two statistical approaches to questions like these. Approach 1 is often called null hypothesis testing but I prefer the phrase "baseline hypothesis": this is the standard approach in many areas of inquiry but is fraught with problems. Approach 2 can be viewed as a generalisation of the idea of confidence intervals, or as the application of Bayes' theorem. Unlike Approach 1, Approach 2 provides a tentative estimate of the probability of hypotheses of interest. For both approaches, I explain, from first principles, building only on "common sense" statistical concepts like averages and randomness, both how to derive answers, and the rationale behind the answers. This is achieved by using computer simulation methods (resampling and bootstrapping using a spreadsheet available on the web) which avoid the use of probability distributions (t, normal, etc). Such a minimalist, but reasonably rigorous, analysis is particularly useful in a discipline like statistics which is widely used by people who are not specialists. My intended audience includes both statisticians, and users of statistical methods who are not statistical experts.
Submitted 15 March, 2018;
originally announced March 2018.
-
Simple Methods for Estimating Confidence Levels, or Tentative Probabilities, for Hypotheses Instead of P Values
Authors:
Michael Wood
Abstract:
In many fields of research null hypothesis significance tests and p values are the accepted way of assessing the degree of certainty with which research results can be extrapolated beyond the sample studied. However, there are very serious concerns about the suitability of p values for this purpose. An alternative approach is to cite confidence intervals for a statistic of interest, but this does not directly tell readers how certain a hypothesis is. Here, I suggest how the framework used for confidence intervals could easily be extended to derive confidence levels, or "tentative probabilities", for hypotheses. I also outline four quick methods for estimating these. This allows researchers to state their confidence in a hypothesis as a direct probability, instead of circuitously by p values referring to an unstated, hypothetical null hypothesis. The inevitable difficulties of statistical inference mean that these probabilities can only be tentative, but probabilities are the natural way to express uncertainties, so, arguably, researchers using statistical methods have an obligation to estimate how probable their hypotheses are by the best available method. Otherwise misinterpretations will fill the void. Key words: Confidence, Null hypothesis significance test, p value, Statistical inference
Submitted 12 January, 2020; v1 submitted 10 February, 2017;
originally announced February 2017.
-
Beyond p values: practical methods for analyzing uncertainty in research
Authors:
Michael Wood
Abstract:
This article explains, and discusses the merits of, three approaches for analyzing the certainty with which statistical results can be extrapolated beyond the data gathered. Sometimes it may be possible to use more than one of these approaches. (1) If there is an exact null hypothesis which is credible and interesting (usually not the case), researchers should cite a p value (significance level), although jargon is best avoided. (2) If the research result is a numerical value, researchers should cite a confidence interval. (3) If there are one or more hypotheses of interest, it may be possible to adapt the methods used for confidence intervals to derive an "estimated probability" for each. Under certain circumstances these could be interpreted as Bayesian posterior probabilities. These estimated probabilities can easily be worked out from the p values and confidence intervals produced by packages such as SPSS. Estimating probabilities for hypotheses means researchers can give a direct answer to the question "How certain can we be that this hypothesis is right?".
Submitted 30 September, 2016;
originally announced September 2016.
-
Bootstrapping Confidence Levels for Hypotheses about Quadratic (U-Shaped) Regression Models
Authors:
Michael Wood
Abstract:
Bootstrapping can produce confidence levels for hypotheses about quadratic regression models - such as whether the U-shape is inverted, and the location of optima. The method has several advantages over conventional methods: it provides more, and clearer, information, and is flexible - it could easily be applied to a wide variety of different types of models. The utility of the method can be enhanced by formulating models with interpretable coefficients, such as the location and value of the optimum. Keywords: Bootstrap resampling; Confidence level; Quadratic model; Regression, U-shape.
Submitted 6 July, 2012; v1 submitted 19 December, 2009;
originally announced December 2009.
-
P values, confidence intervals, or confidence levels for hypotheses?
Authors:
Michael Wood
Abstract:
Null hypothesis significance tests and p values are widely used despite very strong arguments against their use in many contexts. Confidence intervals are often recommended as an alternative, but these do not achieve the objective of assessing the credibility of a hypothesis, and the distinction between confidence and probability is an unnecessary confusion. This paper proposes a more straightforward (probabilistic) definition of confidence, and suggests how the idea can be applied to whatever hypotheses are of interest to researchers. The relative merits of the different approaches are discussed using a series of illustrative examples: usually confidence based approaches seem more transparent and useful, but there are some contexts in which p values may be appropriate. I also suggest some methods for converting results from one format to another. (The attractiveness of the idea of confidence is demonstrated by the widespread persistence of the completely incorrect idea that p=5% is equivalent to 95% confidence in the alternative hypothesis. In this paper I show how p values can be used to derive meaningful confidence statements, and the assumptions underlying the derivation.) Key words: Confidence interval, Confidence level, Hypothesis testing, Null hypothesis significance tests, P value, User friendliness.
Submitted 11 February, 2014; v1 submitted 19 December, 2009;
originally announced December 2009.
-
Making statistical methods in management research more useful: some suggestions from a case study
Authors:
Michael Wood
Abstract:
I present a critique of the methods used in a typical paper. This leads to three broad conclusions about the conventional use of statistical methods. First, results are often reported in an unnecessarily obscure manner. Second, the null hypothesis testing paradigm is deeply flawed: estimating the size of effects and citing confidence intervals or levels is usually better. Third, there are several issues, independent of the particular statistical concepts employed, which limit the value of any statistical approach: e.g. difficulties of generalizing to different contexts, and the weakness of some research in terms of the size of the effects found. The first two of these are easily remedied: I illustrate some of the possibilities by re-analyzing the data from the case study article. The third means that in some contexts a statistical approach may not be worthwhile. My case study is a management paper, but similar problems arise in other social sciences. Keywords: Confidence, Hypothesis testing, Null hypothesis significance tests, Philosophy of statistics, Statistical methods, User-friendliness.
Submitted 12 November, 2012; v1 submitted 1 August, 2009;
originally announced August 2009.