Showing 1–26 of 26 results for author: Wood, M

Searching in archive stat.
  1. arXiv:2504.12415

    stat.AP

    Bias in studies of prenatal exposures using real-world data due to pregnancy identification method

    Authors: Chase D. Latour, Jessie K. Edwards, Michele Jonsson Funk, Elizabeth A. Suarez, Kim Boggess, Mollie E. Wood

    Abstract: Background: Researchers typically identify pregnancies in healthcare data based on observed outcomes (e.g., delivery). This outcome-based approach misses pregnancies that received prenatal care but whose outcomes were not recorded (e.g., at-home miscarriage), potentially inducing selection bias in effect estimates for prenatal exposures. Alternatively, prenatal encounters can be used to identify p…

    Submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2503.12270

    stat.ME

    A Bayesian location-scale joint model for time-to-event and multivariate longitudinal data with association based on within-individual variability

    Authors: Marco Palma, Ruth H Keogh, Siobhán B Carr, Rhonda Szczesniak, David Taylor-Robinson, Angela M Wood, Graciela Muniz-Terrera, Jessica K Barrett

    Abstract: Within-individual variability of health indicators measured over time is becoming commonly used to inform about disease progression. Simple summary statistics (e.g. the standard deviation for each individual) are often used but they are not suited to account for time changes. In addition, when these summary statistics are used as covariates in a regression model for time-to-event outcomes, the est…

    Submitted 15 March, 2025; originally announced March 2025.

  3. Healthy Live Births Should be Considered as Competing Events when Estimating the Total Effect of Prenatal Medication Use on Pregnancy Outcomes

    Authors: Chase D. Latour, Mark Klose, Jessie K. Edwards, Zoey Song, Michele Jonsson Funk, Mollie E. Wood

    Abstract: Pregnancy loss is recognized as an important competing event in studies of prenatal medication use. However, a healthy live birth also precludes subsequent adverse pregnancy outcomes, yet these events are often censored. Using Monte Carlo simulation, we examine bias that results from failure to account for healthy live birth as a competing event in estimates of the total effect of prenatal medicat…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: 33 pages, 4 figures

  4. arXiv:2410.23405

    cs.LG cond-mat.mtrl-sci cs.AI stat.ML

    FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions

    Authors: Anuroop Sriram, Benjamin Kurt Miller, Ricky T. Q. Chen, Brandon M. Wood

    Abstract: Material discovery is a critical area of research with the potential to revolutionize various fields, including carbon capture, renewable energy, and electronics. However, the immense scale of the chemical space makes it challenging to explore all possible materials experimentally. In this paper, we introduce FlowLLM, a novel generative model that combines large language models (LLMs) and Riemanni…

    Submitted 30 October, 2024; originally announced October 2024.

    Journal ref: NeurIPS 2024

  5. arXiv:2406.04713

    cs.LG cond-mat.mtrl-sci cs.AI physics.comp-ph stat.ML

    FlowMM: Generating Materials with Riemannian Flow Matching

    Authors: Benjamin Kurt Miller, Ricky T. Q. Chen, Anuroop Sriram, Brandon M Wood

    Abstract: Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area ar…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: https://github.com/facebookresearch/flowmm

    Journal ref: ICML 2024

  6. arXiv:2305.13540

    stat.AP stat.ME

    Treatments for pregestational chronic conditions during pregnancy: emulating a target trial with a treatment decision design

    Authors: Mollie E. Wood, Chase D. Latour, Lucia C. Petito

    Abstract: As a solution to methodologic challenges inherent to estimating causal effects of exposures in early pregnancy, we suggest emulating a target trial using a treatment decision design, wherein time zero is centered around clinical landmarks where treatment decisions may occur, such as the date of preconception counseling or prenatal care initiation. These ideas are illustrated via protocols for two…

    Submitted 22 May, 2023; originally announced May 2023.

  7. arXiv:2302.11647

    stat.ME stat.AP

    Patient stratification in multi-arm trials: a two-stage procedure with Bayesian profile regression

    Authors: Yuejia Xu, Angela M. Wood, Brian D. M. Tom

    Abstract: Precision medicine is an emerging field that takes into account individual heterogeneity to inform better clinical practice. In clinical trials, the evaluation of treatment effect heterogeneity is an important component, and recently, many statistical methods have been proposed for stratifying patients into different subgroups based on such heterogeneity. However, the majority of existing methods…

    Submitted 22 February, 2023; originally announced February 2023.

  8. arXiv:2302.11638

    stat.ME stat.AP

    Sequential Re-estimation Learning of Optimal Individualized Treatment Rules Among Ordinal Treatments with Application to Recommended Intervals Between Blood Donations

    Authors: Yuejia Xu, Angela M. Wood, David J. Roberts, Brian D. M. Tom

    Abstract: Personalized medicine has gained much popularity recently as a way of providing better healthcare by tailoring treatments to suit individuals. Our research, motivated by the UK INTERVAL blood donation trial, focuses on estimating the optimal individualized treatment rule (ITR) in the ordinal treatment-arms setting. Restrictions on minimum lengths between whole blood donations exist to safeguard do…

    Submitted 22 February, 2023; originally announced February 2023.

  9. arXiv:2302.04992

    stat.AP

    Optimal risk-assessment scheduling for primary prevention of cardiovascular disease

    Authors: Francesca Gasperoni, Christopher H. Jackson, Angela M. Wood, Michael J. Sweeting, Paul J. Newcombe, David Stevens, Jessica K. Barrett

    Abstract: In this work, we introduce a personalised and age-specific Net Benefit function, composed of benefits and costs, to recommend optimal timing of risk assessments for cardiovascular disease prevention. We extend the 2-stage landmarking model to estimate patient-specific CVD risk profiles, adjusting for time-varying covariates. We apply our model to data from the Clinical Practice Research Datalink,…

    Submitted 9 February, 2023; originally announced February 2023.

  10. Structural Forecasting for Short-term Tropical Cyclone Intensity Guidance

    Authors: Trey McNeely, Pavel Khokhlov, Niccolo Dalmasso, Kimberly M. Wood, Ann B. Lee

    Abstract: Because geostationary satellite (Geo) imagery provides a high temporal resolution window into tropical cyclone (TC) behavior, we investigate the viability of its application to short-term probabilistic forecasts of TC convective structure to subsequently predict TC intensity. Here, we present a prototype model which is trained solely on two inputs: Geo infrared imagery leading up to the synoptic t…

    Submitted 8 April, 2023; v1 submitted 31 May, 2022; originally announced June 2022.

  11. arXiv:2203.09697

    cs.LG physics.comp-ph stat.ML

    Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations

    Authors: Anuroop Sriram, Abhishek Das, Brandon M. Wood, Siddharth Goyal, C. Lawrence Zitnick

    Abstract: Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between…

    Submitted 17 March, 2022; originally announced March 2022.

    Comments: ICLR 2022

  12. arXiv:2202.02253

    stat.AP stat.ME stat.ML

    Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery

    Authors: Trey McNeely, Galen Vincent, Kimberly M. Wood, Rafael Izbicki, Ann B. Lee

    Abstract: Our goal is to quantify whether, and if so how, spatio-temporal patterns in tropical cyclone (TC) satellite imagery signal an upcoming rapid intensity change event. To address this question, we propose a new nonparametric test of association between a time series of images and a series of binary event labels. We ask whether there is a difference in distribution between (dependent but identically d…

    Submitted 27 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

    Comments: 27 pages, 11 figures

  13. arXiv:2109.12029

    stat.ML cs.LG stat.AP

    Identifying Distributional Differences in Convective Evolution Prior to Rapid Intensification in Tropical Cyclones

    Authors: Trey McNeely, Galen Vincent, Rafael Izbicki, Kimberly M. Wood, Ann B. Lee

    Abstract: Tropical cyclone (TC) intensity forecasts are issued by human forecasters who evaluate spatio-temporal observations (e.g., satellite imagery) and model output (e.g., numerical weather prediction, statistical models) to produce forecasts every 6 hours. Within these time constraints, it can be challenging to draw insight from such data. While high-capacity machine learning methods are well suited fo…

    Submitted 30 November, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 7 pages, 4 figures, Tackling Climate Change with Machine Learning: workshop at NeurIPS 2021

  14. arXiv:2012.15130

    stat.AP

    Spatio-temporal methods for estimating subsurface ocean thermal response to tropical cyclones

    Authors: Addison J. Hu, Mikael Kuusela, Ann B. Lee, Donata Giglio, Kimberly M. Wood

    Abstract: Tropical cyclones (TCs), driven by heat exchange between the air and sea, pose a substantial risk to many communities around the world. Accurate characterization of the subsurface ocean thermal response to TC passage is crucial for accurate TC intensity forecasts and for an understanding of the role that TCs play in the global climate system. However, that characterization is complicated by the hi…

    Submitted 14 March, 2024; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: 39 pages, 14 figures; supplement and code at https://github.com/huisaddison/tc-ocean-methods

  15. arXiv:2010.05783

    cs.LG stat.AP

    Structural Forecasting for Tropical Cyclone Intensity Prediction: Providing Insight with Deep Learning

    Authors: Trey McNeely, Niccolò Dalmasso, Kimberly M. Wood, Ann B. Lee

    Abstract: Tropical cyclone (TC) intensity forecasts are ultimately issued by human forecasters. The human in-the-loop pipeline requires that any forecasting guidance must be easily digestible by TC experts if it is to be adopted at operational centers like the National Hurricane Center. Our proposed framework leverages deep learning to provide forecasters with something neither end-to-end prediction models…

    Submitted 7 December, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: To appear in the Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Proposals Track) 3 pages, 1 figure

  16. arXiv:2009.11992

    physics.comp-ph cs.LG math.NA stat.ML

    A physics-informed operator regression framework for extracting data-driven continuum models

    Authors: Ravi G. Patel, Nathaniel A. Trask, Mitchell A. Wood, Eric C. Cyr

    Abstract: The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a…

    Submitted 24 September, 2020; originally announced September 2020.

    Comments: 37 pages, 15 figures

  17. arXiv:2004.11055

    cs.LG cs.NE stat.ML

    On Bayesian Search for the Feasible Space Under Computationally Expensive Constraints

    Authors: Alma Rahat, Michael Wood

    Abstract: We are often interested in identifying the feasible subset of a decision space under multiple constraints to permit effective design exploration. If determining feasibility required computationally expensive simulations, the cost of exploration would be prohibitive. Bayesian search is data-efficient for such problems: starting from a small dataset, the central concept is to use Bayesian models of…

    Submitted 24 June, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: Accepted at The Sixth International Conference on Machine Learning, Optimization, and Data Science. Main content 12 pages, a total of 19 pages with supplementary. 3 Figures and 2 tables. Python code for Bayesian search is available at: http://bitbucket.org/arahat/lod-2020

  18. arXiv:2004.04251

    stat.ME stat.AP

    DAG With Omitted Objects Displayed (DAGWOOD): A framework for revealing causal assumptions in DAGs

    Authors: Noah A Haber, Mollie E Wood, Sarah Wieten, Alexander Breskin

    Abstract: Directed acyclic graphs (DAGs) are frequently used in epidemiology as a method to encode causal inference assumptions. We propose the DAGWOOD framework to bring many of those encoded assumptions to the forefront. DAGWOOD combines a root DAG (the DAG in the proposed analysis) and a set of branch DAGs (alternative hidden assumptions to the root DAG). All branch DAGs share a common ruleset, and mus…

    Submitted 23 November, 2021; v1 submitted 8 April, 2020; originally announced April 2020.

  19. Unlocking GOES: A Statistical Framework for Quantifying the Evolution of Convective Structure in Tropical Cyclones

    Authors: Trey McNeely, Ann B. Lee, Kimberly M. Wood, Dorit Hammerling

    Abstract: Tropical cyclones (TCs) rank among the most costly natural disasters in the United States, and accurate forecasts of track and intensity are critical for emergency response. Intensity guidance has improved steadily but slowly, as processes which drive intensity change are not fully understood. Because most TCs develop far from land-based observing networks, geostationary satellite imagery is criti…

    Submitted 3 August, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: 19 pages, 14 figures, Submitted to the Journal of Applied Meteorology and Climatology

    Journal ref: Journal of Applied Meteorology and Climatology 59.10 (2020): 1671-1689

  20. Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models

    Authors: Tra My Pham, James R Carpenter, Tim P Morris, Angela M Wood, Irene Petersen

    Abstract: Multiple imputation (MI) has become popular for analyses with missing data in medical research. The standard implementation of MI is based on the assumption of data being missing at random (MAR). However, for missing data generated by missing not at random (MNAR) mechanisms, MI performed assuming MAR might not be satisfactory. For an incomplete variable in a given dataset, its corresponding popula…

    Submitted 4 May, 2018; originally announced May 2018.

  21. arXiv:1803.06214

    stat.OT

    How sure are we? Two approaches to statistical inference

    Authors: Michael Wood

    Abstract: Suppose you are told that taking a statin will reduce your risk of a heart attack or stroke by 3% in the next ten years, or that women have better emotional intelligence than men. You may wonder how accurate the 3% is, or how confident we should be about the assertion about women's emotional intelligence, bearing in mind that these conclusions are only based on samples of data? My aim here is to p…

    Submitted 15 March, 2018; originally announced March 2018.

    Comments: 39 pages, 2 linked spreadsheets

  22. Simple Methods for Estimating Confidence Levels, or Tentative Probabilities, for Hypotheses Instead of P Values

    Authors: Michael Wood

    Abstract: In many fields of research null hypothesis significance tests and p values are the accepted way of assessing the degree of certainty with which research results can be extrapolated beyond the sample studied. However, there are very serious concerns about the suitability of p values for this purpose. An alternative approach is to cite confidence intervals for a statistic of interest, but this does…

    Submitted 12 January, 2020; v1 submitted 10 February, 2017; originally announced February 2017.

    Comments: Published in Methodological Innovations, 2019, 1-9. 2 figures, 2 tables, 3 linked spreadsheets which are also at https://soths.wordpress.com/statistics-links/

  23. arXiv:1609.09803

    stat.ME

    Beyond p values: practical methods for analyzing uncertainty in research

    Authors: Michael Wood

    Abstract: This article explains, and discusses the merits of, three approaches for analyzing the certainty with which statistical results can be extrapolated beyond the data gathered. Sometimes it may be possible to use more than one of these approaches. (1) If there is an exact null hypothesis which is credible and interesting (usually not the case), researchers should cite a p value (significance level),…

    Submitted 30 September, 2016; originally announced September 2016.

    Comments: 18 pages, 5 tables, 2 figures

  24. arXiv:0912.3880

    stat.ME stat.CO

    Bootstrapping Confidence Levels for Hypotheses about Quadratic (U-Shaped) Regression Models

    Authors: Michael Wood

    Abstract: Bootstrapping can produce confidence levels for hypotheses about quadratic regression models - such as whether the U-shape is inverted, and the location of optima. The method has several advantages over conventional methods: it provides more, and clearer, information, and is flexible - it could easily be applied to a wide variety of different types of models. The utility of the method can be enhan…

    Submitted 6 July, 2012; v1 submitted 19 December, 2009; originally announced December 2009.

    Comments: 9 pages, 2 figures

  25. arXiv:0912.3878

    stat.ME

    P values, confidence intervals, or confidence levels for hypotheses?

    Authors: Michael Wood

    Abstract: Null hypothesis significance tests and p values are widely used despite very strong arguments against their use in many contexts. Confidence intervals are often recommended as an alternative, but these do not achieve the objective of assessing the credibility of a hypothesis, and the distinction between confidence and probability is an unnecessary confusion. This paper proposes a more straightforw…

    Submitted 11 February, 2014; v1 submitted 19 December, 2009; originally announced December 2009.

    Comments: The essential argument is unchanged from previous versions, but the paper has been largely rewritten, the argument extended, and more examples and background context included. 21 pages, 3 diagrams, 3 tables

  26. arXiv:0908.0067

    stat.AP stat.ME

    Making statistical methods in management research more useful: some suggestions from a case study

    Authors: Michael Wood

    Abstract: I present a critique of the methods used in a typical paper. This leads to three broad conclusions about the conventional use of statistical methods. First, results are often reported in an unnecessarily obscure manner. Second, the null hypothesis testing paradigm is deeply flawed: estimating the size of effects and citing confidence intervals or levels is usually better. Third, there are several…

    Submitted 12 November, 2012; v1 submitted 1 August, 2009; originally announced August 2009.

    Comments: 27 pages, 2 figures. New version has amended title, revised abstract, and the rest of the paper has been simplified

    Journal ref: Slightly revised version published in Sage Open, vol 3, no 1, 2013