
Showing 1–19 of 19 results for author: Goldstein, M

Searching in archive stat.
  1. arXiv:2503.07664  [pdf]

    q-bio.QM cs.IR cs.LG stat.AP

    Antibiotic Resistance Microbiology Dataset (ARMD): A De-identified Resource for Studying Antimicrobial Resistance Using Electronic Health Records

    Authors: Fateme Nateghi Haredasht, Fatemeh Amrollahi, Manoj Maddali, Nicholas Marshall, Stephen P. Ma, Lauren N. Cooper, Richard J. Medford, Sanjat Kanjilal, Niaz Banaei, Stanley Deresinski, Mary K. Goldstein, Steven M. Asch, Amy Chang, Jonathan H. Chen

    Abstract: The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research into antimicrobial resistance (AMR). ARMD encompasses data from adult patients, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, su…

    Submitted 8 March, 2025; originally announced March 2025.

  2. arXiv:2411.01053  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities

    Authors: Adriel Saporta, Aahlad Puli, Mark Goldstein, Rajesh Ranganath

    Abstract: Contrastive learning methods, such as CLIP, leverage naturally paired data (for example, images and their corresponding text captions) to learn general representations that transfer efficiently to downstream tasks. While such approaches are generally applied to two modalities, domains such as robotics, healthcare, and video need to support many types of data at once. We show that the pairwise applic…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  3. arXiv:2408.15805  [pdf, other]

    stat.AP stat.CO

    Investigating Complex HPV Dynamics Using Emulation and History Matching

    Authors: Andrew Iskauskas, Jamie A. Cohen, Danny Scarponi, Ian Vernon, Michael Goldstein, Daniel Klein, Richard G. White, Nicky McCreesh

    Abstract: The study of transmission and progression of human papillomavirus (HPV) is crucial for understanding the incidence of cervical cancers, and has been identified as a priority worldwide. The complexity of the disease necessitates a detailed model of HPV transmission and its progression to cancer; to infer properties of the above we require a careful process that can match to imperfect or incomplete…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 21 pages, 15 figures; submitted to Epidemics

  4. arXiv:2407.07998  [pdf, other]

    cs.LG stat.ML

    What's the score? Automated Denoising Score Matching for Nonlinear Diffusions

    Authors: Raghav Singhal, Mark Goldstein, Rajesh Ranganath

    Abstract: Reversing a diffusion process by learning its score forms the heart of diffusion-based generative modeling and of estimating properties of scientific systems. The diffusion processes that are tractable center on linear processes with a Gaussian stationary distribution. This limits the kinds of models that can be built to those that target a Gaussian prior or more generally limits the kinds of pro…

    Submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2403.13724  [pdf, other]

    cs.LG stat.ML

    Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes

    Authors: Yifan Chen, Mark Goldstein, Mengjian Hua, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden

    Abstract: We propose a framework for probabilistic forecasting of dynamical systems based on generative modeling. Given observations of the system state over time, we formulate the forecasting problem as sampling from the conditional distribution of the future system state given its current state. To this end, we leverage the framework of stochastic interpolants, which facilitates the construction of a gene…

    Submitted 27 August, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  6. arXiv:2310.03725  [pdf, other]

    cs.LG stat.ML

    Stochastic interpolants with data-dependent couplings

    Authors: Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, Eric Vanden-Eijnden

    Abstract: Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how…

    Submitted 23 September, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  7. arXiv:2302.07261  [pdf, other]

    cs.LG stat.ML

    Where to Diffuse, How to Diffuse, and How to Get Back: Automated Learning for Multivariate Diffusions

    Authors: Raghav Singhal, Mark Goldstein, Rajesh Ranganath

    Abstract: Diffusion-based generative models (DBGMs) perturb data to a target noise distribution and reverse this process to generate samples. The choice of noising process, or inference diffusion process, affects both likelihoods and sample quality. For example, extending the inference process with auxiliary variables leads to improved sample quality. While there are many such multivariate diffusions to exp…

    Submitted 3 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

  8. Emulation and History Matching using the hmer Package

    Authors: Andrew Iskauskas, Ian Vernon, Michael Goldstein, Danny Scarponi, Trevelyan J. McKinley, Richard G. White, Nicky McCreesh

    Abstract: Modelling complex real-world situations such as infectious diseases, geological phenomena, and biological processes can present a dilemma: the computer model (referred to as a simulator) needs to be complex enough to capture the dynamics of the system, but each increase in complexity increases the evaluation time of such a simulation, making it difficult to obtain an informative description of par…

    Submitted 14 December, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: 47 pages, 16 figures; provisionally accepted for publication in Journal of Statistical Software

  9. arXiv:2208.10759  [pdf, other]

    cs.LG stat.ML

    Survival Mixture Density Networks

    Authors: Xintian Han, Mark Goldstein, Rajesh Ranganath

    Abstract: Survival analysis, the art of time-to-event modeling, plays an important role in clinical treatment decisions. Recently, continuous time models built from neural ODEs have been proposed for survival analysis. However, the training of neural ODEs is slow due to the high computational complexity of neural ODE solvers. Here, we propose an efficient alternative for flexible continuous time models, cal…

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: Machine Learning for Healthcare 2022

  10. arXiv:2112.00881  [pdf, other]

    cs.LG stat.ML

    Learning Invariant Representations with Missing Data

    Authors: Mark Goldstein, Jörn-Henrik Jacobsen, Olina Chau, Adriel Saporta, Aahlad Puli, Rajesh Ranganath, Andrew C. Miller

    Abstract: Spurious correlations allow flexible models to predict well during training but poorly on related test distributions. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such a…

    Submitted 8 June, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: CLeaR (Causal Learning and Reasoning) 2022

  11. arXiv:2111.08175  [pdf, other]

    cs.LG stat.ML

    Inverse-Weighted Survival Games

    Authors: Xintian Han, Mark Goldstein, Aahlad Puli, Thomas Wies, Adler J Perotte, Rajesh Ranganath

    Abstract: Deep models trained through maximum likelihood have achieved state-of-the-art results for survival analysis. Despite this training scheme, practitioners evaluate models under other criteria, such as binary classification losses at a chosen set of time horizons, e.g. Brier score (BS) and Bernoulli log likelihood (BLL). Models trained with maximum likelihood may have poor BS or BLL since maximum lik…

    Submitted 31 January, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021

  12. arXiv:2101.05346  [pdf, other]

    cs.LG stat.ML

    X-CAL: Explicit Calibration for Survival Analysis

    Authors: Mark Goldstein, Xintian Han, Aahlad Puli, Adler J. Perotte, Rajesh Ranganath

    Abstract: Survival analysis models the distribution of time until an event of interest, such as discharge from the hospital or admission to the ICU. When a model's predicted number of events within any time interval is similar to the observed number, it is called well-calibrated. A survival model's calibration can be measured using, for instance, distributional calibration (D-CALIBRATION) [Haider et al., 20…

    Submitted 13 January, 2021; originally announced January 2021.

  13. arXiv:2008.11813  [pdf]

    stat.OT

    The use of multiple models within an organisation

    Authors: Chris J Dent, Michael Goldstein, Andrew Wright, Henry P. Wynn

    Abstract: Organisations, whether in government, industry or commerce, are required to make decisions in a complex and uncertain environment. The way models are used is intimately connected to the way organisations make decisions and the context in which they make them. Typically, in a complex organisation, multiple related models will often be used in support of a decision. For example, engineering models m…

    Submitted 4 August, 2020; originally announced August 2020.

    Comments: 49 pages. White paper arising from Alan Turing Institute project

  14. arXiv:1906.10991  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Verifying Robustness of Gradient Boosted Models

    Authors: Gil Einziger, Maayan Goldstein, Yaniv Sa'ar, Itai Segall

    Abstract: Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VeriGB, a tool for quantifying the robustness of gradient boosted models. VeriGB encodes the model and the robustne…

    Submitted 26 June, 2019; originally announced June 2019.

  15. arXiv:1711.10982  [pdf, ps, other]

    stat.ME

    Bayesian analysis of finite population sampling in multivariate co-exchangeable structures with separable covariance matrices

    Authors: Simon C. Shaw, Michael Goldstein

    Abstract: We explore the effect of finite population sampling in design problems with many variables cross-classified in many ways. In particular, we investigate designs where we wish to sample individuals belonging to different groups for which the underlying covariance matrices are separable between groups and variables. We exploit the generalised conditional independence structure of the model to show ho…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: 25 pages

  16. arXiv:1607.06358  [pdf, other]

    q-bio.MN q-bio.CB q-bio.QM stat.AP stat.ME

    Bayesian uncertainty analysis for complex systems biology models: emulation, global parameter searches and evaluation of gene functions

    Authors: Ian Vernon, Junli Liu, Michael Goldstein, James Rowe, Jen Topping, Keith Lindsey

    Abstract: Background: Many mathematical models have now been employed across every area of systems biology. These models increasingly involve large numbers of unknown parameters, have complex structure which can result in substantial evaluation time relative to the needs of the analysis, and need to be compared to observed data. The correct analysis of such models usually requires a global parameter search,…

    Submitted 12 January, 2018; v1 submitted 21 July, 2016; originally announced July 2016.

    Comments: 26 pages, 13 figures. Version accepted by BMC Systems Biology

    Journal ref: BMC Systems Biology (2018), 12(1)

  17. arXiv:1512.00969  [pdf, ps, other]

    math.ST stat.ME

    Posterior Belief Assessment: Extracting Meaningful Subjective Judgements from Bayesian Analyses with Complex Statistical Models

    Authors: Daniel Williamson, Michael Goldstein

    Abstract: In this paper, we are concerned with attributing meaning to the results of a Bayesian analysis for a problem which is sufficiently complex that we are unable to assert a precise correspondence between the expert probabilistic judgements of the analyst and the particular forms chosen for the prior specification and the likelihood for the analysis. In order to do this, we propose performing a finite…

    Submitted 3 December, 2015; originally announced December 2015.

    Comments: Published at http://dx.doi.org/10.1214/15-BA966SI in Bayesian Analysis (http://projecteuclid.org/euclid.ba) by the International Society for Bayesian Analysis (http://bayesian.org/)

    Report number: VTeX-BA-BA966SI

    Journal ref: Bayesian Analysis 2015, Vol. 10, No. 4, 877-908

  18. Galaxy Formation: Bayesian History Matching for the Observable Universe

    Authors: Ian Vernon, Michael Goldstein, Richard Bower

    Abstract: Cosmologists at the Institute of Computational Cosmology, Durham University, have developed a state of the art model of galaxy formation known as Galform, intended to contribute to our understanding of the formation, growth and subsequent evolution of galaxies in the presence of dark matter. Galform requires the specification of many input parameters and takes a significant time to complete one si…

    Submitted 20 May, 2014; originally announced May 2014.

    Comments: Published at http://dx.doi.org/10.1214/12-STS412 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS412

    Journal ref: Statistical Science 2014, Vol. 29, No. 1, 81-90

  19. arXiv:1302.5714  [pdf, other]

    stat.ME

    Bayes linear variance structure learning for inspection of large scale physical systems

    Authors: David Randell, Michael Goldstein, Philip Jonathan

    Abstract: Modelling of inspection data for large scale physical systems is critical to assessment of their integrity. We present a general method for inference about system state and associated model variance structure from spatially distributed time series which are typically short, irregular, incomplete and not directly observable. Bayes linear analysis simplifies parameter estimation and avoids often-unr…

    Submitted 22 February, 2013; originally announced February 2013.