
Showing 1–6 of 6 results for author: Schneeweiss, S

  1. arXiv:2504.19467  [pdf]

    cs.CL cs.AI

    BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text

    Authors: Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang

    Abstract: Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, current evaluations of LLMs in clinical contexts remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world electronic health record (EHR) data. O…

    Submitted 30 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  2. arXiv:2503.06308  [pdf]

    stat.AP

    Adaptive multi-wave sampling for efficient chart validation

    Authors: Georg Hahn, Sebastian Schneeweiss, Shirley Wang

    Abstract: Computable phenotypes are used to characterize patients and identify outcomes in studies conducted using healthcare claims and electronic health record data. Chart review studies establish reference labels against which computable phenotypes are compared to understand their measurement characteristics, the quantity of interest, for instance the positive predictive value. We describe a method to ad…

    Submitted 8 March, 2025; originally announced March 2025.

  3. arXiv:2405.10925  [pdf]

    stat.ME cs.AI cs.LG

    High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

    Authors: Janick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. Desai

    Abstract: Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from…

    Submitted 17 May, 2024; originally announced May 2024.

  4. arXiv:1706.10029  [pdf, other]

    stat.ME stat.CO stat.ML

    Collaborative-controlled LASSO for Constructing Propensity Score-based Estimators in High-Dimensional Data

    Authors: Cheng Ju, Richard Wyss, Jessica M. Franklin, Sebastian Schneeweiss, Jenny Häggström, Mark J. van der Laan

    Abstract: Propensity score (PS) based estimators are increasingly used for causal inference in observational studies. However, model selection for PS estimation in high-dimensional data has received little attention. In these settings, PS models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collabo…

    Submitted 30 June, 2017; originally announced June 2017.

  5. arXiv:1703.02237  [pdf, other]

    stat.CO stat.ME

    Scalable Collaborative Targeted Learning for High-Dimensional Data

    Authors: Cheng Ju, Susan Gruber, Samuel D. Lendle, Antoine Chambaz, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan

    Abstract: Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-var…

    Submitted 7 March, 2017; originally announced March 2017.

  6. arXiv:1703.02236  [pdf, other]

    stat.AP stat.ML

    Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods

    Authors: Cheng Ju, Mary Combs, Samuel D Lendle, Jessica M Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. van der Laan

    Abstract: The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different database…

    Submitted 14 March, 2017; v1 submitted 7 March, 2017; originally announced March 2017.