Showing 1–18 of 18 results for author: Oermann, E K

Searching in archive cs.
  1. arXiv:2505.23477  [pdf]

    cs.CL

    Evaluating the performance and fragility of large language models on the self-assessment for neurological surgeons

    Authors: Krithik Vishwanath, Anton Alyakin, Mrigayu Ghosh, Jin Vivian Lee, Daniel Alexander Alber, Karl L. Sangwon, Douglas Kondziolka, Eric Karl Oermann

    Abstract: The Congress of Neurological Surgeons Self-Assessment for Neurological Surgeons (CNS-SANS) questions are widely used by neurosurgical residents to prepare for written board examinations. Recently, these questions have also served as benchmarks for evaluating large language models' (LLMs) neurosurgical knowledge. This study aims to assess the performance of state-of-the-art LLMs on neurosurgery boa…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 22 pages, 3 main figures, 3 supplemental figures

  2. arXiv:2505.00307  [pdf, ps, other]

    cs.LG

    Gateformer: Advancing Multivariate Time Series Forecasting through Temporal and Variate-Wise Attention with Gated Representations

    Authors: Yu-Hsiang Lan, Eric K. Oermann

    Abstract: There has been a recent surge of interest in time series modeling using the Transformer architecture. However, forecasting multivariate time series with Transformer presents a unique challenge as it requires modeling both temporal (cross-time) and variate (cross-variate) dependencies. While Transformer-based models have gained popularity for their flexibility in capturing both sequential and cross…

    Submitted 3 July, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted at ICML Workshop on Foundation Models for Structured Data

  3. arXiv:2504.01201  [pdf, other]

    cs.CL cs.AI cs.HC

    Medical large language models are easily distracted

    Authors: Krithik Vishwanath, Anton Alyakin, Daniel Alexander Alber, Jin Vivian Lee, Douglas Kondziolka, Eric Karl Oermann

    Abstract: Large language models (LLMs) have the potential to transform medicine, but real-world clinical scenarios contain extraneous information that can hinder performance. The rise of assistive technologies like ambient dictation, which automatically generates draft notes from live patient encounters, has the potential to introduce additional noise, making it crucial to assess the ability of LLMs to filt…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 20 pages, 2 main figures, 6 extended figures

  4. arXiv:2503.13508  [pdf]

    cs.CL cs.AI cs.CY

    It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education

    Authors: Shrutika Singh, Anton Alyakin, Daniel Alexander Alber, Jaden Stryker, Ai Phuong S Tong, Karl Sangwon, Nicolas Goff, Mathew de la Paz, Miguel Hernandez-Rovira, Ki Yun Park, Eric Claude Leuthardt, Eric Karl Oermann

    Abstract: The performance of Large Language Models (LLMs) on multiple-choice question (MCQ) benchmarks is frequently cited as proof of their medical capabilities. We hypothesized that LLM performance on medical MCQs may in part be illusory and driven by factors beyond medical content knowledge and reasoning capabilities. To assess this, we created a novel benchmark of free-response questions with paired MCQ…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 14 pages, 5 figures

  5. arXiv:2503.04155  [pdf, other]

    cs.CL

    BPQA Dataset: Evaluating How Well Language Models Leverage Blood Pressures to Answer Biomedical Questions

    Authors: Chi Hang, Ruiqi Deng, Lavender Yao Jiang, Zihao Yang, Anton Alyakin, Daniel Alber, Eric Karl Oermann

    Abstract: Clinical measurements such as blood pressures and respiration rates are critical in diagnosing and monitoring patient outcomes. They are an important component of biomedical data, which can be used to train transformer-based language models (LMs) for improving healthcare delivery. It is, however, unclear whether LMs can effectively interpret and use clinical measurements. We investigate two questions…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 9 pages

  6. arXiv:2502.19546  [pdf]

    cs.AI cs.CL cs.HC

    Repurposing the scientific literature with vision-language models

    Authors: Anton Alyakin, Jaden Stryker, Daniel Alexander Alber, Karl L. Sangwon, Jin Vivian Lee, Brandon Duderstadt, Akshay Save, David Kurland, Spencer Frome, Shrutika Singh, Jeff Zhang, Eunice Yang, Ki Yun Park, Cordelia Orillac, Aly A. Valliani, Sean Neifert, Albert Liu, Aneek Patel, Christopher Livia, Darryl Lau, Ilya Laufer, Peter A. Rozman, Eveline Teresa Hidalgo, Howard Riina, Rui Feng, et al. (7 additional authors not shown)

    Abstract: Leading vision-language models (VLMs) are trained on general Internet content, overlooking scientific journals' rich, domain-specific knowledge. Training on specialty-specific literature could yield high-performance, task-specific tools, enabling generative AI to match generalist models in specialty publishing, educational, and clinical tasks. We created NeuroPubs, a multimodal dataset of 23,000 N…

    Submitted 27 April, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

  7. arXiv:2412.10982  [pdf, ps, other]

    cs.AI

    MedG-KRP: Medical Graph Knowledge Representation Probing

    Authors: Gabriel R. Rosenbaum, Lavender Yao Jiang, Ivaxi Sheth, Jaden Stryker, Anton Alyakin, Daniel Alexander Alber, Nicolas K. Goff, Young Joon Fred Kwon, John Markert, Mustafa Nasir-Moin, Jan Moritz Niehues, Karl L. Sangwon, Eunice Yang, Eric Karl Oermann

    Abstract: Large language models (LLMs) have recently emerged as powerful tools, finding many medical applications. LLMs' ability to coalesce vast amounts of information from many sources to generate a response, a process similar to that of a human expert, has led many to see potential in deploying LLMs for clinical use. However, medicine is a setting where accurate reasoning is paramount. Many researchers are…

    Submitted 16 December, 2024; v1 submitted 14 December, 2024; originally announced December 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 19 pages

  8. arXiv:2410.09019  [pdf, other]

    cs.CL

    MedMobile: A mobile-sized language model with expert-level clinical capabilities

    Authors: Krithik Vishwanath, Jaden Stryker, Anton Alyakin, Daniel Alexander Alber, Eric Karl Oermann

    Abstract: Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We demonstrate that MedMobile scores 75.7% o…

    Submitted 11 October, 2024; originally announced October 2024.

    Comments: 13 pages, 5 figures (2 main, 3 supplementary)

  9. arXiv:2409.13038  [pdf, other]

    cs.AI

    HeadCT-ONE: Enabling Granular and Controllable Automated Evaluation of Head CT Radiology Report Generation

    Authors: Julián N. Acosta, Xiaoman Zhang, Siddhant Dogra, Hong-Yu Zhou, Seyedmehdi Payabvash, Guido J. Falcone, Eric K. Oermann, Pranav Rajpurkar

    Abstract: We present Head CT Ontology Normalized Evaluation (HeadCT-ONE), a metric for evaluating head CT report generation through ontology-normalized entity and relation extraction. HeadCT-ONE enhances current information extraction derived metrics (such as RadGraph F1) by implementing entity normalization through domain-specific ontologies, addressing radiological language variability. HeadCT-ONE compare…

    Submitted 19 September, 2024; originally announced September 2024.

  10. arXiv:2408.16245  [pdf, ps, other]

    cs.LG q-bio.BM

    Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

    Authors: Sully F. Chen, Robert J. Steele, Glen M. Hocky, Beakal Lemeneh, Shivanand P. Lad, Eric K. Oermann

    Abstract: The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data, either proteins or nucleic acids, and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural mo…

    Submitted 18 June, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

    Comments: 41 pages, 5 figures

  11. arXiv:2408.09621  [pdf, other]

    cs.CL

    Refining Packing and Shuffling Strategies for Enhanced Performance in Generative Language Models

    Authors: Yanbing Chen, Ruilin Wang, Zihao Yang, Lavender Yao Jiang, Eric Karl Oermann

    Abstract: Packing and shuffling tokens is a common practice in training auto-regressive language models (LMs) to prevent overfitting and improve efficiency. Typically, documents are concatenated to chunks of maximum sequence length (MSL) and then shuffled. However, setting the atom size, the length for each data chunk accompanied by random shuffling, to MSL may lead to contextual incoherence due to tokens fro…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 11 pages (including appendix), 26 figures, submitted to ACL ARR Aug 2024

    ACM Class: I.2.7

  12. arXiv:2402.10965  [pdf, other]

    cs.CL cs.CY cs.LG

    Generalization in Healthcare AI: Evaluation of a Clinical Large Language Model

    Authors: Salman Rahman, Lavender Yao Jiang, Saadia Gabriel, Yindalon Aphinyanaphongs, Eric Karl Oermann, Rumi Chunara

    Abstract: Advances in large language models (LLMs) provide new opportunities in healthcare for improved patient care, clinical decision-making, and enhancement of physician and administrator workflows. However, the potential of these models importantly depends on their ability to generalize effectively across clinical environments and populations, a challenge often underestimated in early development. To be…

    Submitted 24 February, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  13. arXiv:2307.07051  [pdf, other]

    cs.CL cs.IR cs.LG

    Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section

    Authors: Hongyi Zheng, Yixin Zhu, Lavender Yao Jiang, Kyunghyun Cho, Eric Karl Oermann

    Abstract: Recent advances in large language models have led to renewed interest in natural language processing in healthcare using the free text of clinical notes. One distinguishing characteristic of clinical notes is their long time span over multiple long documents. The unique structure of clinical notes creates a new design choice: when the context length for a language model predictor is limited, which…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Our code is publicly available on GitHub (https://github.com/nyuolab/EfficientTransformer)

    Journal ref: Association for Computational Linguistics - Student Research Workshop, 2023, pages 104-108

  14. arXiv:2211.07047  [pdf, other]

    cs.CL

    Language Model Classifier Aligns Better with Physician Word Sensitivity than XGBoost on Readmission Prediction

    Authors: Grace Yang, Ming Cao, Lavender Y. Jiang, Xujin C. Liu, Alexander T. M. Cheung, Hannah Weiss, David Kurland, Kyunghyun Cho, Eric K. Oermann

    Abstract: Traditional evaluation metrics for classification in natural language processing such as accuracy and area under the curve fail to differentiate between models with different predictive behaviors despite their similar performance metrics. We introduce sensitivity score, a metric that scrutinizes models' behaviors at the vocabulary level to provide insights into disparities in their decision-making…

    Submitted 15 November, 2022; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 13 pages

  15. arXiv:2111.00340  [pdf]

    cs.LG cs.CY

    Identifying and mitigating bias in algorithms used to manage patients in a pandemic

    Authors: Yifan Li, Garrett Yoon, Mustafa Nasir-Moin, David Rosenberg, Sean Neifert, Douglas Kondziolka, Eric Karl Oermann

    Abstract: Numerous COVID-19 clinical decision support systems have been developed. However, many of these systems do not have the merit for validity due to methodological shortcomings, including algorithmic bias. Methods: Logistic regression models were created to predict COVID-19 mortality, ventilator status and inpatient status using a real-world dataset consisting of four hospitals in New York City and anal…

    Submitted 30 October, 2021; originally announced November 2021.

    Comments: 4 pages, 1 table

  16. arXiv:2110.11872  [pdf]

    cs.LG

    Patient level simulation and reinforcement learning to discover novel strategies for treating ovarian cancer

    Authors: Brian Murphy, Mustafa Nasir-Moin, Grace von Oiste, Viola Chen, Howard A Riina, Douglas Kondziolka, Eric K Oermann

    Abstract: The prognosis for patients with epithelial ovarian cancer remains dismal despite improvements in survival for other cancers. Treatment involves multiple lines of chemotherapy and becomes increasingly heterogeneous after first-line therapy. Reinforcement learning with real-world outcomes data has the potential to identify novel treatment strategies to improve overall survival. We design a reinforce…

    Submitted 22 October, 2021; originally announced October 2021.

  17. arXiv:2107.00520  [pdf, other]

    cs.LG stat.ML

    Out-of-distribution Generalization in the Presence of Nuisance-Induced Spurious Correlations

    Authors: Aahlad Puli, Lily H. Zhang, Eric K. Oermann, Rajesh Ranganath

    Abstract: In many prediction problems, spurious correlations are induced by a changing relationship between the label and a nuisance variable that is also correlated with the covariates. For example, in classifying animals in natural images, the background, which is a nuisance, can predict the type of animal. This nuisance-label relationship does not always hold, and the performance of a model trained under…

    Submitted 12 February, 2023; v1 submitted 29 June, 2021; originally announced July 2021.

  18. Confounding variables can degrade generalization performance of radiological deep learning models

    Authors: John R. Zech, Marcus A. Badgeley, Manway Liu, Anthony B. Costa, Joseph J. Titano, Eric K. Oermann

    Abstract: Early results in using convolutional neural networks (CNNs) on x-rays to diagnose disease have been promising, but it has not yet been shown that models trained on x-rays from one hospital or one group of hospitals will work equally well at different hospitals. Before these tools are used for computer-aided diagnosis in real-world clinical settings, we must verify their ability to generalize acros…

    Submitted 12 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

    Journal ref: PLoS Med 15(11):e1002683 (2018)