-
A Multi-Phase Analysis of Blood Culture Stewardship: Machine Learning Prediction, Expert Recommendation Assessment, and LLM Automation
Authors:
Fatemeh Amrollahi,
Nicholas Marshall,
Fateme Nateghi Haredasht,
Kameron C Black,
Aydin Zahedivash,
Manoj V Maddali,
Stephen P. Ma,
Amy Chang,
MD Phar Stanley C Deresinski,
Mary Kane Goldstein,
Steven M. Asch,
Niaz Banaei,
Jonathan H Chen
Abstract:
Blood cultures are often over ordered without clear justification, straining healthcare resources and contributing to inappropriate antibiotic use pressures worsened by the global shortage. In study of 135483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia using structured electronic health record (EHR) data and provider n…
▽ More
Blood cultures are often over ordered without clear justification, straining healthcare resources and contributing to inappropriate antibiotic use pressures worsened by the global shortage. In study of 135483 emergency department (ED) blood culture orders, we developed machine learning (ML) models to predict the risk of bacteremia using structured electronic health record (EHR) data and provider notes via a large language model (LLM). The structured models AUC improved from 0.76 to 0.79 with note embeddings and reached 0.81 with added diagnosis codes. Compared to an expert recommendation framework applied by human reviewers and an LLM-based pipeline, our ML approach offered higher specificity without compromising sensitivity. The recommendation framework achieved sensitivity 86%, specificity 57%, while the LLM maintained high sensitivity (96%) but over classified negatives, reducing specificity (16%). These findings demonstrate that ML models integrating structured and unstructured data can outperform consensus recommendations, enhancing diagnostic stewardship beyond existing standards of care.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
Antibiotic Resistance Microbiology Dataset (ARMD): A De-identified Resource for Studying Antimicrobial Resistance Using Electronic Health Records
Authors:
Fateme Nateghi Haredasht,
Fatemeh Amrollahi,
Manoj Maddali,
Nicholas Marshall,
Stephen P. Ma,
Lauren N. Cooper,
Richard J. Medford,
Sanjat Kanjilal,
Niaz Banaei,
Stanley Deresinski,
Mary K. Goldstein,
Steven M. Asch,
Amy Chang,
Jonathan H. Chen
Abstract:
The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research into antimicrobial resistance (AMR). ARMD encompasses data from adult patients, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, su…
▽ More
The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHR) that facilitates research into antimicrobial resistance (AMR). ARMD encompasses data from adult patients, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, susceptibility patterns for 55 antibiotics, implied susceptibility rules, and de-identified patient information. This dataset supports studies on antimicrobial stewardship, causal inference, and clinical decision-making. ARMD is designed to be reusable and interoperable, promoting collaboration and innovation in combating AMR. This paper describes the dataset's acquisition, structure, and utility while detailing its de-identification process.
△ Less
Submitted 8 March, 2025;
originally announced March 2025.
-
Can Artificial Intelligence Generate Quality Research Topics Reflecting Patient Concerns?
Authors:
Jiyeong Kim,
Michael L. Chen,
Shawheen J. Rezaei,
Mariana Ramirez-Posada,
Jennifer L. Caswell-Jin,
Allison W. Kurian,
Fauzia Riaz,
Kavita Y. Sarin,
Jean Y. Tang,
Steven M. Asch,
Eleni Linos
Abstract:
Patient-centered research is increasingly important in narrowing the gap between research and patient care, yet incorporating patient perspectives into health research has been inconsistent. We propose an automated framework leveraging innovative natural language processing (NLP) and artificial intelligence (AI) with patient portal messages to generate research ideas that prioritize important pati…
▽ More
Patient-centered research is increasingly important in narrowing the gap between research and patient care, yet incorporating patient perspectives into health research has been inconsistent. We propose an automated framework leveraging innovative natural language processing (NLP) and artificial intelligence (AI) with patient portal messages to generate research ideas that prioritize important patient issues. We further quantified the quality of AI-generated research topics. To define patient clinical concerns, we analyzed 614,464 patient messages from 25,549 individuals with breast or skin cancer obtained from a large academic hospital (2013 to 2024), constructing a 2-staged unsupervised NLP topic model. Then, we generated research topics to resolve the defined issues using a widely used AI (ChatGPT-4o, OpenAI Inc, April 2024 version) with prompt-engineering strategies. We guided AI to perform multi-level tasks: 1) knowledge interpretation and summarization (e.g., interpreting and summarizing the NLP-defined topics), 2) knowledge generation (e.g., generating research ideas corresponding to patients issues), 3) self-reflection and correction (e.g., ensuring and revising the research ideas after searching for scientific articles), and 4) self-reassurance (e.g., confirming and finalizing the research ideas). Six highly experienced breast oncologists and dermatologists assessed the significance and novelty of AI-generated research topics using a 5-point Likert scale (1-exceptional, 5-poor). One-third of the AI-suggested research topics were highly significant and novel when both scores were lower than the average. Two-thirds of the AI-suggested topics were novel in both cancers. Our findings demonstrate that AI-generated research topics reflecting patient perspectives via a large volume of patient messages can meaningfully guide future directions in patient-centered health research.
△ Less
Submitted 15 November, 2024;
originally announced November 2024.
-
Cybercosm: New Foundations for a Converged Science Data Ecosystem
Authors:
Mark Asch,
François Bodin,
Micah Beck,
Terry Moore,
Michela Taufer,
Martin Swany,
Jean-Pierre Vilotte
Abstract:
Scientific communities naturally tend to organize around data ecosystems created by the combination of their observational devices, their data repositories, and the workflows essential to carry their research from observation to discovery. However, these legacy data ecosystems are now breaking down under the pressure of the exponential growth in the volume and velocity of these workflows, which ar…
▽ More
Scientific communities naturally tend to organize around data ecosystems created by the combination of their observational devices, their data repositories, and the workflows essential to carry their research from observation to discovery. However, these legacy data ecosystems are now breaking down under the pressure of the exponential growth in the volume and velocity of these workflows, which are further complicated by the need to integrate the highly data intensive methods of the Artificial Intelligence revolution. Enabling ground breaking science that makes full use of this new, data saturated research environment will require distributed systems that support dramatically improved resource sharing, workflow portability and composability, and data ecosystem convergence.
The Cybercosm vision presented in this white paper describes a radically different approach to the architecture of distributed systems for data-intensive science and its application workflows. As opposed to traditional models that restrict interoperability by hiving off storage, networking, and computing resources in separate technology silos, Cybercosm defines a minimally sufficient hypervisor as a spanning layer for its data plane that virtualizes and converges the local resources of the system's nodes in a fully interoperable manner. By building on a common, universal interface into which the problems that infect today's data-intensive workflows can be decomposed and attacked, Cybercosm aims to support scalable, portable and composable workflows that span and merge the distributed data ecosystems that characterize leading edge research communities today.
△ Less
Submitted 29 June, 2021; v1 submitted 22 May, 2021;
originally announced May 2021.