-
Foundation Model of Electronic Medical Records for Adaptive Risk Estimation
Authors:
Pawel Renc,
Michal K. Grzeszczyk,
Nassim Oufattole,
Deirdre Goode,
Yugang Jia,
Szymon Bieganski,
Matthew B. A. McDermott,
Jaroslaw Was,
Anthony E. Samir,
Jonathan W. Cunningham,
David W. Bates,
Arkadiusz Sitek
Abstract:
The U.S. allocates nearly 18% of its GDP to healthcare but experiences lower life expectancy and higher preventable death rates compared to other high-income nations. Hospitals struggle to predict critical outcomes such as mortality, ICU admission, and prolonged hospital stays. Traditional early warning systems, like NEWS and MEWS, rely on static variables and fixed thresholds, limiting their adaptability, accuracy, and personalization. We developed the Enhanced Transformer for Health Outcome Simulation (ETHOS), an AI model that tokenizes patient health timelines (PHTs) from EHRs and uses transformer-based architectures to predict future PHTs. The Adaptive Risk Estimation System (ARES) leverages ETHOS to compute dynamic, personalized risk probabilities for clinician-defined critical events. ARES also features a personalized explainability module highlighting key clinical factors influencing risk estimates. We evaluated ARES on the MIMIC-IV v2.2 dataset in emergency department settings, benchmarking its performance against traditional early warning systems and machine learning models. From 299,721 unique patients, 285,622 PHTs (60% with hospital admissions) were processed, comprising over 357 million tokens. ETHOS outperformed benchmark models in predicting hospital admissions, ICU admissions, and prolonged stays, achieving superior AUC scores. Its risk estimates were robust across demographic subgroups, with calibration curves confirming model reliability. The explainability module provided valuable insights into patient-specific risk factors. ARES, powered by ETHOS, advances predictive healthcare AI by delivering dynamic, real-time, personalized risk estimation with patient-specific explainability. Its adaptability and accuracy offer a transformative tool for clinical decision-making, potentially improving patient outcomes and resource allocation.
Submitted 13 March, 2025; v1 submitted 9 February, 2025;
originally announced February 2025.
-
Zero Shot Health Trajectory Prediction Using Transformer
Authors:
Pawel Renc,
Yugang Jia,
Anthony E. Samir,
Jaroslaw Was,
Quanzheng Li,
David W. Bates,
Arkadiusz Sitek
Abstract:
Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare's increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained on Patient Health Timelines (PHTs), which are detailed, tokenized records of health events, to predict future health trajectories using a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS' capabilities to incorporate a wider range of data types and data sources. Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.
Submitted 30 July, 2024;
originally announced July 2024.
-
Gamified Crowdsourcing as a Novel Approach to Lung Ultrasound Dataset Labeling
Authors:
Nicole M Duggan,
Mike Jin,
Maria Alejandra Duran Mendicuti,
Stephen Hallisey,
Denie Bernier,
Lauren A Selame,
Ameneh Asgari-Targhi,
Chanel E Fischetti,
Ruben Lucassen,
Anthony E Samir,
Erik Duhaime,
Tina Kapur,
Andrew J Goldsmith
Abstract:
Study Objective: Machine learning models have advanced medical image processing and can yield faster, more accurate diagnoses. Despite a wealth of available medical imaging data, high-quality labeled data for model training is lacking. We investigated whether a gamified crowdsourcing platform enhanced with inbuilt quality control metrics can produce lung ultrasound clip labels comparable to those from clinical experts.
Methods: 2,384 lung ultrasound clips were retrospectively collected from 203 patients. Six lung ultrasound experts classified 393 of these clips as having no B-lines, one or more discrete B-lines, or confluent B-lines to create two sets of reference standard labels (195 training set clips and 198 test set clips). Sets were respectively used to A) train users on a gamified crowdsourcing platform, and B) compare concordance of the resulting crowd labels to the concordance of individual experts to reference standards.
Results: 99,238 crowdsourced opinions on 2,384 lung ultrasound clips were collected from 426 unique users over 8 days. On the 198 test set clips, mean labeling concordance of individual experts relative to the reference standard was 85.0% +/- 2.0 (SEM), compared to 87.9% crowdsourced label concordance (p=0.15). When individual experts' opinions were compared to reference standard labels created by majority vote excluding their own opinion, crowd concordance was higher than the mean concordance of individual experts to reference standards (87.4% vs. 80.8% +/- 1.6; p<0.001).
Conclusion: Crowdsourced labels for B-line classification via a gamified approach achieved expert-level quality. Scalable, high-quality labeling approaches may facilitate training dataset creation for machine learning model development.
Submitted 11 June, 2023;
originally announced June 2023.
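The leave-one-out comparison described in the Results of the entry above (each expert scored against a majority vote of the other experts) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the label strings, and the tie-breaking rule (first label seen wins) are all assumptions introduced here.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among votes.

    Ties go to the label seen first (an assumption; the abstract
    does not say how ties were resolved).
    """
    return Counter(votes).most_common(1)[0][0]


def concordance(labels, reference):
    """Fraction of clips on which labels agree with the reference."""
    matches = sum(1 for a, b in zip(labels, reference) if a == b)
    return matches / len(reference)


def leave_one_out_concordance(expert_labels):
    """Score each expert against a majority vote of the other experts.

    expert_labels is a list with one entry per expert; each entry is a
    list holding that expert's label for every clip, in the same order.
    """
    scores = []
    for i, own in enumerate(expert_labels):
        others = [labs for j, labs in enumerate(expert_labels) if j != i]
        # zip(*others) groups the remaining experts' votes clip by clip.
        reference = [majority_label(clip_votes) for clip_votes in zip(*others)]
        scores.append(concordance(own, reference))
    return scores


# Hypothetical labels for three clips from four experts.
experts = [
    ["none", "discrete", "confluent"],
    ["none", "discrete", "confluent"],
    ["none", "discrete", "discrete"],
    ["none", "confluent", "confluent"],
]
scores = leave_one_out_concordance(experts)
```

The same `concordance` helper could be applied to a crowd-derived label list against the reference standard, which is the other side of the comparison the abstract reports.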
-
Weakly Supervised Context Encoder using DICOM metadata in Ultrasound Imaging
Authors:
Szu-Yeu Hu,
Shuhang Wang,
Wei-Hung Weng,
JingChao Wang,
XiaoHong Wang,
Arinc Ozturk,
Qian Li,
Viksit Kumar,
Anthony E. Samir
Abstract:
Modern deep learning algorithms geared towards clinical adoption rely on large amounts of high-fidelity labeled data. In low-resource settings, acquiring such data is difficult and becomes the bottleneck for developing artificial intelligence applications. Ultrasound images, stored in Digital Imaging and Communications in Medicine (DICOM) format, carry additional metadata corresponding to ultrasound image parameters and medical exams. In this work, we leverage DICOM metadata from ultrasound images to help learn representations of the ultrasound image. We demonstrate that the proposed method outperforms non-metadata-based approaches across different downstream tasks.
Submitted 19 March, 2020;
originally announced March 2020.