Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset
Authors:
Yujie Dai,
Brian Sullivan,
Axel Montout,
Amy Dillon,
Chris Waller,
Peter Acs,
Rachel Denholm,
Philip Williams,
Alastair D Hay,
Raul Santos-Rodriguez,
Andrew Dowsey
Abstract:
The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Glou…
▽ More
The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary tract infections (UTIs). We implemented a data pre-processing and curation pipeline that transforms the raw EHR data into a structured format suitable for developing predictive models focused on data fairness, accountability and transparency. Given the limited availability and biases of ground truth UTI outcomes, we introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines. Pairwise XGBoost models are trained using this framework to differentiate UTI risk categories with explainable AI techniques applied to identify key predictors and support interpretability. Our findings reveal differences in clinical and demographic predictors across risk groups. While this study highlights the potential of AI-driven insights to support UTI clinical decision-making, further investigation of patient sub-strata and extensive validation are needed to ensure robustness and applicability in clinical practice.
△ Less
Submitted 28 February, 2025; v1 submitted 26 November, 2024;
originally announced November 2024.