-
Developing and Evaluating an AI-Assisted Prediction Model for Unplanned Intensive Care Admissions following Elective Neurosurgery using Natural Language Processing within an Electronic Healthcare Record System
Authors:
Julia Ive,
Olatomiwa Olukoya,
Jonathan P. Funnell,
James Booker,
Sze H M Lam,
Ugan Reddy,
Kawsar Noor,
Richard JB Dobson,
Astri M. V. Luoma,
Hani J Marcus
Abstract:
Introduction: Timely care in a specialised neuro-intensive therapy unit (ITU) reduces mortality and hospital stays, with planned admissions being safer than unplanned ones. However, post-operative care decisions remain subjective. This study used artificial intelligence (AI), specifically natural language processing (NLP), to analyse electronic health records (EHRs) and predict ITU admissions for elective surgery patients. Methods: This study analysed the EHRs of elective neurosurgery patients from University College London Hospital (UCLH) using NLP. Patients were categorised into planned high dependency unit (HDU) or ITU admission; unplanned HDU or ITU admission; or ward / overnight recovery (ONR). The Medical Concept Annotation Tool (MedCAT) was used to identify SNOMED-CT concepts within the clinical notes. We then explored the utility of these identified concepts for a range of AI algorithms trained to predict ITU admission. Results: The CogStack-MedCAT NLP model, initially trained on hospital-wide EHRs, underwent two refinements: first with data from patients with Normal Pressure Hydrocephalus (NPH) and then with data from Vestibular Schwannoma (VS) patients, achieving a concept detection F1-score of 0.93. This refined model was then used to extract concepts from the EHR notes of 2,268 eligible neurosurgical patients. We integrated the extracted concepts into AI models, including a decision tree model and a neural time-series model. Using the simpler decision tree model, we achieved a recall of 0.87 (CI 0.82 - 0.91) for ITU admissions, reducing the proportion of unplanned ITU cases missed by human experts from 36% to 4%. Conclusion: The NLP model, refined for accuracy, has proven its efficiency in extracting relevant concepts, providing a reliable basis for predictive AI models to use in clinically valid applications.
Submitted 12 March, 2025;
originally announced March 2025.
-
Validating transformers for redaction of text from electronic health records in real-world healthcare
Authors:
Zeljko Kraljevic,
Anthony Shek,
Joshua Au Yeung,
Ewart Jonathan Sheldon,
Mohammad Al-Agil,
Haris Shuaib,
Xi Bai,
Kawsar Noor,
Anoop D. Shah,
Richard Dobson,
James Teo
Abstract:
Protecting patient privacy in healthcare records is a top priority, and redaction is a commonly used method for obscuring directly identifiable information in text. Rule-based methods have been widely used, but their precision is often low causing over-redaction of text and frequently not being adaptable enough for non-standardised or unconventional structures of personal health information. Deep learning techniques have emerged as a promising solution, but implementing them in real-world environments poses challenges due to the differences in patient record structure and language across different departments, hospitals, and countries.
In this study, we present AnonCAT, a transformer-based model, and a blueprint for how de-identification models can be deployed in real-world healthcare. AnonCAT was trained on manually annotated redactions of 3116 real-world documents from three UK hospitals with different electronic health record systems. The model achieved high performance in all three hospitals, with a Recall of 0.99, 0.99 and 0.96.
Our findings demonstrate the potential of deep learning techniques for improving the efficiency and accuracy of redaction in global healthcare data, and highlight the importance of building workflows that not only use these models but can also continually fine-tune and audit the performance of these algorithms to ensure continuing effectiveness in real-world settings. This approach provides a blueprint for the real-world use of de-identifying algorithms through fine-tuning and localisation. The code, together with tutorials, is available on GitHub (https://github.com/CogStack/MedCAT).
Submitted 5 October, 2023;
originally announced October 2023.
-
A Capsule Network for Hierarchical Multi-Label Image Classification
Authors:
Khondaker Tasrif Noor,
Antonio Robles-Kelly,
Brano Kusy
Abstract:
Image classification is one of the most important areas in computer vision. Hierarchical multi-label classification applies when a multi-class image classification problem is arranged into smaller ones based upon a hierarchy or taxonomy. Thus, hierarchical classification modes generally provide multiple class predictions on each instance, whereby these are expected to reflect the structure of image classes as related to one another. In this paper, we propose a multi-label capsule network (ML-CapsNet) for hierarchical classification. Our ML-CapsNet predicts multiple image classes based on a hierarchical class-label tree structure. To this end, we present a loss function that takes into account the multi-label predictions of the network. As a result, the training approach for our ML-CapsNet uses a coarse to fine paradigm while maintaining consistency with the structure in the classification levels in the label-hierarchy. We also perform experiments using widely available datasets and compare the model with alternatives elsewhere in the literature. In our experiments, our ML-CapsNet yields a margin of improvement with respect to these alternative methods.
Submitted 13 September, 2022;
originally announced September 2022.
-
Framework for Behavioral Disorder Detection Using Machine Learning and Application of Virtual Cognitive Behavioral Therapy in COVID-19 Pandemic
Authors:
Tasnim Niger,
Hasanur Rayhan,
Rashidul Islam,
Kazi Asif Abdullah Noor,
Kamrul Hasan
Abstract:
In this modern world, people are becoming more self-centered and unsocial. At the same time, people are stressed and have become more anxious during the COVID-19 pandemic, exhibiting symptoms of behavioral disorders. To measure the symptoms of a behavioral disorder, psychiatrists usually rely on long sessions and input from specific questionnaires. This process is time-consuming and sometimes fails to detect the right behavioral disorder. Also, reserved people sometimes hesitate to follow this process. We have created a digital framework which can detect behavioral disorders and prescribe virtual Cognitive Behavioral Therapy (vCBT) for recovery. Using this framework, people can input the data most strongly associated with three behavioral disorders, namely depression, anxiety and internet addiction. We have applied machine learning techniques to detect the specific behavioral disorder from samples. The system guides the user with basic understanding and treatment through vCBT from anywhere at any time, which could be a stepping stone for the user to become aware of their condition and pursue the right treatment.
Submitted 29 April, 2022;
originally announced April 2022.
-
Predicting Clinical Intent from Free Text Electronic Health Records
Authors:
Kawsar Noor,
Katherine Smith,
Julia Bennett,
Jade OConnell,
Jessica Fisk,
Monika Hunt,
Gary Philippo,
Teresa Xu,
Simon Knight,
Luis Romao,
Richard JB Dobson,
Wai Keong Wong
Abstract:
After a patient consultation, a clinician determines the next steps in the management of the patient. A clinician may, for example, request to see the patient again or refer them to a specialist. Whilst most clinicians will record their intent as "next steps" in the patient's clinical notes, in some cases the clinician may forget to indicate their intent as an order or request, e.g. by failing to place the follow-up order. This results in patients becoming lost to follow-up and may in some cases lead to adverse consequences. In this paper we train a machine learning model to detect a clinician's intent to follow up with a patient from the patient's clinical notes. Annotators systematically identified 22 possible types of clinical intent and annotated 3000 bariatric clinical notes. The annotation process revealed a class imbalance in the labeled data, and we found that there was only sufficient labeled data to train 11 out of the 22 intents. We used the data to train a BERT-based multilabel classification model and report the following average metrics across all intents: macro-precision: 0.91, macro-recall: 0.90, macro-F1: 0.90.
Submitted 25 March, 2022;
originally announced April 2022.
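The macro-averaged figures quoted in the abstract above can be illustrated with a small sketch. It assumes the usual definition, in which precision, recall and F1 are computed per intent and then averaged with equal weight per intent; the abstract does not state the exact formula used. The function name `macro_scores` and the intent labels in the usage note are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence, Set


def macro_scores(
    gold: Sequence[Set[str]],
    pred: Sequence[Set[str]],
    labels: List[str],
) -> Dict[str, float]:
    """Macro-averaged precision, recall and F1 for multilabel predictions.

    gold and pred hold one set of labels per document. Each label is scored
    on its own, then the per-label scores are averaged with equal weight.
    """
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        # Assumption: a score with an empty denominator is treated as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }
```

For example, calling `macro_scores(gold, pred, ["follow_up", "referral"])` on a handful of annotated notes gives every intent the same weight regardless of how often it occurs, which is why macro averages are often reported when the labels are imbalanced, as the abstract says they were here.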
-
Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals
Authors:
Kawsar Noor,
Lukasz Roguski,
Alex Handy,
Roman Klapaukh,
Amos Folarin,
Luis Romao,
Joshua Matteson,
Nathan Lea,
Leilei Zhu,
Wai Keong Wong,
Anoop Shah,
Richard J Dobson
Abstract:
As more healthcare organisations transition to using electronic health record (EHR) systems, it is important for these organisations to maximise the secondary use of their data to support service improvement and clinical research. These organisations will find it challenging to build systems which can mine information from the unstructured data fields in the record (clinical notes, letters, etc.) and, more practically, to have such systems interact with all of the hospital's data systems (legacy and current). To tackle this problem at University College London Hospitals, we have deployed an enhanced version of the CogStack platform, an information retrieval platform with natural language processing capabilities, which we have configured to process the hospital's existing and legacy records. The platform has improved data ingestion capabilities as well as better tools for natural language processing. To date we have processed over 18 million records, and the insights produced from CogStack have informed a number of clinical research use cases at the hospitals.
Submitted 15 August, 2021;
originally announced August 2021.
-
Explainable AI (XAI) for PHM of Industrial Asset: A State-of-The-Art, PRISMA-Compliant Systematic Review
Authors:
Ahmad Kamal Bin Mohd Nor,
Srinivasa Rao Pedapait,
Masdi Muhammad
Abstract:
A state-of-the-art systematic review of XAI applied to Prognostic and Health Management (PHM) of industrial assets is presented. This work provides an overview of the general trend of XAI in PHM and addresses the question of accuracy versus explainability, the extent of human involvement, explanation assessment and uncertainty quantification in the PHM-XAI domain. Research articles on the subject, published from 2015 to 2021, were selected from five known databases following PRISMA guidelines. Data were then extracted from the selected articles and examined. Several findings were synthesized. Firstly, while the discipline is still young, the analysis indicated the growing acceptance of XAI in the PHM domain. Secondly, XAI functions as a double-edged sword: it is used both as a tool to execute PHM tasks and as a means of explanation, particularly in diagnostic and anomaly detection activities, implying a real need for XAI in PHM. Thirdly, the review showed that PHM-XAI papers generally report good or excellent results, suggesting that PHM performance is unaffected by XAI. Fourthly, the human role, evaluation metrics and uncertainty management are areas requiring further attention by the PHM community. Assessment metrics suited to PHM needs are urgently required. Finally, most case studies featured in the accepted articles are based on real industrial data, indicating that the available PHM-XAI combinations are fit to solve complex, real-world challenges, increasing confidence in AI adoption in industry.
Submitted 5 September, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Multi-domain Clinical Natural Language Processing with MedCAT: the Medical Concept Annotation Toolkit
Authors:
Zeljko Kraljevic,
Thomas Searle,
Anthony Shek,
Lukasz Roguski,
Kawsar Noor,
Daniel Bean,
Aurelie Mascio,
Leilei Zhu,
Amos A Folarin,
Angus Roberts,
Rebecca Bendayan,
Mark P Richardson,
Robert Stewart,
Anoop D Shah,
Wai Keong Wong,
Zina Ibrahim,
James T Teo,
Richard JB Dobson
Abstract:
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of Information Extraction (IE) technologies to enable clinical analysis. We present the open-source Medical Concept Annotation Toolkit (MedCAT) that provides: a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; b) a feature-rich annotation interface for customising and training IE models; and c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets, and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
Submitted 25 March, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.