Accelerating Clinical NLP at Scale with a Hybrid Framework with Reduced GPU Demands: A Case Study in Dementia Identification
Authors:
Jianlin Shi,
Qiwei Gan,
Elizabeth Hanchrow,
Annie Bowles,
John Stanley,
Adam P. Bress,
Jordana B. Cohen,
Patrick R. Alba
Abstract:
Clinical natural language processing (NLP) is increasingly in demand in both clinical research and operational practice. However, most of the state-of-the-art solutions are transformers-based and require high computational resources, limiting their accessibility. We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based mode…
▽ More
Clinical natural language processing (NLP) is increasingly in demand in both clinical research and operational practice. However, most of the state-of-the-art solutions are transformers-based and require high computational resources, limiting their accessibility. We propose a hybrid NLP framework that integrates rule-based filtering, a Support Vector Machine (SVM) classifier, and a BERT-based model to improve efficiency while maintaining accuracy. We applied this framework in a dementia identification case study involving 4.9 million veterans with incident hypertension, analyzing 2.1 billion clinical notes. At the patient level, our method achieved a precision of 0.90, a recall of 0.84, and an F1-score of 0.87. Additionally, this NLP approach identified over three times as many dementia cases as structured data methods. All processing was completed in approximately two weeks using a single machine with dual A40 GPUs. This study demonstrates the feasibility of hybrid NLP solutions for large-scale clinical text analysis, making state-of-the-art methods more accessible to healthcare organizations with limited computational resources.
△ Less
Submitted 16 April, 2025;
originally announced April 2025.
Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python
Authors:
Hannah Eyre,
Alec B Chapman,
Kelly S Peterson,
Jianlin Shi,
Patrick R Alba,
Makoto M Jones,
Tamara L Box,
Scott L DuVall,
Olga V Patterson
Abstract:
Despite impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety o…
▽ More
Despite impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.
△ Less
Submitted 14 June, 2021;
originally announced June 2021.