Semi-self-supervised Automated ICD Coding

Hlynsson, Hlynur D.; Ellertsson, Steindór; Daðason, Jón F.; Sigurdsson, Emil L.; Loftsson, Hrafn

arXiv:2205.10088 (cs)

[Submitted on 20 May 2022 (v1), last revised 18 Aug 2022 (this version, v2)]

Abstract: Clinical Text Notes (CTNs) contain physicians' reasoning processes, written in unstructured free-text format, as they examine and interview patients. In recent years, several studies have provided evidence for the utility of machine learning for predicting doctors' diagnoses from CTNs, a task known as ICD coding. Data annotation is time-consuming, particularly when a degree of specialization is needed, as is the case for medical data. This paper presents a method of augmenting a sparsely annotated dataset of Icelandic CTNs with a machine-learned imputation in a semi-self-supervised manner. We train a neural network on a small set of annotated CTNs and use it to extract clinical features from a set of unannotated CTNs. These clinical features consist of answers to about a thousand potential questions that a physician might answer during a patient consultation. The features are then used to train a classifier for the diagnosis of certain types of diseases. We report the results of an evaluation of this data augmentation method over three tiers of data availability to the physician. Our data augmentation method shows a significant positive effect, which diminishes when clinical features from the examination of the patient and from diagnostics are made available. We recommend our method for augmenting scarce datasets for systems that make decisions based on clinical features that do not include examinations or tests.
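The three-step pipeline described in the abstract (train a feature extractor on a few annotated notes, impute features for the unannotated notes, then train a diagnosis classifier on gold plus imputed features) can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: it swaps the paper's neural network for a multinomial naive Bayes classifier, replaces the roughly one thousand questions with two hypothetical binary features (`fever`, `cough`), assumes every note already carries a diagnosis code while only a few carry feature annotations, and uses invented English toy notes with illustrative ICD-10 codes. The names `NaiveBayes`, `impute`, `semi_self_supervised`, and `diagnose` are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over token lists with add-one smoothing.

    Hypothetical stand-in for the paper's neural network (illustration only).
    """

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def score(c):
            # Log-posterior up to a constant; tokens never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
                for tok in doc
                if tok in self.vocab
            )

        return max(self.classes, key=score)


def tokenize(note):
    return note.lower().replace(",", " ").split()


def features_to_tokens(answers):
    # Encode clinical-feature answers as tokens such as "fever=1".
    return [f"{name}={int(val)}" for name, val in sorted(answers.items())]


def impute(note, extractors):
    # Predict an answer to every clinical question from the note text.
    return {f: model.predict(tokenize(note)) for f, model in extractors.items()}


def semi_self_supervised(annotated, unannotated, feature_names):
    """annotated: (note, answers, icd_code); unannotated: (note, icd_code)."""
    # Step 1: train one extractor per question on the small annotated set.
    notes = [tokenize(n) for n, _, _ in annotated]
    extractors = {
        f: NaiveBayes().fit(notes, [a[f] for _, a, _ in annotated])
        for f in feature_names
    }
    # Step 2: impute the answers for the unannotated notes.
    rows = [(a, y) for _, a, y in annotated]
    rows += [(impute(n, extractors), y) for n, y in unannotated]
    # Step 3: train the diagnosis classifier on gold plus imputed features.
    diagnoser = NaiveBayes().fit(
        [features_to_tokens(a) for a, _ in rows], [y for _, y in rows]
    )
    return extractors, diagnoser


def diagnose(note, extractors, diagnoser):
    return diagnoser.predict(features_to_tokens(impute(note, extractors)))


# Hypothetical toy data: only four notes carry feature annotations.
ANNOTATED = [
    ("high fever and dry cough for two days", {"fever": 1, "cough": 1}, "J06"),
    ("sore throat and fever since yesterday", {"fever": 1, "cough": 0}, "J02"),
    ("persistent cough at night", {"fever": 0, "cough": 1}, "J06"),
    ("sore throat when swallowing", {"fever": 0, "cough": 0}, "J02"),
]
UNANNOTATED = [
    ("fever and cough started this morning", "J06"),
    ("painful sore throat", "J02"),
]

extractors, diagnoser = semi_self_supervised(ANNOTATED, UNANNOTATED, ["fever", "cough"])
print(diagnose("dry cough and fever", extractors, diagnoser))  # J06
```

Routing the diagnosis through the imputed features, rather than classifying note text directly, is what lets the unannotated notes contribute extra training rows to the diagnosis classifier in this sketch.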

Comments:	Re-upload comment: added a baseline comparison as well as an analysis of the features
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2205.10088 [cs.CL]
	(or arXiv:2205.10088v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.10088

Submission history

From: Hlynur Davíð Hlynsson
[v1] Fri, 20 May 2022 11:12:54 UTC (209 KB)
[v2] Thu, 18 Aug 2022 10:34:15 UTC (765 KB)