Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

Li, Zehan; Hu, Yan; Lane, Scott; Selek, Salih; Shahani, Lokesh; Machado-Vieira, Rodrigo; Soares, Jair; Xu, Hua; Liu, Hongfang; Huang, Ming

Computer Science > Computation and Language

arXiv:2409.18878 (cs)

[Submitted on 27 Sep 2024 (v1), last revised 3 Oct 2024 (this version, v2)]

Title:Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

Authors:Zehan Li, Yan Hu, Scott Lane, Salih Selek, Lokesh Shahani, Rodrigo Machado-Vieira, Jair Soares, Hua Xu, Hongfang Liu, Ming Huang

View PDF

Abstract:Accurate identification and categorization of suicidal events can yield better suicide precautions, reducing operational burden, and improving care quality in high-acuity psychiatric settings. Pre-trained language models offer promise for identifying suicidality from unstructured clinical narratives. We evaluated the performance of four BERT-based models using two fine-tuning strategies (multiple single-label and single multi-label) for detecting coexisting suicidal events from 500 annotated psychiatric evaluation notes. The notes were labeled for suicidal ideation (SI), suicide attempts (SA), exposure to suicide (ES), and non-suicidal self-injury (NSSI). RoBERTa outperformed other models using multiple single-label classification strategy (acc=0.86, F1=0.78). MentalBERT (acc=0.83, F1=0.74) also exceeded BioClinicalBERT (acc=0.82, F1=0.72) which outperformed BERT (acc=0.80, F1=0.70). RoBERTa fine-tuned with single multi-label classification further improved the model performance (acc=0.88, F1=0.81). The findings highlight that the model optimization, pretraining with domain-relevant data, and the single multi-label classification strategy enhance the model performance of suicide phenotyping. Keywords: EHR-based Phenotyping; Natural Language Processing; Secondary Use of EHR Data; Suicide Classification; BERT-based Model; Psychiatry; Mental Health

Comments:	submitted to AMIA Informatics Summit 2025 as a conference paper
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Information Retrieval (cs.IR)
Cite as:	arXiv:2409.18878 [cs.CL]
	(or arXiv:2409.18878v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2409.18878

Submission history

From: Zehan Li [view email]
[v1] Fri, 27 Sep 2024 16:13:38 UTC (706 KB)
[v2] Thu, 3 Oct 2024 20:49:55 UTC (706 KB)

Computer Science > Computation and Language

Title:Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Suicide Phenotyping from Clinical Notes in Safety-Net Psychiatric Hospital Using Multi-Label Classification with Pre-Trained Language Models

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators