Data Quality Matters: Suicide Intention Detection on Social Media Posts Using RoBERTa-CNN

Lin, Emily; Sun, Jian; Chen, Hsingyu; Mahoor, Mohammad H.

doi:10.1109/EMBC53108.2024.10782647

arXiv:2402.02262 (cs)

[Submitted on 3 Feb 2024 (v1), last revised 20 Dec 2024 (this version, v2)]

Abstract: Suicide remains a pressing global health concern, necessitating innovative approaches for early detection and intervention. This paper focuses on identifying suicidal intentions in posts from the SuicideWatch subreddit by proposing a novel deep-learning approach that utilizes the state-of-the-art RoBERTa-CNN model. The Robustly Optimized BERT Pretraining Approach (RoBERTa) excels at capturing textual nuances and forming semantic relationships within the text. The Convolutional Neural Network (CNN) head enhances RoBERTa's capacity to discern critical patterns from extensive datasets. To evaluate RoBERTa-CNN, we conducted experiments on the Suicide and Depression Detection dataset, yielding promising results. For instance, RoBERTa-CNN achieves a mean accuracy of 98% with a standard deviation (STD) of 0.0009. Additionally, we found that data quality significantly impacts the training of a robust model. To improve data quality, we removed noise from the text data while preserving its contextual content, either by manual cleaning or by using the OpenAI API.
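
The abstract mentions two steps that a short example may make concrete: removing noise from post text, and reporting accuracy as a mean with a standard deviation over repeated runs. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline. The function names `clean_post` and `summarize_runs`, the regular-expression rules, and the choice of the sample standard deviation are all hypothetical; the abstract only states that cleaning was done manually or with the OpenAI API and does not specify which STD estimator was used.

```python
import re
import statistics

# Assumed noise rules (not from the paper): URLs, characters outside basic
# letters/digits/punctuation, and runs of whitespace.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NOISE_RE = re.compile(r"[^A-Za-z0-9\s.,!?'\-]")
WS_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Strip URLs and stray symbols while keeping the words and basic punctuation."""
    text = URL_RE.sub(" ", text)      # drop links
    text = NOISE_RE.sub(" ", text)    # drop symbols such as '#', '*', emoji
    return WS_RE.sub(" ", text).strip()  # collapse whitespace


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

A call such as `summarize_runs(per_run_accuracies)` would be applied to accuracies collected from repeated training runs. A rule-based cleaner like this one is only a stand-in: it cannot judge context the way manual review or a language-model pass can.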

Comments:	4 pages, 1 figure, 4 tables
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.02262 [cs.CL]
	(or arXiv:2402.02262v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.02262
Journal reference:	2024 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 2024, pp. 1-5
Related DOI:	https://doi.org/10.1109/EMBC53108.2024.10782647

Submission history

From: Jian Sun
[v1] Sat, 3 Feb 2024 20:58:09 UTC (265 KB)
[v2] Fri, 20 Dec 2024 18:21:16 UTC (256 KB)
