Abusive Span Detection for Vietnamese Narrative Texts

Nguyen, Nhu-Thanh; Phan, Khoa Thi-Kim; Nguyen, Duc-Vu; Nguyen, Ngan Luu-Thuy

doi:10.1145/3628797.3628921

arXiv:2312.07831 (cs)

[Submitted on 13 Dec 2023]

Abstract: Abuse in its various forms, including physical, psychological, verbal, sexual, financial, and cultural, has a negative impact on mental health. However, there are limited studies on applying natural language processing (NLP) in this field in Vietnam. Therefore, we aim to contribute by building a human-annotated Vietnamese dataset for detecting abusive content in Vietnamese narrative texts. We sourced these texts from VnExpress, Vietnam's popular online newspaper, where readers often share stories containing abusive content. Identifying and categorizing abusive spans in these texts posed significant challenges during dataset creation, but it also motivated our research. We experimented with lightweight baseline models by freezing PhoBERT and XLM-RoBERTa and using their hidden states in a BiLSTM to assess the complexity of the dataset. According to our experimental results, PhoBERT outperforms other models in both labeled and unlabeled abusive span detection tasks. These results indicate that it has the potential for future improvements.
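The abstract distinguishes labeled from unlabeled abusive span detection. As a rough illustration only, the sketch below assumes token-level BIO tags and exact-match span F1; neither the tagging scheme nor the metric is confirmed by the abstract, and the tag names (`VERBAL`, `PHYSICAL`) and the helpers `bio_to_spans` and `span_f1` are hypothetical. Under these assumptions, a labeled match requires the same boundaries and the same abuse category, while an unlabeled match requires only the same boundaries.

```python
from typing import List, Set, Tuple

# (start index, exclusive end index, category); a hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect spans from BIO tags (assumed scheme, not taken from the paper)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        # A stray I- tag with no open span is leniently treated as a span start.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span], labeled: bool = True) -> float:
    """Exact-match span F1; with labeled=False the category is ignored."""
    if labeled:
        gold_keys, pred_keys = set(gold), set(pred)
    else:
        gold_keys = {(s, e) for s, e, _ in gold}
        pred_keys = {(s, e) for s, e, _ in pred}
    true_positives = len(gold_keys & pred_keys)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_keys)
    recall = true_positives / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


# Toy example: the second predicted span has correct boundaries, wrong category.
gold_spans = bio_to_spans(["O", "B-VERBAL", "I-VERBAL", "O", "B-PHYSICAL"])
pred_spans = bio_to_spans(["O", "B-VERBAL", "I-VERBAL", "O", "B-VERBAL"])
print(span_f1(gold_spans, pred_spans, labeled=True))   # 0.5
print(span_f1(gold_spans, pred_spans, labeled=False))  # 1.0
```

In this toy case the unlabeled score is higher because the miscategorized span still counts as found once the category is ignored.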

Comments:	Accepted at SoICT 2023
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2312.07831 [cs.CL]
	(or arXiv:2312.07831v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.07831
Related DOI:	https://doi.org/10.1145/3628797.3628921

Submission history

From: Duc-Vu Nguyen
[v1] Wed, 13 Dec 2023 01:36:18 UTC (1,277 KB)
