Overlapping Word Removal is All You Need: Revisiting Data Imbalance in Hope Speech Detection

LekshmiAmmal, Hariharan RamakrishnaIyer; Ravikiran, Manikandan; Nisha, Gayathri; Balamuralidhar, Navyasree; Madhusoodanan, Adithya; Madasamy, Anand Kumar; Chakravarthi, Bharathi Raja

doi:10.1080/0952813X.2023.2166130

arXiv:2204.05488 (cs)

[Submitted on 12 Apr 2022]

Authors: Hariharan RamakrishnaIyer LekshmiAmmal, Manikandan Ravikiran, Gayathri Nisha, Navyasree Balamuralidhar, Adithya Madhusoodanan, Anand Kumar Madasamy, Bharathi Raja Chakravarthi

Abstract: Hope speech detection, the task of recognizing positive expressions, has made significant strides recently. However, much of the current work focuses on model development without considering the inherent imbalance in the data. Our work revisits this issue in hope speech detection by introducing focal loss, data augmentation, and pre-processing strategies. We find that introducing focal loss as part of the training process of Multilingual-BERT (M-BERT) mitigates the effect of class imbalance and improves overall F1-Macro by 0.11. At the same time, contextual and back-translation-based word augmentation with M-BERT improves results by 0.10 over the baseline despite the imbalance. Finally, we show that overlapping word removal as a pre-processing step, though simple, improves F1-Macro by 0.28. In the process, we present detailed studies of the behavior of each of these strategies and summarize key findings from our empirical results for those interested in getting the most out of M-BERT for hope speech detection under real-world conditions of data imbalance.
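
The abstract names focal loss as one of the imbalance remedies but does not give its form. As a rough illustration only, the sketch below uses the standard focal loss definition, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), for a single example. The function name `focal_loss` and the default values `gamma=2.0` and `alpha=1.0` are assumptions made for this example; the abstract does not state the settings used with M-BERT.

```python
import math

def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example (illustrative sketch, not the paper's code).

    probs  : predicted class probabilities
    target : index of the true class
    gamma  : focusing parameter (assumed default); gamma=0 gives
             alpha-scaled cross-entropy
    alpha  : scalar weight (assumed default)
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct prediction contributes far less than a wrong one,
# so training focuses on hard examples, which in imbalanced data are
# often the minority class.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.1, 0.9], target=0)
ce_hard = focal_loss([0.1, 0.9], target=0, gamma=0.0)  # plain cross-entropy
```

With `gamma=0.0` the expression reduces to ordinary cross-entropy, which makes the down-weighting of easy examples straightforward to compare against an unweighted baseline.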

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2204.05488 [cs.CL]
	(or arXiv:2204.05488v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.05488
Related DOI:	https://doi.org/10.1080/0952813X.2023.2166130

Submission history

From: Hariharan R L
[v1] Tue, 12 Apr 2022 02:38:54 UTC (162 KB)
