Code-mixed Sentiment and Hate-speech Prediction

Yadav, Anjali; Garg, Tanya; Klemen, Matej; Ulcar, Matej; Agarwal, Basant; Sikonja, Marko Robnik

doi:10.1109/TAFFC.2025.3553399

arXiv:2405.12929 (cs)

[Submitted on 21 May 2024]

Authors: Anjali Yadav, Tanya Garg, Matej Klemen, Matej Ulcar, Basant Agarwal, Marko Robnik Sikonja

Abstract: Code-mixed discourse combines multiple languages in a single text. It is common in informal communication in countries with several official languages, but also in many other countries in combination with English or neighboring languages. Because large language models now dominate most natural language processing tasks, we investigated their performance in code-mixed settings on relevant tasks. We first created four new bilingual pre-trained masked language models for English-Hindi and English-Slovene, designed specifically to support informal language. We then evaluated monolingual, bilingual, few-lingual, and massively multilingual models on several languages, using two tasks whose data frequently contain code-mixed text: sentiment analysis and offensive language detection in social media texts. The results show that the most successful classifiers are fine-tuned bilingual models and multilingual models, specialized for social media texts, followed by non-specialized massively multilingual and monolingual models, while huge generative models are not competitive. On our affective problems, the models mostly perform slightly better on code-mixed data than on non-code-mixed data.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2405.12929 [cs.CL]
	(or arXiv:2405.12929v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.12929
Related DOI:	https://doi.org/10.1109/TAFFC.2025.3553399

Submission history

From: Anjali Yadav
[v1] Tue, 21 May 2024 16:56:36 UTC (216 KB)
