Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing

Yadav, Neemesh; Liu, Jiarui; Ortu, Francesco; Ensafi, Roya; Jin, Zhijing; Mihalcea, Rada

arXiv:2503.05280 (cs)

[Submitted on 7 Mar 2025 (v1), last revised 10 Mar 2025 (this version, v2)]

Abstract: The ability of Natural Language Processing (NLP) methods to categorize text into multiple classes has motivated their use in online content moderation tasks, such as hate speech and fake news detection. However, there is limited understanding of how or why these methods make such decisions, or why certain content is moderated in the first place. To investigate the hidden mechanisms behind content moderation, we explore multiple directions: 1) training classifiers to reverse-engineer content moderation decisions across countries; 2) explaining content moderation decisions by analyzing Shapley values and LLM-guided explanations. Our primary focus is on content moderation decisions made across countries, using pre-existing corpora sampled from the Twitter Stream Grab. Our experiments reveal interesting patterns in censored posts, both across countries and over time. Through human evaluations of LLM-generated explanations across three LLMs, we assess the effectiveness of using LLMs in content moderation. Finally, we discuss potential future directions, as well as the limitations and ethical considerations of this work. Our code and data are available at this https URL

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2503.05280 [cs.CL]
	(or arXiv:2503.05280v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2503.05280

Submission history

From: Jiarui Liu
[v1] Fri, 7 Mar 2025 09:49:31 UTC (35,415 KB)
[v2] Mon, 10 Mar 2025 04:41:06 UTC (35,442 KB)
