Dataset for Identification of Homophobia and Transophobia in Multilingual YouTube Comments

Chakravarthi, Bharathi Raja; Priyadharshini, Ruba; Ponnusamy, Rahul; Kumaresan, Prasanna Kumar; Sampath, Kayalvizhi; Thenmozhi, Durairaj; Thangasamy, Sathiyaraj; Nallathambi, Rajendran; McCrae, John Phillip

arXiv:2109.00227 (cs)

[Submitted on 1 Sep 2021]

Abstract: The increasing proliferation of abusive content on social media platforms has a negative impact on online users. Homophobia/transphobia is defined as the dread, dislike, discomfort, or mistrust of lesbian, gay, transgender, or bisexual persons. Homophobic/transphobic speech is a type of offensive language that may be summarized as hate speech directed toward LGBT+ people, and it has been a growing concern in recent years. Online homophobia/transphobia is a severe societal problem that can make online platforms toxic and unwelcoming to LGBT+ people while also undermining equality, diversity, and inclusion. We provide a new hierarchical taxonomy for online homophobia and transphobia, as well as an expert-labelled dataset that will allow homophobic/transphobic content to be automatically identified. Because this is a sensitive issue, and because we previously found that untrained crowdsourcing annotators struggle to identify homophobia due to cultural and other prejudices, we trained annotators and supplied them with comprehensive annotation rules. The dataset comprises 15,141 annotated multilingual comments. This paper describes the process of building the dataset, a qualitative analysis of the data, and inter-annotator agreement. In addition, we create baseline models for the dataset. To the best of our knowledge, this is the first dataset of its kind. Warning: This paper contains explicit statements of homophobia, transphobia, and stereotypes, which may be distressing to some readers.

Comments:	44 Pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2109.00227 [cs.CL]
	(or arXiv:2109.00227v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.00227

Submission history

From: Bharathi Raja Chakravarthi
[v1] Wed, 1 Sep 2021 08:05:57 UTC (1,387 KB)
