ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos

Rehman, Mohammad Zia Ur; Bhatnagar, Anukriti; Kabde, Omkar; Bansal, Shubhi; Kumar, Nagendra

doi:10.18653/v1/2025.acl-long.842

Computer Science > Computer Vision and Pattern Recognition

arXiv:2508.06570 (cs)

[Submitted on 7 Aug 2025 (v1), last revised 15 Aug 2025 (this version, v2)]

Abstract: Existing research has primarily focused on text- and image-based hate speech detection, while video-based approaches remain underexplored. In this work, we introduce a novel dataset, ImpliHateVid, specifically curated for implicit hate speech detection in videos. ImpliHateVid consists of 2,009 videos (509 implicit hate, 500 explicit hate, and 1,000 non-hate), making it one of the first large-scale video datasets dedicated to implicit hate detection. We also propose a novel two-stage contrastive learning framework for hate speech detection in videos. In the first stage, we train modality-specific encoders for audio, text, and image with a contrastive loss applied to the concatenated features of the three encoders. In the second stage, we train cross-encoders with contrastive learning to refine the multimodal representations. Additionally, we incorporate sentiment, emotion, and caption-based features to enhance implicit hate detection. We evaluate our method on two datasets: ImpliHateVid, for implicit hate speech detection, and HateMM, for general hate speech detection in videos. The results demonstrate the effectiveness of the proposed multimodal contrastive learning for hateful content detection in videos and the significance of our dataset.
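
To make the first-stage idea concrete, here is a minimal, non-authoritative sketch of a contrastive loss computed over concatenated audio, text, and image features. The abstract does not state which contrastive objective, temperature, normalization, or batching the authors use, so this sketch assumes a supervised (SupCon-style) contrastive loss over L2-normalized fused vectors. The helper names `fuse` and `supervised_contrastive_loss`, the temperature of 0.1, and the toy feature values are all hypothetical; the second-stage cross-encoders and the sentiment, emotion, and caption features are not modeled here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def fuse(audio: Sequence[float], text: Sequence[float], image: Sequence[float]) -> Vector:
    # Hypothetical stage-1 fusion: concatenate per-modality features, then normalize.
    return l2_normalize(list(audio) + list(text) + list(image))


def supervised_contrastive_loss(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """SupCon-style loss: pull same-label samples together, push others apart.

    Assumes `features` are already L2-normalized, so dot product = cosine similarity.
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive in the batch contributes nothing
        # Temperature-scaled similarities from anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(features[i], features[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy usage with invented 2-d features per modality (1 = hate, 0 = non-hate).
a = fuse([1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
b = fuse([0.9, 0.1], [1.0, 0.0], [1.0, 0.1])
c = fuse([0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
d = fuse([0.1, 0.9], [0.0, 1.0], [0.1, 1.0])
print(supervised_contrastive_loss([a, b, c, d], [1, 1, 0, 0]))
```

When same-label clips already have similar fused vectors, the loss is small; assigning labels that pair dissimilar clips makes it large, which is the signal that would push the modality encoders toward label-consistent joint representations under this assumed objective.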

Comments:	Published in ACL 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2508.06570 [cs.CV]
	(or arXiv:2508.06570v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.06570
Related DOI:	https://doi.org/10.18653/v1/2025.acl-long.842

Submission history

From: Mohammad Zia Ur Rehman
[v1] Thu, 7 Aug 2025 05:57:22 UTC (1,569 KB)
[v2] Fri, 15 Aug 2025 14:09:37 UTC (1,574 KB)