-
Investigating Algorithmic Bias in YouTube Shorts
Authors:
Mert Can Cakmak,
Nitin Agarwal,
Diwash Poudel
Abstract:
The rapid growth of YouTube Shorts, now serving over 2 billion monthly users, reflects a global shift toward short-form video as a dominant mode of online content consumption. This study investigates algorithmic bias in YouTube Shorts' recommendation system by analyzing how watch-time duration, topic sensitivity, and engagement metrics influence content visibility and drift. We focus on three content domains: the South China Sea dispute, the 2024 Taiwan presidential election, and general YouTube Shorts content. Using generative AI models, we classified 685,842 videos across relevance, topic category, and emotional tone. Our results reveal a consistent drift away from politically sensitive content toward entertainment-focused videos. Emotion analysis shows a systematic preference for joyful or neutral content, while engagement patterns indicate that highly viewed and liked videos are disproportionately promoted, reinforcing popularity bias. This work provides the first comprehensive analysis of algorithmic drift in YouTube Shorts based on textual content, emotional tone, topic categorization, and varying watch-time conditions. These findings offer new insights into how algorithmic design shapes content exposure, with implications for platform transparency and information diversity.
Submitted 6 July, 2025;
originally announced July 2025.
-
Simulating User Watch-Time to Investigate Bias in YouTube Shorts Recommendations
Authors:
Selimhan Dagtas,
Mert Can Cakmak,
Nitin Agarwal
Abstract:
Short-form video platforms such as YouTube Shorts increasingly shape how information is consumed, yet the effects of engagement-driven algorithms on content exposure remain poorly understood. This study investigates how different viewing behaviors, including fast scrolling or skipping, influence the relevance and topical continuity of recommended videos. Using a dataset of over 404,000 videos, we simulate viewer interactions across both broader geopolitical themes and more narrowly focused conflicts, including topics related to Russia, China, the Russia-Ukraine War, and the South China Sea dispute. We assess how relevance shifts across recommendation chains under varying watch-time conditions, using GPT-4o to evaluate semantic alignment between videos. Our analysis reveals patterns of amplification, drift, and topic generalization, with significant implications for content diversity and platform accountability. By bridging perspectives from computer science, media studies, and political communication, this work contributes a multidisciplinary understanding of how engagement cues influence algorithmic pathways in short-form content ecosystems.
Submitted 6 July, 2025;
originally announced July 2025.
-
PRISM: Perceptual Recognition for Identifying Standout Moments in Human-Centric Keyframe Extraction
Authors:
Mert Can Cakmak,
Nitin Agarwal,
Diwash Poudel
Abstract:
Online videos play a central role in shaping political discourse and amplifying cyber social threats such as misinformation, propaganda, and radicalization. Detecting the most impactful or "standout" moments in video content is crucial for content moderation, summarization, and forensic analysis. In this paper, we introduce PRISM (Perceptual Recognition for Identifying Standout Moments), a lightweight and perceptually-aligned framework for keyframe extraction. PRISM operates in the CIELAB color space and uses perceptual color difference metrics to identify frames that align with human visual sensitivity. Unlike deep learning-based approaches, PRISM is interpretable, training-free, and computationally efficient, making it well suited for real-time and resource-constrained environments. We evaluate PRISM on four benchmark datasets: BBC, TVSum, SumMe, and ClipShots, and demonstrate that it achieves strong accuracy and fidelity while maintaining high compression ratios. These results highlight PRISM's effectiveness in both structured and unstructured video content, and its potential as a scalable tool for analyzing and moderating harmful or politically sensitive media in online platforms.
Submitted 23 June, 2025;
originally announced June 2025.
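The PRISM abstract mentions working in the CIELAB color space with perceptual color difference metrics. As a rough illustration of that general idea only, the sketch below converts sRGB pixels to CIELAB and flags a frame as a keyframe when its mean color differs from the last keyframe by more than a threshold. Everything specific here is an assumption: the abstract does not say which difference metric PRISM uses (CIE76 Euclidean distance is used here for simplicity), and the names `select_keyframes`, `mean_lab`, and the `threshold` value are hypothetical, not taken from the paper.

```python
import math

# D65 reference white (2-degree observer), standard CIE values.
WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB under a D65 white point."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / WHITE[0]), f(y / WHITE[1]), f(z / WHITE[2])
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB (an assumed metric)."""
    return math.dist(lab1, lab2)


def mean_lab(frame):
    """Average CIELAB color of a frame given as a list of (r, g, b) pixels."""
    labs = [srgb_to_lab(*px) for px in frame]
    return tuple(sum(lab[i] for lab in labs) / len(labs) for i in range(3))


def select_keyframes(frames, threshold=10.0):
    """Hypothetical rule: keep a frame when it differs enough from the last keyframe."""
    if not frames:
        return []
    keyframes = [0]
    reference = mean_lab(frames[0])
    for index in range(1, len(frames)):
        current = mean_lab(frames[index])
        if delta_e76(reference, current) > threshold:
            keyframes.append(index)
            reference = current
    return keyframes


if __name__ == "__main__":
    # Four tiny synthetic "frames": two reddish, then two bluish.
    frames = [
        [(200, 30, 30)] * 4,
        [(202, 31, 30)] * 4,
        [(30, 30, 200)] * 4,
        [(30, 32, 198)] * 4,
    ]
    print(select_keyframes(frames))
```

Comparing each frame against the last selected keyframe, rather than the immediately preceding frame, is one possible design choice: it avoids missing slow gradual changes, at the cost of being sensitive to the chosen threshold.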
-
TriPSS: A Tri-Modal Keyframe Extraction Framework Using Perceptual, Structural, and Semantic Representations
Authors:
Mert Can Cakmak,
Nitin Agarwal,
Diwash Poudel
Abstract:
Efficient keyframe extraction is critical for effective video summarization and retrieval, yet capturing the complete richness of video content remains challenging. In this work, we present TriPSS, a novel tri-modal framework that effectively integrates perceptual cues from color features in the CIELAB space, deep structural embeddings derived from ResNet-50, and semantic context from frame-level captions generated by Llama-3.2-11B-Vision-Instruct. By fusing these diverse modalities using principal component analysis, TriPSS constructs robust multi-modal embeddings that enable adaptive segmentation of video content via HDBSCAN clustering. A subsequent refinement stage incorporating quality assessment and duplicate filtering ensures that the final keyframe set is both concise and semantically rich. Comprehensive evaluations on benchmark datasets TVSum20 and SumMe demonstrate that TriPSS achieves state-of-the-art performance, substantially outperforming traditional unimodal and previous multi-modal methods. These results underscore TriPSS's ability to capture nuanced visual and semantic information, thereby setting a new benchmark for video content understanding in large-scale retrieval scenarios.
Submitted 3 June, 2025;
originally announced June 2025.