Video Summarization: Towards Entity-Aware Captions

Ayyubi, Hammad A.; Liu, Tianqi; Nagrani, Arsha; Lin, Xudong; Zhang, Mingda; Arnab, Anurag; Han, Feng; Zhu, Yukun; Liu, Jialu; Chang, Shih-Fu

[Submitted on 1 Dec 2023 (v1), last revised 11 Nov 2024 (this version, v2)]

Abstract: Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place, or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities for meaningful summarization. As such, we propose the task of summarizing news videos directly into entity-aware captions. We also release a large-scale dataset, VIEWS (VIdeo NEWS), to support research on this task. Further, we propose a method that augments visual information from videos with context retrieved from external world knowledge to generate entity-aware captions. We demonstrate the effectiveness of our approach on three video captioning models. We also show that our approach generalizes to an existing news image captioning dataset. With these extensive experiments and insights, we believe we establish a solid basis for future research on this challenging task.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Cite as:	arXiv:2312.02188 [cs.CV]
	(or arXiv:2312.02188v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2312.02188

Submission history

From: Hammad Ayyubi
[v1] Fri, 1 Dec 2023 23:56:00 UTC (7,570 KB)
[v2] Mon, 11 Nov 2024 05:14:15 UTC (13,791 KB)
