Parallel Attention Network with Sequence Matching for Video Grounding

Zhang, Hao; Sun, Aixin; Jing, Wei; Zhen, Liangli; Zhou, Joey Tianyi; Goh, Rick Siow Mong

doi:10.18653/v1/2021.findings-acl.69

arXiv:2105.08481 (cs)

[Submitted on 18 May 2021]

Abstract: Given a video, video grounding aims to retrieve the temporal moment that semantically corresponds to a language query. In this work, we propose a Parallel Attention Network with Sequence matching (SeqPAN) to address two challenges in this task: multi-modal representation learning and target moment boundary prediction. We design a self-guided parallel attention module to effectively capture self-modal contexts and cross-modal attentive information between video and text. Inspired by sequence labeling tasks in natural language processing, we split the ground truth moment into begin, inside, and end regions. We then propose a sequence matching strategy that uses these region labels to guide the start/end boundary predictions. Experimental results on three datasets show that SeqPAN is superior to state-of-the-art methods. Furthermore, the experiments verify the effectiveness of both the self-guided parallel attention module and the sequence matching module.
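
The region split described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the abstract does not say how the begin and end regions are sized, so the function name `region_labels`, the `edge_ratio` parameter, and the one-letter tags (`O`, `B`, `I`, `E`) are all assumptions made here for the example.

```python
from typing import List


def region_labels(num_frames: int, start: int, end: int,
                  edge_ratio: float = 0.2) -> List[str]:
    """Tag each frame as O (outside), B (begin), I (inside), or E (end).

    start and end are inclusive frame indices of the ground-truth moment.
    edge_ratio is a hypothetical knob (not taken from the paper): the
    fraction of the moment given to each of the begin and end regions,
    with a minimum of one frame each.
    """
    if not 0 <= start <= end < num_frames:
        raise ValueError("expected 0 <= start <= end < num_frames")

    length = end - start + 1
    edge = max(1, int(length * edge_ratio))

    labels = ["O"] * num_frames
    for i in range(start, end + 1):
        if i < start + edge:
            labels[i] = "B"      # frames near the start boundary
        elif i > end - edge:
            labels[i] = "E"      # frames near the end boundary
        else:
            labels[i] = "I"      # interior of the moment
    return labels


if __name__ == "__main__":
    # A 10-frame video whose target moment spans frames 2..7.
    print("".join(region_labels(10, 2, 7)))
```

A model trained against such per-frame tags has an auxiliary signal about where the moment's boundaries lie, which is the intuition behind using region labels to guide start/end prediction.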

Comments:	15 pages, 10 figures, 7 tables, Findings at ACL 2021
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2105.08481 [cs.CL]
	(or arXiv:2105.08481v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2105.08481
Related DOI:	https://doi.org/10.18653/v1/2021.findings-acl.69

Submission history

From: Hao Zhang
[v1] Tue, 18 May 2021 12:43:20 UTC (2,363 KB)
