Adaptive Feature Abstraction for Translating Video to Language

Pu, Yunchen; Min, Martin Renqiang; Gan, Zhe; Carin, Lawrence

arXiv:1611.07837v1 (cs)

[Submitted on 23 Nov 2016 (this version), latest version 17 Nov 2017 (v3)]

Title: Adaptive Feature Abstraction for Translating Video to Language

Authors: Yunchen Pu, Martin Renqiang Min, Zhe Gan, Lawrence Carin

Abstract: A new model for video captioning is developed, using a deep three-dimensional Convolutional Neural Network (C3D) as an encoder for videos and a Recurrent Neural Network (RNN) as a decoder for captions. We consider both "hard" and "soft" attention mechanisms to adaptively and sequentially focus on different layers of features (levels of feature "abstraction"), as well as on local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD and MSR-VTT. In addition to visualizations of the results and of how the model works, these experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.
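The abstract describes attention that, at each decoding step, selects among layers of C3D features and among spatiotemporal regions within a layer. The following is a minimal, generic sketch of the two attention styles, not the authors' implementation. It assumes each candidate (one vector per layer, or one per spatiotemporal location of a layer's feature map) has already been projected to a common dimension, and it uses a plain dot-product score where the paper's actual scoring function may differ. All function names are hypothetical. Soft attention returns a weighted average of all candidates; hard attention samples a single candidate from the same distribution.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(hidden: Sequence[float],
                      features: Sequence[Sequence[float]]) -> List[float]:
    """Score each candidate vector against the decoder state, then normalize.

    Assumption: a dot-product score; the paper's scoring function may differ.
    """
    scores = [sum(h * f for h, f in zip(hidden, feat)) for feat in features]
    return softmax(scores)


def soft_attend(hidden: Sequence[float],
                features: Sequence[Sequence[float]]) -> Tuple[Vector, List[float]]:
    """Soft attention: the context is the weighted average of all candidates."""
    weights = attention_weights(hidden, features)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return context, weights


def hard_attend(hidden: Sequence[float],
                features: Sequence[Sequence[float]],
                rng: random.Random) -> Tuple[Vector, int]:
    """Hard attention: sample one candidate index from the attention distribution."""
    weights = attention_weights(hidden, features)
    idx = rng.choices(range(len(features)), weights=weights, k=1)[0]
    return list(features[idx]), idx
```

For example, with three hypothetical layer vectors `[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]` and decoder state `[2.0, 0.0]`, `soft_attend` puts the largest weight on the first layer, whose vector aligns best with the state. Soft attention keeps the whole computation differentiable, whereas the sampled index in hard attention is not, so hard attention is typically trained with a sampling-based (e.g. REINFORCE-style) gradient estimator instead.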

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Cite as:	arXiv:1611.07837 [cs.CV]
	(or arXiv:1611.07837v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1611.07837

Submission history

From: Yunchen Pu
[v1] Wed, 23 Nov 2016 15:21:48 UTC (4,421 KB)
[v2] Wed, 15 Nov 2017 03:40:47 UTC (4,223 KB)
[v3] Fri, 17 Nov 2017 05:13:16 UTC (4,216 KB)
