Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

Zhao, Shichao; Liu, Yanbin; Han, Yahong; Hong, Richang


arXiv:1511.02126 (cs)

[Submitted on 6 Nov 2015]


Abstract: Deep ConvNets have shown good performance in image classification tasks. However, deep video representation for action recognition remains an open problem. The problem has two aspects: on one hand, current video ConvNets are relatively shallow compared with image ConvNets, which limits their capability to capture complex video action information; on the other hand, the temporal information of videos is not properly utilized to pool and encode the video sequences. To address these issues, we utilize two state-of-the-art ConvNets, i.e., the very deep spatial net (VGGNet) and the temporal net from Two-Stream ConvNets, for action representation. The convolutional layers and a proposed new layer, called the frame-diff layer, are extracted and pooled with two temporal pooling strategies: trajectory pooling and line pooling. The pooled local descriptors are then encoded with VLAD to form the video representations. To verify the effectiveness of the proposed framework, we conduct experiments on the UCF101 and HMDB51 datasets. The framework achieves an accuracy of 93.78% on UCF101, which is the state of the art, and an accuracy of 65.62% on HMDB51, which is comparable to the state of the art.
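The VLAD encoding step mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is a generic VLAD formulation, not the authors' implementation: the function name `vlad_encode`, the random stand-in descriptors and codebook, and the signed-square-root plus L2 normalization are all assumptions made for illustration. In practice the codebook would typically come from k-means over training descriptors, and the abstract does not say which normalization the paper uses.

```python
import numpy as np


def vlad_encode(descriptors, centers):
    """Encode local descriptors (N, D) against a codebook (K, D) into a K*D vector.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Squared Euclidean distance from every descriptor to every center: (N, K).
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest center per descriptor

    k, d = centers.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # Accumulate residuals between descriptors and their assigned center.
            vlad[i] = (members - centers[i]).sum(axis=0)

    vlad = vlad.ravel()
    # Assumed normalization: signed square root followed by L2 normalization.
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad


# Random data standing in for pooled local descriptors and a learned codebook.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(200, 8))
centers = rng.normal(size=(4, 8))
video_vec = vlad_encode(descriptors, centers)
print(video_vec.shape)  # (32,)
```

The output length is the number of centers times the descriptor dimension, so one fixed-size vector represents a video regardless of how many local descriptors were pooled from it.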

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1511.02126 [cs.CV]
	(or arXiv:1511.02126v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.02126

Submission history

From: Yahong Han
[v1] Fri, 6 Nov 2015 15:51:07 UTC (1,180 KB)
