Attention is All We Need: Nailing Down Object-centric Attention for Egocentric Activity Recognition

Sudhakaran, Swathikiran; Lanz, Oswald

arXiv:1807.11794 (cs)

[Submitted on 31 Jul 2018]

Abstract: In this paper we propose an end-to-end trainable deep neural network model for egocentric activity recognition. Our model is built on the observation that egocentric activities are largely characterized by the objects in the video and their locations. Based on this, we develop a spatial attention mechanism that enables the network to attend to regions containing objects that are correlated with the activity under consideration. We learn highly specialized attention maps for each frame using class-specific activations from a CNN pre-trained for generic image recognition, and use them for spatio-temporal encoding of the video with a convolutional LSTM. Our model is trained in a weakly supervised setting using raw video-level activity-class labels. Nonetheless, on standard egocentric activity benchmarks our model surpasses, by up to 6 percentage points in recognition accuracy, the currently best-performing method, which relies on strong supervision from hand segmentation and object location for training. We visually analyze the attention maps generated by the network and find that it successfully identifies the relevant objects present in the video frames, which may explain the strong recognition performance. We also present an extensive ablation analysis of the design choices.
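The abstract does not spell out how the attention maps are computed, so the following is only a minimal sketch of one plausible reading: a class activation map (CAM) taken from the top-scoring class of a pre-trained classifier, normalized with a spatial softmax, and used to re-weight the frame's feature map. The function names (`class_activation_map`, `spatial_softmax`, `attend`), the global-average-pooling classifier head, and the toy numbers are all assumptions made for illustration. The convolutional LSTM that would consume the attended features over time is omitted.

```python
import math


def class_activation_map(features, class_weights):
    """Sum the channels of a C x H x W feature map, weighted by one class's weights."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [sum(class_weights[c] * features[c][y][x] for c in range(channels))
         for x in range(width)]
        for y in range(height)
    ]


def spatial_softmax(score_map):
    """Turn an H x W score map into attention weights that sum to 1."""
    peak = max(max(row) for row in score_map)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in score_map]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def attend(features, classifier_weights):
    """Re-weight a feature map by the attention map of the top-scoring class.

    Assumes (hypothetically) a classifier of the form
    global average pooling -> linear layer, as in CAM-style networks.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    pooled = [sum(sum(row) for row in features[c]) / (height * width)
              for c in range(channels)]
    scores = [sum(w * p for w, p in zip(weights, pooled))
              for weights in classifier_weights]
    top_class = max(range(len(scores)), key=scores.__getitem__)
    attention = spatial_softmax(
        class_activation_map(features, classifier_weights[top_class]))
    attended = [
        [[features[c][y][x] * attention[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(channels)
    ]
    return top_class, attention, attended


# Toy input: 2 channels on a 2 x 2 grid, 2 classes (illustrative values only).
features = [
    [[4.0, 0.0], [0.0, 0.0]],  # channel 0 fires at the top-left location
    [[0.0, 0.0], [0.0, 2.0]],  # channel 1 fires at the bottom-right location
]
classifier_weights = [[1.0, 0.0], [0.0, 1.0]]  # class k reads only channel k

top_class, attention, attended = attend(features, classifier_weights)
print(top_class)
print(attention)
```

In this toy case class 0 scores highest, so the attention mass concentrates on the top-left location where channel 0 is active. In a video model of the kind the abstract describes, one such attended feature map per frame would be passed to the recurrent encoder.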

Comments:	Accepted to BMVC 2018
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1807.11794 [cs.CV]
	(or arXiv:1807.11794v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1807.11794

Submission history

From: Swathikiran Sudhakaran
[v1] Tue, 31 Jul 2018 12:54:06 UTC (6,912 KB)
