TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

Patel, Yash; Gomez, Lluis; Gomez, Raul; Rusiñol, Marçal; Karatzas, Dimosthenis; Jawahar, C. V.

arXiv:1807.02110 (cs)

[Submitted on 4 Jul 2018]

Abstract: The immense success of deep-learning-based methods in computer vision relies heavily on large-scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such datasets requires a tremendous amount of human effort, and the annotations are limited to a popular set of classes. As an alternative, learning visual features through auxiliary tasks that exploit freely available self-supervision has become increasingly popular in the computer vision community.
In this paper, we propose taking advantage of multi-modal context to provide self-supervision for training computer vision algorithms. We show that adequate visual features can be learned efficiently by training a CNN to predict the semantic textual context in which a particular image is most likely to appear as an illustration. More specifically, we use popular text embedding techniques to provide the self-supervision for training a deep CNN.
Our experiments demonstrate state-of-the-art performance in image classification, object detection, and multi-modal retrieval compared to recent self-supervised or naturally-supervised approaches.
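
The training signal described in the abstract can be illustrated with a small sketch. The abstract does not state the loss function or the exact text embedding, so this example assumes that the text embedding of an article is a probability distribution over topics and that the network is trained with a soft-target cross-entropy loss. The function names, the three-topic target vector, and the learning rate are all hypothetical, and raw logits stand in for the output of a CNN.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def loss_gradient(logits, target):
    """Gradient of the loss w.r.t. the logits; equals softmax(logits) - target
    when the target sums to 1."""
    probs = softmax(logits)
    return [p - t for p, t in zip(probs, target)]

# Hypothetical topic distribution of the text that accompanies one image
# (assumption: produced by some text embedding; not specified in the abstract).
target_topics = [0.7, 0.2, 0.1]

# Stand-in for the CNN output for that image: logits updated directly by
# gradient descent. A real model would backpropagate into network weights.
logits = [0.0, 0.0, 0.0]
initial_loss = soft_target_cross_entropy(logits, target_topics)
for _ in range(200):
    grad = loss_gradient(logits, target_topics)
    logits = [z - 0.5 * g for z, g in zip(logits, grad)]

predicted = softmax(logits)
final_loss = soft_target_cross_entropy(logits, target_topics)
```

In this toy setting the predicted distribution moves toward the text-derived target, so the text alone supplies the supervision and no class labels are needed.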

Comments:	arXiv admin note: text overlap with arXiv:1705.08631
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1807.02110 [cs.CV]
	(or arXiv:1807.02110v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1807.02110

Submission history

From: Yash Patel
[v1] Wed, 4 Jul 2018 21:44:09 UTC (10,669 KB)
