DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Johnson, Justin; Karpathy, Andrej; Fei-Fei, Li

arXiv:1511.07571 (cs)

[Submitted on 24 Nov 2015]

Abstract: We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and image captioning when one predicted region covers the full image. To address the localization and description task jointly, we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization layer, and a Recurrent Neural Network language model that generates the label sequences. We evaluate our network on the Visual Genome dataset, which comprises 94,000 images and 4,100,000 region-grounded captions. We observe both speed and accuracy improvements over baselines based on current state-of-the-art approaches in both generation and retrieval settings.
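
The relationship between dense captioning and its two special cases can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the paper: the `RegionCaption` type, the `(x, y, width, height)` box layout, and both helper functions are hypothetical names chosen for this example. It represents a prediction as a list of (box, caption) pairs and checks when such a list reduces to the object detection case (single-word descriptions) or the image captioning case (one region covering the full image).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x, y, width, height) in pixels. The abstract does not fix a format.
Box = Tuple[float, float, float, float]


@dataclass
class RegionCaption:
    """One dense-captioning prediction: a localized region plus its description."""
    box: Box
    caption: str


def is_detection_like(preds: List[RegionCaption]) -> bool:
    """True when every description is a single word (the object detection special case)."""
    return len(preds) > 0 and all(len(p.caption.split()) == 1 for p in preds)


def is_image_captioning_like(preds: List[RegionCaption],
                             image_size: Tuple[float, float]) -> bool:
    """True when there is exactly one region and it covers the full image."""
    width, height = image_size
    return len(preds) == 1 and preds[0].box == (0.0, 0.0, width, height)


# Made-up example predictions for a 128x96 image.
dense = [
    RegionCaption((10.0, 20.0, 50.0, 40.0), "a cat sitting on a couch"),
    RegionCaption((80.0, 15.0, 30.0, 30.0), "a red pillow"),
]
detection = [RegionCaption((10.0, 20.0, 50.0, 40.0), "cat")]
whole_image = [RegionCaption((0.0, 0.0, 128.0, 96.0), "a living room with a cat")]
```

Under these assumptions, `dense` is neither special case, `detection` satisfies `is_detection_like`, and `whole_image` satisfies `is_image_captioning_like` for an image size of `(128.0, 96.0)`.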

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1511.07571 [cs.CV]
	(or arXiv:1511.07571v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1511.07571

Submission history

From: Justin Johnson
[v1] Tue, 24 Nov 2015 05:13:54 UTC (8,864 KB)
