Improved Image Captioning with Adversarial Semantic Alignment

Melnyk, Igor; Sercu, Tom; Dognin, Pierre L.; Ross, Jarret; Mroueh, Youssef

arXiv:1805.00063v1 (cs)

[Submitted on 30 Apr 2018 (this version), latest version 6 Jun 2019 (v3)]

Authors: Igor Melnyk, Tom Sercu, Pierre L. Dognin, Jarret Ross, Youssef Mroueh (IBM Research, USA)

Abstract: We propose a new conditional GAN for image captioning that enforces semantic alignment between images and captions through a co-attentive discriminator and a context-aware LSTM sequence generator. To train these sequence GANs, we empirically study two algorithms: Self-critical Sequence Training (SCST) and Gumbel Straight-Through. Both techniques prove viable for training sequence GANs. However, SCST displays better gradient behavior even though it does not directly use gradients from the discriminator. This makes sequence GAN training more stable and ultimately produces models with improved results under human evaluation. Automatic evaluation of GAN-trained captioning models remains an open question. To address it, we introduce a new semantic score that correlates strongly with human judgement. As a paradigm for evaluation, we suggest that the captioner's performance on Out of Context (OOC) scenes is an important criterion for assessing generalization and composition. To this end, we propose an OOC dataset which, combined with our automatic semantic score, forms a new benchmark for the captioning community to measure the generalization ability of automatic image captioning. Under this new OOC benchmark, and on the traditional MSCOCO dataset, our models trained with SCST perform strongly in both semantic score and human evaluation.
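
The abstract notes that SCST does not directly use gradients from the discriminator. A minimal sketch of what that can look like is given below, assuming the standard SCST formulation (REINFORCE with the greedy-decoded caption's reward as baseline) and assuming the reward is a scalar discriminator score. The function name, argument names, and the numbers are hypothetical and are not taken from the paper.

```python
import math

def scst_surrogate_loss(sample_logprobs, sample_reward, greedy_reward):
    """SCST surrogate loss for a single caption (illustrative sketch).

    sample_logprobs: log-probabilities of the tokens in the sampled caption.
    sample_reward:   scalar reward for the sampled caption (assumed here to
                     be a discriminator score in [0, 1]).
    greedy_reward:   reward of the greedy-decoded caption, used as baseline.
    """
    # The advantage is positive when sampling beats greedy decoding.
    advantage = sample_reward - greedy_reward
    # The rewards enter only as constants, so the generator is updated
    # through its own log-probabilities, not through the discriminator.
    return -advantage * sum(sample_logprobs)

# Hypothetical two-token caption with made-up probabilities and scores.
logprobs = [math.log(0.5), math.log(0.25)]
loss = scst_surrogate_loss(logprobs, sample_reward=0.8, greedy_reward=0.6)
```

Minimizing this quantity raises the probability of sampled captions that score higher than the greedy caption and lowers it for those that score lower. Gumbel Straight-Through, by contrast, backpropagates through a relaxed sample and therefore does use discriminator gradients.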

Comments:	Authors Equal Contribution
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1805.00063 [cs.LG]
	(or arXiv:1805.00063v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.00063

Submission history

From: Pierre Dognin
[v1] Mon, 30 Apr 2018 19:10:43 UTC (3,320 KB)
[v2] Fri, 1 Jun 2018 17:43:25 UTC (2,234 KB)
[v3] Thu, 6 Jun 2019 18:41:03 UTC (6,516 KB)
