Learning from Lexical Perturbations for Consistent Visual Question Answering

Whitehead, Spencer; Wu, Hui; Fung, Yi Ren; Ji, Heng; Feris, Rogerio; Saenko, Kate

Computer Science > Computer Vision and Pattern Recognition

arXiv:2011.13406 (cs)

[Submitted on 26 Nov 2020 (v1), last revised 23 Dec 2020 (this version, v2)]

Abstract: Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. In this paper, we propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations and regularizes the visual reasoning process between them to be consistent during training. We show that our framework markedly improves consistency and generalization ability, demonstrating the value of controlled linguistic perturbations as a useful and currently underutilized training and regularization tool for VQA models. We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations of VQA questions. Our benchmark uniquely draws from large-scale linguistic resources, avoiding human annotation effort while maintaining data quality compared to generative approaches. We benchmark existing VQA models using VQA P2 and provide robustness analysis on each type of linguistic variation.

Comments:	14 pages, 8 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2011.13406 [cs.CV]
	(or arXiv:2011.13406v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2011.13406

Submission history

From: Spencer Whitehead
[v1] Thu, 26 Nov 2020 17:38:03 UTC (3,967 KB)
[v2] Wed, 23 Dec 2020 00:29:27 UTC (3,968 KB)
