Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Mogadala, Aditya; Kalimuthu, Marimuthu; Klakow, Dietrich

doi:10.1613/jair.1.11688

arXiv:1907.09358 (cs)

[Submitted on 22 Jul 2019 (v1), last revised 31 Dec 2021 (this version, v3)]

Abstract: Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible by deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision, discussing their problem formulation, methods, existing datasets, and evaluation measures, and comparing the results obtained with corresponding state-of-the-art methods. Our efforts go beyond earlier surveys, which are either task-specific or concentrate on only one type of visual content, i.e., image or video. Furthermore, we provide some potential future directions in this field of research, in the anticipation that this survey will stimulate innovative thoughts and ideas to address the existing challenges and build new applications.

Comments:	Published at the Journal of Artificial Intelligence Research (JAIR); 135 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1907.09358 [cs.CV]
	(or arXiv:1907.09358v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1907.09358
Journal reference:	Journal of Artificial Intelligence Research, Vol. 71, 2021
Related DOI:	https://doi.org/10.1613/jair.1.11688

Submission history

From: Aditya Mogadala
[v1] Mon, 22 Jul 2019 14:53:48 UTC (3,585 KB)
[v2] Sat, 12 Sep 2020 13:26:29 UTC (3,695 KB)
[v3] Fri, 31 Dec 2021 20:40:20 UTC (3,782 KB)
