Comparative evaluation of CNN architectures for Image Caption Generation

Katiyar, Sulabh; Borgohain, Samir Kumar

doi:10.14569/IJACSA.2020.0111291

arXiv:2102.11506 (cs)

[Submitted on 23 Feb 2021]

Abstract: Aided by recent advances in Deep Learning, Image Caption Generation has seen tremendous progress over the last few years. Most methods use transfer learning to extract visual information, in the form of image features, with pre-trained Convolutional Neural Network models; a Caption Generator module then transforms this visual information into the output sentences. Different methods have used different Convolutional Neural Network architectures and, to the best of our knowledge, no systematic study compares the relative efficacy of these architectures for extracting the visual information. In this work, we evaluate 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks: the first based on the Neural Image Caption (NIC) generation model and the second based on the Soft-Attention framework. We observe that the model complexity of a Convolutional Neural Network, as measured by its number of parameters, and the accuracy of the model on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
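
The abstract's closing claim concerns whether parameter count tracks captioning quality across CNN encoders. One way such a claim could be checked is a rank correlation between the two quantities. The sketch below is only an illustration and is not taken from the paper: the helper names (`ranks`, `pearson`, `spearman`) are hypothetical, the choice of Spearman's coefficient is an assumption, and the numbers in the usage section are made-up placeholders rather than reported results.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # PLACEHOLDER values for four hypothetical encoders; NOT from the paper.
    params_millions = [5.0, 25.0, 60.0, 140.0]
    caption_score = [0.80, 0.95, 0.90, 0.70]
    print(spearman(params_millions, caption_score))
```

A coefficient near +1 would mean that larger encoders consistently rank higher on the captioning metric. A value near zero or below, as the placeholder data happens to give, would match the abstract's observation that size alone is not a reliable guide.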

Comments:	Article published in International Journal of Advanced Computer Science and Applications (IJACSA), Volume 11 Issue 12, 2020
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
Cite as:	arXiv:2102.11506 [cs.CV]
	(or arXiv:2102.11506v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2102.11506
Journal reference:	in International Journal of Advanced Computer Science and Applications, 11(12), 2020
Related DOI:	https://doi.org/10.14569/IJACSA.2020.0111291

Submission history

From: Sulabh Katiyar
[v1] Tue, 23 Feb 2021 05:43:54 UTC (1,487 KB)
