A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation

Khan, Rashid; Huang, Bingding; Hassan, Haseeb; Zaman, Asim; Ye, Zhongfu

arXiv:2310.07252 (cs)

[Submitted on 11 Oct 2023]

Abstract: Image captioning is a challenging task that involves generating a textual description of an image using computer vision and natural language processing techniques. This paper proposes a deep neural framework for image caption generation using a GRU-based attention mechanism. Our approach employs multiple pre-trained convolutional neural networks as the encoder to extract image features and a GRU-based language model as the decoder to generate descriptive sentences. To improve performance, we integrate the Bahdanau attention model with the GRU decoder so that the model learns to focus on specific image regions. We evaluate our approach on the MSCOCO and Flickr30k datasets and show that it achieves competitive scores compared to state-of-the-art methods. Our proposed framework can help bridge the gap between computer vision and natural language and can be extended to specific domains.
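
The abstract names Bahdanau attention but does not spell out its formulation, so the following is only a minimal sketch of the standard additive form: each image region is scored against the decoder hidden state, the scores are normalized with a softmax, and the context vector is the weighted sum of region features. The function names (`additive_attention`, `matvec`, `random_matrix`), the toy dimensions, and the random weights are all assumptions made for illustration; they are not taken from the paper, and a real model would learn the projections and feed the context vector into the GRU decoder.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, w_feat, w_hid, v):
    """Additive (Bahdanau-style) attention; hypothetical helper for illustration.

    features: list of L region vectors (each of size D) from a CNN encoder
    hidden:   decoder hidden state of size H (e.g. from a GRU)
    w_feat:   A x D projection, w_hid: A x H projection, v: vector of size A
    Returns (context vector of size D, attention weights of length L).
    """
    proj_hidden = matvec(w_hid, hidden)
    scores = []
    for region in features:
        proj_region = matvec(w_feat, region)
        # score = v . tanh(W_feat f_i + W_hid h)
        energy = [math.tanh(a + b) for a, b in zip(proj_region, proj_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context is the attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for illustration: 4 regions, 6-dim features,
# 5-dim hidden state, 3-dim attention space.
rng = random.Random(0)
L, D, H, A = 4, 6, 5, 3
features = random_matrix(L, D, rng)
hidden = [rng.uniform(-1, 1) for _ in range(H)]
context, weights = additive_attention(
    features,
    hidden,
    random_matrix(A, D, rng),
    random_matrix(A, H, rng),
    [rng.uniform(-1, 1) for _ in range(A)],
)
print(len(context), len(weights))
```

At each decoding step the weights form a probability distribution over image regions, which is what lets the decoder attend to different parts of the image for different words.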

Comments:	15 pages, 10 figures, 5 tables. 2023 5th International Conference on Robotics and Computer Vision (ICRCV 2023). arXiv admin note: substantial text overlap with arXiv:2203.01594
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2310.07252 [cs.CV]
	(or arXiv:2310.07252v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.07252

Submission history

From: Dr. Rashid Khan
[v1] Wed, 11 Oct 2023 07:30:01 UTC (1,536 KB)
