Exploring Affordance and Situated Meaning in Image Captions: A Multimodal Analysis

Chen, Pin-Er; Wang, Po-Ya Angela; Chou, Hsin-Yu; Tseng, Yu-Hsiang; Hsieh, Shu-Kai

arXiv:2305.14616 (cs)

[Submitted on 24 May 2023 (v1), last revised 24 Oct 2023 (this version, v2)]

Abstract: This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.

Comments:	10 pages, 9 figures
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.14616 [cs.CL]
	(or arXiv:2305.14616v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.14616

Submission history

From: Pin-Er Chen
[v1] Wed, 24 May 2023 01:30:50 UTC (1,277 KB)
[v2] Tue, 24 Oct 2023 11:30:07 UTC (1,376 KB)
