Multi-lingual and Multi-cultural Figurative Language Understanding

Kabra, Anubha; Liu, Emmy; Khanuja, Simran; Aji, Alham Fikri; Winata, Genta Indra; Cahyawijaya, Samuel; Aremu, Anuoluwapo; Ogayo, Perez; Neubig, Graham

arXiv:2305.16171 (cs)

[Submitted on 25 May 2023]

Abstract: Figurative language permeates human communication, but at the same time is relatively understudied in NLP. Datasets have been created in English to accelerate progress towards measuring and improving figurative language processing in language models (LMs). However, the use of figurative language is an expression of our cultural and societal experiences, making it difficult for these phrases to be universally applicable. In this work, we create a figurative language inference dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. We assess multilingual LMs' abilities to interpret figurative language in zero-shot and few-shot settings. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data, emphasizing the need for LMs to be exposed to a broader range of linguistic and cultural variation during training.

Comments:	ACL 2023 Findings
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.16171 [cs.CL]
	(or arXiv:2305.16171v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.16171

Submission history

From: Emmy Liu
[v1] Thu, 25 May 2023 15:30:31 UTC (893 KB)
