TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

Li, Yanshu; Yun, Tian; Yang, Jianjiang; Feng, Pinyuan; Huang, Jinfa; Tang, Ruixiang

arXiv:2505.17098 (cs)

[Submitted on 21 May 2025]

Abstract: Multimodal in-context learning (ICL) has emerged as a key mechanism for harnessing the capabilities of large vision-language models (LVLMs). However, its effectiveness remains highly sensitive to the quality of input in-context sequences, particularly for tasks involving complex reasoning or open-ended generation. A major obstacle is our limited understanding of how LVLMs actually exploit these sequences during inference. To bridge this gap, we systematically interpret multimodal ICL through the lens of task mapping, which reveals how local and global relationships within and among demonstrations guide model reasoning. Building on this insight, we present TACO, a lightweight transformer-based model equipped with task-aware attention that dynamically configures in-context sequences. By injecting task-mapping signals into the autoregressive decoding process, TACO creates a bidirectional synergy between sequence construction and task reasoning. Experiments on five LVLMs and nine datasets demonstrate that TACO consistently surpasses baselines across diverse ICL tasks. These results position task mapping as a valuable perspective for interpreting and improving multimodal ICL.

Comments:	29 pages, 11 figures, 19 tables. arXiv admin note: substantial text overlap with arXiv:2503.04839
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2505.17098 [cs.CL]
	(or arXiv:2505.17098v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2505.17098

Submission history

From: Yanshu Li
[v1] Wed, 21 May 2025 05:22:21 UTC (9,487 KB)
