MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads

Liu, Weihao; Wu, Ning; Yang, Shiping; Ding, Wenbiao; Liang, Shining; Gong, Ming; Zhang, Dongmei


arXiv:2502.13963 (cs)

[Submitted on 19 Feb 2025]


Abstract: Large Language Models (LLMs) frequently show distracted attention due to irrelevant information in the input, which severely impairs their long-context capabilities. Inspired by recent studies on the effectiveness of retrieval heads in long-context factuality, we aim to address this distraction issue by improving such retrieval heads directly. We propose Multi-Document Attention Focusing (MuDAF), a novel method that explicitly optimizes the attention distribution at the head level through contrastive learning. According to the experimental results, MuDAF can significantly improve the long-context question answering performance of LLMs, especially in multi-document question answering. Extensive evaluations on retrieval scores and attention visualizations show that MuDAF has great potential for making attention heads more focused on relevant information and for reducing attention distractions.
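
The abstract does not spell out the training objective. As a rough illustration of what a head-level contrastive objective can look like, the sketch below applies a generic InfoNCE-style loss to one attention head: the question is pulled toward the passage that contains the answer and pushed away from distractor passages. The function name `info_nce_loss`, the pooled per-passage vectors, the cosine similarity, and the `temperature` value are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def info_nce_loss(query, passage_keys, positive_index, temperature=0.1):
    """Generic InfoNCE loss for a single attention head (illustrative only).

    query:          (d,) pooled query-side vector for the question (assumed).
    passage_keys:   (n, d) pooled key-side vectors, one per passage (assumed).
    positive_index: index of the passage that contains the answer.
    """
    query = np.asarray(query, dtype=float)
    passage_keys = np.asarray(passage_keys, dtype=float)

    # Cosine similarity between the question and every passage.
    q = query / np.linalg.norm(query)
    k = passage_keys / np.linalg.norm(passage_keys, axis=1, keepdims=True)
    logits = (k @ q) / temperature

    # Numerically stable log-softmax over the passages.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())

    # The loss is small when the head scores the relevant passage highest.
    return -log_probs[positive_index]

# Hypothetical 2-d example: passage 0 is aligned with the question.
question = [1.0, 0.0]
passages = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
focused = info_nce_loss(question, passages, positive_index=0)
distracted = info_nce_loss(question, passages, positive_index=1)
```

In this toy setup `focused` is much smaller than `distracted`, so minimizing the loss would reward a head whose similarity mass concentrates on the relevant passage.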

Comments:	18 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2502.13963 [cs.CL]
	(or arXiv:2502.13963v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2502.13963

Submission history

From: Weihao Liu
[v1] Wed, 19 Feb 2025 18:59:15 UTC (1,300 KB)
