Retro*: Optimizing LLMs for Reasoning-Intensive Document Retrieval

Lan, Junwei; Chen, Jianlyu; Liu, Zheng; Li, Chaofan; Bao, Siqi; Lian, Defu

arXiv:2509.24869 (cs)

[Submitted on 29 Sep 2025 (v1), last revised 12 Oct 2025 (this version, v2)]

Abstract: With the growing popularity of LLM agents and RAG, it has become increasingly important to retrieve documents that are essential for solving a task, even when their connection to the task is indirect or implicit. Addressing this problem requires fine-grained reasoning to accurately assess the relevance between the task and each candidate document. This capability, however, poses a significant challenge for existing IR techniques. Despite recent progress in reasoning-enhanced IR, existing approaches still face significant challenges in applicability, scalability, and efficiency. In this work, we propose Retro*, a novel approach for reasoning-intensive document retrieval. Our method introduces a rubric-based relevance scoring mechanism, enabling the model to reason about the relationship between a task and a document based on explicitly defined criteria, thereby producing a fine-grained, interpretable relevance score. Retro* also supports test-time scaling by combining multiple reasoning trajectories via score integration, which produces more reliable relevance estimates. To optimize Retro*'s reasoning capabilities, we introduce a novel reinforcement learning algorithm tailored for its relevance scoring mechanism, which employs two composite rewards to fully exploit the trajectories of each training sample. Our experiments show that Retro* outperforms existing document retrieval methods with notable advantages, leading to state-of-the-art performance on the BRIGHT benchmark.
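The test-time scaling idea in the abstract (several reasoning trajectories per task-document pair, combined via score integration) can be illustrated with a minimal sketch. The abstract does not say how scores are integrated or what scale the rubric uses, so everything here is an assumption made for illustration: the 0-100 score range, the use of a plain arithmetic mean, and the hypothetical names `integrate_scores` and `rank_documents`. It is not the authors' implementation.

```python
from statistics import mean


def integrate_scores(trajectory_scores):
    """Combine the relevance scores of several reasoning trajectories.

    Assumption: a simple arithmetic mean. The abstract only says that
    trajectories are combined "via score integration".
    """
    if not trajectory_scores:
        raise ValueError("need at least one trajectory score")
    return mean(trajectory_scores)


def rank_documents(scores_by_doc):
    """Rank documents by integrated score, highest first.

    scores_by_doc maps a document id to the list of scores produced by
    independent reasoning trajectories (assumed here to lie in 0-100).
    """
    integrated = {
        doc_id: integrate_scores(scores)
        for doc_id, scores in scores_by_doc.items()
    }
    return sorted(integrated.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores from three sampled trajectories per document.
example_scores = {
    "doc_a": [80, 60, 70],
    "doc_b": [90, 20, 40],
    "doc_c": [75, 75, 78],
}
ranking = rank_documents(example_scores)
```

In this toy input, `doc_b` receives the single highest score from one trajectory but ranks last once all three are averaged, which is the kind of variance reduction that sampling multiple trajectories is meant to provide.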

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2509.24869 [cs.IR]
	(or arXiv:2509.24869v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2509.24869

Submission history

From: Junwei Lan
[v1] Mon, 29 Sep 2025 14:53:05 UTC (319 KB)
[v2] Sun, 12 Oct 2025 09:37:17 UTC (319 KB)
