Incentivizing Agentic Reasoning in LLM Judges via Tool-Integrated Reinforcement Learning

Xu, Ran; Chen, Jingjing; Ye, Jiayu; Wu, Yu; Yan, Jun; Yang, Carl; Yu, Hongkun

arXiv:2510.23038 (cs)

[Submitted on 27 Oct 2025]

Abstract: Large Language Models (LLMs) are widely used as judges to evaluate response quality, providing a scalable alternative to human evaluation. However, most LLM judges operate solely on intrinsic text-based reasoning, limiting their ability to verify complex constraints or perform accurate computation. Motivated by the success of tool-integrated reasoning (TIR) in numerous tasks, we propose TIR-Judge, an end-to-end RL framework for training LLM judges that integrates a code executor for precise evaluation. TIR-Judge is built on three principles: (i) diverse training across verifiable and non-verifiable domains, (ii) flexible judgment formats (pointwise, pairwise, listwise), and (iii) iterative RL that bootstraps directly from the initial model without distillation. On seven public benchmarks, TIR-Judge surpasses strong reasoning-based judges by up to 6.4% (pointwise) and 7.7% (pairwise), and achieves listwise performance comparable to Claude-Opus-4 despite having only 8B parameters. Remarkably, TIR-Judge-Zero, trained entirely without distilled judge trajectories, matches the performance of distilled variants, demonstrating that tool-augmented judges can self-evolve through iterative reinforcement learning.

Comments:	Work in Progress
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2510.23038 [cs.CL]
	(or arXiv:2510.23038v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2510.23038

Submission history

From: Ran Xu
[v1] Mon, 27 Oct 2025 06:03:37 UTC (812 KB)
