Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs

Guo, Shangmin; Domingues, Omar Darwiche; Avalos, Raphaël; Courville, Aaron; Strub, Florian

Computer Science > Artificial Intelligence

arXiv:2506.02918 (cs)

[Submitted on 3 Jun 2025]

Abstract: Tool use in stateful environments presents unique challenges for large language models (LLMs), where existing test-time compute strategies relying on repeated trials in the environment are impractical. We propose dynamics modelling (DyMo), a method that augments LLMs with a state prediction capability alongside function calling during post-training. This enables LLMs to predict the future states of their actions through an internal environment model. On the Berkeley Function Calling Leaderboard V2, DyMo improves success rates and significantly reduces hallucinations. We further integrate the internal environment model into self-verification sampling (SVS), and show that this substantially improves pass^k over number of trials k, and allows the model to refuse unreliable outputs. Together, DyMo and SVS greatly enhance the effectiveness and reliability of LLMs for tool use. We believe this work charts a path towards scalable planning RL methods for LLM inference without repeatedly querying the oracle environment.
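
The sample, predict, proceed loop named in the title can be illustrated with a minimal Python sketch. The abstract does not specify how predicted states are checked or how many candidates are drawn, so every name here (`self_verification_sample`, `sample_action`, `predict_state`, `is_acceptable`, `max_samples`) is a hypothetical stand-in, and this is an illustration of the general idea rather than the authors' implementation.

```python
from typing import Callable, Optional, TypeVar

Action = TypeVar("Action")
State = TypeVar("State")


def self_verification_sample(
    sample_action: Callable[[], Action],      # hypothetical: draws one candidate tool call
    predict_state: Callable[[Action], State],  # hypothetical: internal environment model
    is_acceptable: Callable[[State], bool],    # hypothetical: check on the predicted state
    max_samples: int = 4,                      # assumed sampling budget
) -> Optional[Action]:
    """Return the first candidate whose predicted state passes the check.

    Returns None when no candidate passes, which stands in for refusing
    to act. The real environment is never queried inside this loop.
    """
    for _ in range(max_samples):
        action = sample_action()            # 1. sample a candidate action
        predicted = predict_state(action)   # 2. predict the resulting state
        if is_acceptable(predicted):        # 3. proceed only if the prediction passes
            return action
    return None


# Toy usage: the first candidate is predicted to fail, the second to succeed.
candidates = iter(["delete_file('a.txt')", "read_file('a.txt')"])
chosen = self_verification_sample(
    sample_action=lambda: next(candidates),
    predict_state=lambda a: "error" if a.startswith("delete") else "ok",
    is_acceptable=lambda s: s == "ok",
    max_samples=2,
)

# Toy refusal: every candidate is predicted to fail, so nothing is returned.
refused = self_verification_sample(
    sample_action=lambda: "delete_file('a.txt')",
    predict_state=lambda a: "error",
    is_acceptable=lambda s: s == "ok",
)
```

In this toy setup the verification step costs only calls to the stand-in predictor, which mirrors the abstract's point that candidates can be screened without repeated trials in the stateful environment.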

Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2506.02918 [cs.AI]
	(or arXiv:2506.02918v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2506.02918

Submission history

From: Shangmin Guo
[v1] Tue, 3 Jun 2025 14:20:59 UTC (20,960 KB)
