Policy Gradient with Tree Search: Avoiding Local Optimas through Lookahead

Koren, Uri; Kumar, Navdeep; Gadot, Uri; Ramponi, Giorgia; Levy, Kfir Yehuda; Mannor, Shie

arXiv:2506.07054 (cs)

[Submitted on 8 Jun 2025]

Abstract: Classical policy gradient (PG) methods in reinforcement learning frequently converge to suboptimal local optima, a challenge exacerbated in large or complex environments. This work investigates Policy Gradient with Tree Search (PGTS), an approach that integrates an $m$-step lookahead mechanism to enhance policy optimization. We provide theoretical analysis demonstrating that increasing the tree search depth $m$ monotonically reduces the set of undesirable stationary points and, consequently, improves the worst-case performance of any resulting stationary policy. Critically, our analysis accommodates practical scenarios where policy updates are restricted to states visited by the current policy, rather than requiring updates across the entire state space. Empirical evaluations on diverse MDP structures, including Ladder, Tightrope, and Gridworld environments, illustrate PGTS's ability to exhibit "farsightedness," navigate challenging reward landscapes, escape local traps where standard PG fails, and achieve superior solutions.
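The abstract does not spell out the lookahead operator, so the following is only a minimal sketch of one common reading of "$m$-step lookahead" in a tabular MDP: apply the Bellman optimality backup $m-1$ times to a value estimate, then form action values from the result. The function names (`bellman_backup`, `lookahead_q`), the data layout (`P`, `R`), and the three-state toy MDP are all assumptions made for illustration; they are not taken from the paper and are not its Ladder, Tightrope, or Gridworld environments.

```python
def bellman_backup(V, P, R, gamma):
    """Apply the Bellman optimality operator once to the value list V.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] is the
    immediate reward (hypothetical layout chosen for this sketch).
    """
    return [
        max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
            for a in range(len(P[s])))
        for s in range(len(P))
    ]


def lookahead_q(V, P, R, gamma, m):
    """Action values obtained by looking m steps ahead from the estimate V."""
    if m < 1:
        raise ValueError("m must be at least 1")
    for _ in range(m - 1):          # m - 1 greedy backups deepen the tree
        V = bellman_backup(V, P, R, gamma)
    return [
        [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


# Hypothetical toy MDP: state 2 is absorbing with zero reward.
# State 0: action 0 takes a small reward and ends; action 1 moves to state 1.
# State 1: action 0 ends with no reward; action 1 ends with reward 1.
P = [
    [[(1.0, 2)], [(1.0, 1)]],
    [[(1.0, 2)], [(1.0, 2)]],
    [[(1.0, 2)], [(1.0, 2)]],
]
R = [
    [0.1, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
gamma = 0.9

# Value of a poor policy that moves to state 1 and then takes action 0 there.
V_pi = [0.0, 0.0, 0.0]

q1 = lookahead_q(V_pi, P, R, gamma, m=1)  # ordinary one-step action values
q2 = lookahead_q(V_pi, P, R, gamma, m=2)  # one extra level of lookahead
```

In this toy example the one-step values at state 0 favour the small immediate reward, while the two-step values favour moving toward the larger delayed reward. That is the kind of "farsightedness" the abstract refers to, shown here under the stated assumptions rather than as the paper's algorithm.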

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2506.07054 [cs.LG]
	(or arXiv:2506.07054v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2506.07054

Submission history

From: Uri Koren
[v1] Sun, 8 Jun 2025 09:28:11 UTC (332 KB)
