Lagrangian Index Policy for Restless Bandits with Average Reward

Avrachenkov, Konstantin; Borkar, Vivek S.; Shah, Pratik

arXiv:2412.12641 (cs)

[Submitted on 17 Dec 2024 (v1), last revised 26 Jun 2025 (this version, v2)]

Abstract: We study the Lagrange Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with that of the Whittle Index Policy (WIP); both are heuristic policies known to be asymptotically optimal under certain natural conditions. Although their performances are very similar in most cases, LIP continues to perform very well in the cases where WIP performs poorly. We then propose reinforcement learning algorithms, both tabular and NN-based, that yield online learning schemes for LIP in the model-free setting. The proposed reinforcement learning schemes for LIP require significantly less memory than the analogous schemes for WIP. We analytically calculate the Lagrange index for the restart model, which applies to optimal web crawling and to the minimization of the weighted age of information. We also give a new proof of asymptotic optimality for homogeneous arms as the number of arms goes to infinity, based on exchangeability and de Finetti's theorem.
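The abstract does not say how the index is computed. The following is a minimal Python sketch that assumes the definition common in the Lagrangian-index literature. For a fixed multiplier λ, charged on each activation, the index of an arm in state x is Q_λ(x,1) - Q_λ(x,0) for that arm's relaxed single-arm average-reward MDP. The policy then activates the M arms with the largest current indices.

Everything else here is an assumption for illustration. The names P0, P1 (passive/active transition matrices), r0, r1 (rewards), lagrangian_index and index_policy are hypothetical. The model-based relative value iteration stands in for the authors' model-free tabular and NN-based schemes and is not the same thing.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def lagrangian_index(P0: Matrix, P1: Matrix, r0: Sequence[float],
                     r1: Sequence[float], lam: float,
                     max_iters: int = 10_000, tol: float = 1e-10) -> List[float]:
    """Hypothetical helper: per-state index Q(x,1) - Q(x,0) of one arm's
    relaxed average-reward MDP in which each activation is charged lam.

    Uses relative value iteration (RVI) with reference state 0; assumes the
    arm is unichain and aperiodic so that RVI converges.
    """
    n = len(r0)
    h = [0.0] * n  # relative value function, h[0] == 0 after normalization

    def q_values(rel: List[float]) -> Tuple[List[float], List[float]]:
        # One-step lookahead for the passive (0) and active (1) actions.
        q0 = [r0[x] + sum(P0[x][y] * rel[y] for y in range(n)) for x in range(n)]
        q1 = [r1[x] - lam + sum(P1[x][y] * rel[y] for y in range(n))
              for x in range(n)]
        return q0, q1

    for _ in range(max_iters):
        q0, q1 = q_values(h)
        v = [max(a, b) for a, b in zip(q0, q1)]
        gain = v[0]  # running estimate of the optimal average reward
        new_h = [vx - gain for vx in v]  # renormalize at reference state 0
        done = max(abs(a - b) for a, b in zip(new_h, h)) < tol
        h = new_h
        if done:
            break

    q0, q1 = q_values(h)
    return [a - b for a, b in zip(q1, q0)]


def index_policy(indices: Sequence[float], M: int) -> List[int]:
    """Return the ids of the M arms with the largest index (ties: lower id)."""
    ranked = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return sorted(ranked[:M])


if __name__ == "__main__":
    P = [[0.5, 0.5], [0.5, 0.5]]  # toy arm whose dynamics ignore the action
    print(lagrangian_index(P, P, r0=[0.0, 1.0], r1=[2.0, 3.0], lam=0.5))
    # → [1.5, 1.5]
    print(index_policy([0.3, 1.2, -0.1, 0.8], M=2))  # → [1, 3]
```

In the usual formulation λ would be the optimal multiplier of the relaxed problem; here it is simply an input. Charging λ for activation and paying a subsidy λ for passivity give the same index. The two conventions differ only by adding λ to every reward, which shifts the average reward but leaves the Q-value differences unchanged.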

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Probability (math.PR)
Cite as:	arXiv:2412.12641 [cs.LG]
	(or arXiv:2412.12641v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2412.12641

Submission history

From: Konstantin Avrachenkov
[v1] Tue, 17 Dec 2024 08:03:53 UTC (325 KB)
[v2] Thu, 26 Jun 2025 14:00:55 UTC (335 KB)