A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

Hu, Zicheng; Chen, Cheng

Computer Science > Machine Learning

arXiv:2502.07514 (cs)

[Submitted on 11 Feb 2025]

Title:A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

Authors:Zicheng Hu, Cheng Chen

View PDF HTML (experimental)

Abstract:We investigate various stochastic bandit problems in the presence of adversarial corruption. A seminal contribution to this area is the BARBAR~\citep{gupta2019better} algorithm, which is both simple and efficient, tolerating significant levels of corruption with nearly no degradation in performance. However, its regret upper bound exhibits a complexity of $O(KC)$, while the lower bound is $\Omega(C)$. In this paper, we enhance the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ and achieves an optimal regret bound up to a logarithmic factor. We also demonstrate how BARBAT can be extended to various settings, including graph bandits, combinatorial semi-bandits, batched bandits and multi-agent bandits. In comparison to the Follow-The-Regularized-Leader (FTRL) family of methods, which provide a best-of-both-worlds guarantee, our approach is more efficient and parallelizable. Notably, FTRL-based methods face challenges in scaling to batched and multi-agent settings.

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2502.07514 [cs.LG]
	(or arXiv:2502.07514v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2502.07514

Submission history

From: Zicheng Hu [view email]
[v1] Tue, 11 Feb 2025 12:33:33 UTC (50 KB)

Computer Science > Machine Learning

Title:A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Machine Learning

Title:A Near-optimal, Scalable and Corruption-tolerant Framework for Stochastic Bandits: From Single-Agent to Multi-Agent and Beyond

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators