An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

Masoudian, Saeed; Zimmert, Julian; Seldin, Yevgeny

arXiv:2308.10675v1 (cs)

[Submitted on 21 Aug 2023 (this version), latest version 27 May 2024 (v2)]

Abstract: We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. The algorithm improves on prior work by Masoudian et al. [2022] by eliminating the need for prior knowledge of the maximal delay $d_{\mathrm{max}}$ and by providing tighter regret bounds in both regimes. The algorithm and its regret bounds are based on counts of outstanding observations (a quantity that is observed at action time) rather than on delays or the maximal delay (quantities that are only observed when feedback arrives). One major contribution is a novel control of distribution drift, based on biased loss estimators and on skipping observations with excessively large delays. Another major contribution is demonstrating that the complexity of best-of-both-worlds bandits with delayed feedback is characterized by the cumulative count of outstanding observations after skipping observations with excessively large delays, rather than by the delays or the maximal delay.
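As a rough illustration of the quantity the abstract refers to, the Python sketch below computes the number of outstanding observations sigma_t at the start of each round t from a list of delays. The timing convention (the loss of round s arrives at the end of round s + d_s), the function name outstanding_counts, and the fixed skip_threshold rule are assumptions made for this example only; the paper's actual skipping criterion is not reproduced here. In the algorithm itself, sigma_t is read off at action time from which feedback has not yet arrived; the delays are used below only to simulate those arrivals.

```python
from typing import List, Optional


def outstanding_counts(delays: List[int], skip_threshold: Optional[int] = None) -> List[int]:
    """Return sigma_t, the number of outstanding observations, for rounds t = 1..T.

    Assumed convention: the loss of round s is observed at the end of round
    s + d_s, so at the start of round t it is outstanding iff s < t <= s + d_s.
    Hypothetical skipping rule: if skip_threshold is given, an observation that
    has been outstanding for skip_threshold rounds is skipped and no longer counted.
    """
    T = len(delays)
    sigma = [0] * T
    for s, d in enumerate(delays, start=1):
        # Number of rounds this observation stays outstanding (capped by skipping).
        wait = d if skip_threshold is None else min(d, skip_threshold)
        # Rounds s+1, ..., s+wait, truncated at the horizon T.
        for t in range(s + 1, min(s + wait, T) + 1):
            sigma[t - 1] += 1
    return sigma


if __name__ == "__main__":
    delays = [0, 2, 1, 0]
    print(outstanding_counts(delays))                    # [0, 0, 1, 2]
    print(outstanding_counts(delays, skip_threshold=1))  # [0, 0, 1, 1]
```

Under this convention, without skipping each round s contributes min(d_s, T - s) to the sum of sigma_t, so the cumulative outstanding count equals the total horizon-truncated delay; skipping caps each contribution, which is how it can reduce the cumulative count.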

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2308.10675 [cs.LG]
	(or arXiv:2308.10675v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2308.10675

Submission history

From: Saeed Masoudian
[v1] Mon, 21 Aug 2023 12:17:40 UTC (41 KB)
[v2] Mon, 27 May 2024 19:30:57 UTC (42 KB)
