Dynamic Planning and Learning under Recovering Rewards

Simchi-Levi, David; Zheng, Zeyu; Zhu, Feng


arXiv:2106.14813v1 (stat)

[Submitted on 28 Jun 2021 (this version), latest version 21 Dec 2021 (v2)]



Abstract: Motivated by emerging applications such as live-streaming e-commerce, promotions, and recommendations, we introduce a general class of multi-armed bandit problems with the following two features: (i) the decision maker can pull and collect rewards from at most $K$ out of $N$ different arms in each time period; (ii) the expected reward of an arm drops immediately after it is pulled, and then recovers nonparametrically as the idle time increases. With the objective of maximizing expected cumulative rewards over $T$ time periods, we propose, construct, and prove performance guarantees for a class of "Purely Periodic Policies". For the offline problem, in which all model parameters are known, our proposed policy attains an approximation ratio of order $1-\mathcal O(1/\sqrt{K})$, which is asymptotically optimal as $K$ grows to infinity. For the online problem, in which the model parameters are unknown and must be learned, we design an Upper Confidence Bound (UCB) based policy that has approximately $\widetilde{\mathcal O}(N\sqrt{T})$ regret against the offline benchmark. Our framework and policy design may be adaptable to other offline planning and online learning applications with non-stationary and recovering rewards.
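To make the setting concrete, the following is a minimal sketch of the reward model described in the abstract, not the authors' construction. It assumes that a "purely periodic" policy pulls each arm at a fixed period with a fixed offset, that every arm was last pulled at time -1, and that rewards are evaluated in expectation without noise. The names `periodic_schedule`, `total_expected_reward`, and `example_recovery`, as well as the specific recovery curve, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def periodic_schedule(periods: Sequence[int], offsets: Sequence[int],
                      horizon: int) -> List[List[int]]:
    """List the arms pulled at each step t.

    Assumption: arm i is pulled whenever t % periods[i] == offsets[i].
    """
    return [
        [i for i, (d, o) in enumerate(zip(periods, offsets)) if t % d == o]
        for t in range(horizon)
    ]


def total_expected_reward(schedule: Sequence[Sequence[int]],
                          recovery: Callable[[int, int], float],
                          n_arms: int, k: int) -> float:
    """Sum the expected rewards collected along a schedule.

    recovery(arm, idle) is the expected reward of `arm` after `idle`
    periods since its previous pull.
    """
    # Assumption: each arm was last pulled at time -1 (not from the abstract).
    last_pull = [-1] * n_arms
    total = 0.0
    for t, arms in enumerate(schedule):
        # Feature (i): at most K arms may be pulled per period.
        if len(arms) > k:
            raise ValueError(f"step {t} pulls {len(arms)} arms, limit is {k}")
        for i in arms:
            # Feature (ii): the reward depends on the idle time since the last pull.
            total += recovery(i, t - last_pull[i])
            last_pull[i] = t
    return total


def example_recovery(arm: int, idle: int) -> float:
    """Hypothetical recovery curve: increasing in idle time, saturating at a cap."""
    cap = [1.0, 0.8, 0.6][arm]
    return cap * (1.0 - 0.5 ** idle)


# N = 3 arms, K = 1 pull per period: arm 0 every 2 steps, arms 1 and 2 every 4.
schedule = periodic_schedule(periods=[2, 4, 4], offsets=[0, 1, 3], horizon=8)
reward = total_expected_reward(schedule, example_recovery, n_arms=3, k=1)
print(schedule, reward)
```

In this sketch the periods and offsets are picked by hand so that no two arms collide in the same period. Choosing them to maximize reward is the offline planning problem the abstract refers to, and is not attempted here.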

Comments:	Accepted by ICML 2021
Subjects:	Machine Learning (stat.ML); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2106.14813 [stat.ML]
	(or arXiv:2106.14813v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2106.14813

Submission history

From: Feng Zhu
[v1] Mon, 28 Jun 2021 15:40:07 UTC (356 KB)
[v2] Tue, 21 Dec 2021 22:57:43 UTC (785 KB)
