Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Ba, Wenjia; Lin, Tianyi; Zhang, Jiawei; Zhou, Zhengyuan

arXiv:2112.02856 (cs)

[Submitted on 6 Dec 2021 (v1), last revised 29 Mar 2024 (this version, v4)]

Abstract: We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of \textit{smooth and strongly monotone} games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$ under smooth and strongly concave reward functions ($n \geq 1$ is the problem dimension). We then show that if each player applies this no-regret learning algorithm in strongly monotone games, the joint action converges in the \textit{last iterate} to the unique Nash equilibrium at a rate of $\tilde{\Theta}(nT^{-1/2})$. Prior to our work, the best-known convergence rate in the same class of games is $\tilde{O}(n^{2/3}T^{-1/3})$ (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is $\Omega(nT^{-1/2})$). Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both optimal regret in the single-agent learning and optimal last-iterate convergence rate in the multi-agent learning. We also present preliminary numerical results on several application problems to demonstrate the efficacy of our algorithm in terms of iteration count.
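
To make the bandit feedback setting in the abstract concrete, the sketch below shows one standard way to estimate a gradient from reward values alone. It is not the paper's algorithm, which relies on self-concordant barrier functions. It uses the plain spherical one-point estimator instead. The quadratic reward, the function names, and the parameters `delta`, `num_samples`, and `seed` are all assumptions made for illustration.

```python
import math
import random


def reward(x, c=(0.3, -0.2)):
    # Hypothetical strongly concave reward: -||x - c||^2.
    # Its true gradient is -2 * (x - c), which the learner never observes.
    return -sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def sample_unit_vector(n, rng):
    # Uniform direction on the unit sphere via a normalized Gaussian vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    return [gi / norm for gi in g]


def one_point_gradient_estimate(x, delta, rng):
    # Query the reward once at a randomly perturbed point and rescale:
    # (n / delta) * reward(x + delta * u) * u.
    n = len(x)
    u = sample_unit_vector(n, rng)
    perturbed = [xi + delta * ui for xi, ui in zip(x, u)]
    r = reward(perturbed)
    return [(n / delta) * r * ui for ui in u]


def averaged_estimate(x, delta, num_samples, seed=0):
    # Average many single-query estimates to expose their expectation.
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = one_point_gradient_estimate(x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]
```

For this quadratic reward the estimator is unbiased for the true gradient, so `averaged_estimate([0.1, 0.4], 0.5, 100000)` should land close to `-2 * (x - c) = (0.4, -1.2)`. A single estimate is very noisy, which is one reason bandit rates are slower than rates under gradient feedback.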

Comments:	43 pages, 4 figures
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
Cite as:	arXiv:2112.02856 [cs.LG]
	(or arXiv:2112.02856v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2112.02856

Submission history

From: Wenjia Ba
[v1] Mon, 6 Dec 2021 08:27:54 UTC (408 KB)
[v2] Wed, 8 Dec 2021 02:06:50 UTC (366 KB)
[v3] Sun, 10 Jul 2022 01:29:19 UTC (1,045 KB)
[v4] Fri, 29 Mar 2024 04:18:14 UTC (444 KB)
