Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Bubeck, Sébastien; Li, Yuanzhi; Peres, Yuval; Sellke, Mark

arXiv:1904.12233 (cs)

[Submitted on 28 Apr 2019 (v1), last revised 1 May 2019 (this version, v2)]

Abstract: We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
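To make the setting in the abstract concrete, the following is a minimal Python sketch of a single round of the game under the collision-information feedback model. It is an illustration only, not the authors' algorithm: the function names `play_round` and `no_collision_info_exponent` are hypothetical, and the sketch assumes losses lie in [0, 1], so that the "maximal loss" on a collision equals 1.

```python
from collections import Counter
from typing import List, Tuple


def play_round(choices: List[int], losses: List[float]) -> List[Tuple[float, bool]]:
    """Return (loss, collided) for each player in one round.

    choices[i] is the arm selected by player i; losses[a] is the loss
    assigned to arm a this round (assumed here to lie in [0, 1]).
    Players who pick the same arm collide and suffer the maximal loss,
    taken to be 1.0 under that assumption.
    """
    counts = Counter(choices)
    feedback = []
    for arm in choices:
        collided = counts[arm] > 1
        loss = 1.0 if collided else losses[arm]
        # With collision information, the player also observes `collided`.
        feedback.append((loss, collided))
    return feedback


def no_collision_info_exponent(m: int) -> float:
    """Exponent of T in the bound T^(1 - 1/(2m)) quoted in the abstract."""
    return 1.0 - 1.0 / (2 * m)


if __name__ == "__main__":
    # Players 0 and 1 both pick arm 0 and collide; player 2 is alone on arm 2.
    print(play_round([0, 0, 2], [0.1, 0.5, 0.3]))
    # For m = 2 players the no-collision-information rate is T^(3/4).
    print(no_collision_info_exponent(2))
```

In the feedback model without collision information, a player would see only the loss value and could not tell a collision apart from an arm whose loss happened to be maximal, which is one way to read why the stated rate degrades as the number of players $m$ grows.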

Comments: 27 pages; v2 adds a pseudorandom generator construction to remove the shared randomness assumption in the $\sqrt{T}$-regret result (Section 3.9)
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Cite as: arXiv:1904.12233 [cs.LG] (or arXiv:1904.12233v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.1904.12233

Submission history

From: Sebastien Bubeck
[v1] Sun, 28 Apr 2019 00:21:04 UTC (29 KB)
[v2] Wed, 1 May 2019 19:05:21 UTC (30 KB)
