Showing 1–2 of 2 results for author: Raveh, O

Search v0.5.6 released 2020-02-24

arXiv:2405.16168 [pdf, other]

cs.LG stat.ML

Multi-Player Approaches for Dueling Bandits

Authors: Or Raveh, Junya Honda, Masashi Sugiyama

Abstract: Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow Your Leader black-box approach matches the lower bound for this setting when utilizing known dueling bandit algorithms as a foundation. Additionally, we analyze a message-passing fully distributed approach with a novel Condorcet-winner recommendation protocol, resulting in expedited exploration in many cases. Our experimental comparisons reveal that our multiplayer algorithms surpass single-player benchmark algorithms, underscoring their efficacy in addressing the nuanced challenges of the multiplayer dueling bandit setting.

Submitted 23 April, 2025; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:1905.09951 [pdf, other]

cs.LG stat.ML

PAC Guarantees for Cooperative Multi-Agent Reinforcement Learning with Restricted Communication

Authors: Or Raveh, Ron Meir

Abstract: We develop model free PAC performance guarantees for multiple concurrent MDPs, extending recent works where a single learner interacts with multiple non-interacting agents in a noise free environment. Our framework allows noisy and resource limited communication between agents, and develops novel PAC guarantees in this extended setting. By allowing communication between the agents themselves, we suggest improved PAC-exploration algorithms that can overcome the communication noise and lead to improved sample complexity bounds. We provide a theoretically motivated algorithm that optimally combines information from the resource limited agents, thereby analyzing the interaction between noise and communication constraints that are ubiquitous in real-world systems. We present empirical results for a simple task that supports our theoretical formulations and improve upon naive information fusion methods.

Submitted 10 October, 2019; v1 submitted 23 May, 2019; originally announced May 2019.
