Showing 1–2 of 2 results for author: Pankayaraj, P

  1. arXiv:2003.12968  [pdf, other]

    cs.LG math.ST stat.ML

    A Decentralized Policy with Logarithmic Regret for a Class of Multi-Agent Multi-Armed Bandit Problems with Option Unavailability Constraints and Stochastic Communication Protocols

    Authors: Pathmanathan Pankayaraj, D. H. S. Maithripala, J. M. Berg

    Abstract: This paper considers a multi-armed bandit (MAB) problem in which multiple mobile agents receive rewards by sampling from a collection of spatially dispersed stochastic processes, called bandits. The goal is to formulate a decentralized policy for each agent, in order to maximize the total cumulative reward over all agents, subject to option availability and inter-agent communication constraints. T…

    Submitted 31 March, 2020; v1 submitted 29 March, 2020; originally announced March 2020.

    Comments: Pre-print submitted for review to the 2020 CDC

  2. arXiv:1910.02635  [pdf, other]

    cs.LG stat.ML

    A Decentralized Communication Policy for Multi Agent Multi Armed Bandit Problems

    Authors: Pathmanathan Pankayaraj, D. H. S. Maithripala

    Abstract: This paper proposes a novel policy for a group of agents to solve a multi-armed bandit (MAB) problem, both individually and collectively. The policy relies solely on the information that an agent has obtained by sampling the options itself and by communicating with its neighbors. The option selection policy is based on an Upper Confidence Bound (UCB) strategy while the communicatio…

    Submitted 21 February, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: This is the full version of a preprint that will appear in the proceedings of the 2020 European Control Conference (ECC)