arXiv:2009.02605

PAC Reinforcement Learning Algorithm for General-Sum Markov Games

Authors: Ashkan Zehfroosh, Herbert G. Tanner

Abstract: This paper presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL algorithm, the framework enables checking whether an arbitrary MARL algorithm is PAC. Comparative numerical results demonstrate performance and robustness.

Submitted 5 September, 2020; originally announced September 2020.

arXiv:1211.4038

Stochastic receding horizon control of nonlinear stochastic systems with probabilistic state constraints

Authors: Shridhar K. Shah, Herbert G. Tanner, Chetan D. Pahlajani

Abstract: The paper describes a receding horizon control design framework for continuous-time stochastic nonlinear systems subject to probabilistic state constraints. The intention is to derive solutions that are implementable in real time on currently available mobile processors. The approach consists of decomposing the problem into designing receding horizon reference paths based on the drift component of the system dynamics, and then implementing a stochastic optimal controller to allow the system to stay close to and follow the reference path. In some cases, the stochastic optimal controller can be obtained in closed form; in more general cases, pre-computed numerical solutions can be implemented in real time without the need for on-line computation. The convergence of the closed-loop system is established assuming no constraints on control inputs, and simulation results are provided to corroborate the theoretical predictions.

Submitted 16 November, 2012; originally announced November 2012.

Comments: Draft of submission to IEEE Transactions on Automatic Control

arXiv:1210.1464

Networked Decision Making for Poisson Processes: Application to nuclear detection

Authors: Chetan D. Pahlajani, Ioannis Poulakakis, Herbert G. Tanner

Abstract: This paper addresses a detection problem where several spatially distributed sensors independently observe a time-inhomogeneous stochastic process. The task is to decide between two hypotheses regarding the statistics of the observed process at the end of a fixed time interval. In the proposed method, each of the sensors transmits once to a fusion center a locally processed summary of its information in the form of a likelihood ratio. The fusion center then combines these messages to arrive at an optimal decision in the Neyman-Pearson framework. The approach is motivated by applications arising in the detection of mobile radioactive sources, and offers a pathway toward the development of novel fixed-interval detection algorithms that combine decentralized processing with optimal centralized decision making.

Submitted 4 October, 2012; originally announced October 2012.

Showing 1–3 of 3 results for author: Tanner, H G