
Showing 1–3 of 3 results for author: Tanner, H G

Searching in archive math.
  1. arXiv:2009.02605 [pdf, other]

    cs.GT cs.LG math.OC

    PAC Reinforcement Learning Algorithm for General-Sum Markov Games

    Authors: Ashkan Zehfroosh, Herbert G. Tanner

    Abstract: This paper presents a theoretical framework for probably approximately correct (PAC) multi-agent reinforcement learning (MARL) algorithms for Markov games. The paper offers an extension to the well-known Nash Q-learning algorithm, using the idea of delayed Q-learning, in order to build a new PAC MARL algorithm for general-sum Markov games. In addition to guiding the design of a provably PAC MARL a…

    Submitted 5 September, 2020; originally announced September 2020.

  2. arXiv:1211.4038 [pdf, other]

    eess.SY cs.RO math.OC

    Stochastic receding horizon control of nonlinear stochastic systems with probabilistic state constraints

    Authors: Shridhar K. Shah, Herbert G. Tanner, Chetan D. Pahlajani

    Abstract: The paper describes a receding horizon control design framework for continuous-time stochastic nonlinear systems subject to probabilistic state constraints. The intention is to derive solutions that are implementable in real-time on currently available mobile processors. The approach consists of decomposing the problem into designing receding horizon reference paths based on the drift component of…

    Submitted 16 November, 2012; originally announced November 2012.

    Comments: Draft of submission to IEEE Transactions on Automatic Control

  3. arXiv:1210.1464 [pdf, other]

    math.PR cs.RO

    Networked Decision Making for Poisson Processes: Application to nuclear detection

    Authors: Chetan D. Pahlajani, Ioannis Poulakakis, Herbert G. Tanner

    Abstract: This paper addresses a detection problem where several spatially distributed sensors independently observe a time-inhomogeneous stochastic process. The task is to decide between two hypotheses regarding the statistics of the observed process at the end of a fixed time interval. In the proposed method, each of the sensors transmits once to a fusion center a locally processed summary of its informat…

    Submitted 4 October, 2012; originally announced October 2012.