
Compositional Planning for Logically Constrained Multi-Agent Markov Decision Processes

Authors: Krishna C. Kalagarla, Matthew Low, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Designing control policies for large, distributed systems is challenging, especially in the context of critical, temporal logic based specifications (e.g., safety) that must be met with high probability. Compositional methods for such problems are needed for scalability, yet relying on worst-case assumptions for decomposition tends to be overly conservative. In this work, we use the framework of Constrained Markov Decision Processes (CMDPs) to provide an assume-guarantee based decomposition for synthesizing decentralized control policies, subject to logical constraints in a multi-agent setting. The returned policies are guaranteed to satisfy the constraints with high probability and provide a lower bound on the achieved objective reward. We empirically find the returned policies to achieve near-optimal rewards while enjoying an order of magnitude reduction in problem size and execution time.

Submitted 4 October, 2024; originally announced October 2024.

Comments: 6 pages, 1 figure, accepted for publication at the 63rd IEEE Conf. on Decision and Control (2024)

arXiv:2402.08813 [pdf, other]

Model approximation in MDPs with unbounded per-step cost

Authors: Berk Bozkurt, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract: We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process $\mathcal{M}$ when we only have access to an approximate model $\hat{\mathcal{M}}$. How well does an optimal policy $\hat{\pi}^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$? We answer this question by bounding a weighted norm of the difference between the value function of $\hat{\pi}^{\star}$ when used in $\mathcal{M}$ and the optimal value function of $\mathcal{M}$. We then extend our results and obtain potentially tighter upper bounds by considering affine transformations of the per-step cost. We further provide upper bounds that explicitly depend on the weighted distance between cost functions and weighted distance between transition kernels of the original and approximate models. We present examples to illustrate our results.

Submitted 13 February, 2024; originally announced February 2024.
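
The question posed in this abstract can be made concrete on a toy model. The Python sketch below uses a hypothetical two-state, two-action discounted-cost MDP (the transitions, costs, and discount factor are illustrative assumptions, not taken from the paper). It solves the approximate model by value iteration, evaluates the resulting policy $\hat{\pi}^{\star}$ in the original model, and reports the gap to the optimal value function of $\mathcal{M}$ in a weighted sup-norm $\max_s |v(s)|/w(s)$, which is one common choice of weighted norm and is assumed here rather than quoted from the paper.

```python
from typing import List, Tuple

# Model = (P, c): P[s][a] lists (next_state, probability) pairs; c[s][a] is the per-step cost.
Model = Tuple[List[List[List[Tuple[int, float]]]], List[List[float]]]


def q_values(model: Model, V: List[float], gamma: float, s: int) -> List[float]:
    """One-step lookahead cost of every action in state s."""
    P, c = model
    return [c[s][a] + gamma * sum(p * V[t] for t, p in P[s][a]) for a in range(len(c[s]))]


def value_iteration(model: Model, gamma: float, tol: float = 1e-10):
    """Optimal (minimum) discounted cost and a greedy deterministic policy."""
    n = len(model[1])
    V = [0.0] * n
    while True:
        V_new = [min(q_values(model, V, gamma, s)) for s in range(n)]
        done = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if done:
            break
    policy = []
    for s in range(n):
        q = q_values(model, V, gamma, s)
        policy.append(q.index(min(q)))
    return V, policy


def evaluate(model: Model, policy: List[int], gamma: float, tol: float = 1e-10) -> List[float]:
    """Discounted cost of a fixed deterministic policy (iterative policy evaluation)."""
    n = len(model[1])
    V = [0.0] * n
    while True:
        V_new = [q_values(model, V, gamma, s)[policy[s]] for s in range(n)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def weighted_sup_norm(v: List[float], w: List[float]) -> float:
    """max_s |v(s)| / w(s) for a positive weight function w."""
    return max(abs(x) / wx for x, wx in zip(v, w))


# Hypothetical example: action 0 stays in the current state, action 1 switches state.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)], [(0, 1.0)]]]
c_true = [[1.0, 0.5], [0.0, 2.0]]
c_approx = [[1.0, 0.5], [3.0, 2.0]]  # approximate model overestimates the cost of staying in state 1
gamma = 0.9

V_star, _ = value_iteration((P, c_true), gamma)
_, pi_hat = value_iteration((P, c_approx), gamma)
V_pi_hat = evaluate((P, c_true), pi_hat, gamma)  # cost of pi_hat when used in the true model
gap = [x - y for x, y in zip(V_pi_hat, V_star)]
print(pi_hat, [round(g, 6) for g in gap], round(weighted_sup_norm(gap, [1.0, 1.0]), 6))
# expected: [0, 1] [9.5, 11.0] 11.0
```

Here the approximate model wrongly makes state 1 look expensive, so $\hat{\pi}^{\star}$ never moves to it, and the gap is nonnegative in every state, as it must be for a cost-minimization problem.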

arXiv:2310.10107 [pdf, other]

Posterior Sampling-based Online Learning for Episodic POMDPs

Authors: Dengwang Tang, Dongze Ye, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Learning in POMDPs is known to be significantly harder than in MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes and is polynomial in the other parameters. In a general setting, the regret scales exponentially in the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.

Submitted 23 October, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 41 pages, 9 figures

MSC Class: 93E35

arXiv:2305.14736 [pdf, other]

Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Authors: Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.

Submitted 19 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.09038

arXiv:2304.04346 [pdf, other]

A Novel Point-based Algorithm for Multi-agent Control Using the Common Information Approach

Authors: Dengwang Tang, Ashutosh Nayyar, Rahul Jain

Abstract: The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem into a single-agent partially observed Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve due to its extraordinarily large action space. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach and point-based POMDP algorithms for large action spaces. We demonstrate the algorithm by optimally solving several benchmark problems.

Submitted 9 April, 2023; originally announced April 2023.

Comments: 11 pages, 4 figures

MSC Class: 68T20 ACM Class: I.2.8; I.2.11
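
Why the coordinator's action space is so large can be seen by counting. In the CI approach the coordinator's action is a tuple of prescriptions, one per agent, where each prescription maps that agent's private information to an action; with finite private-information sets $P_i$ and action sets $A_i$ there are $\prod_i |A_i|^{|P_i|}$ such tuples. The Python sketch below enumerates and counts them for hypothetical sizes (the agent labels, information values, and actions are illustrative assumptions; it does not implement CHSVI itself).

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, Sequence


def prescriptions(private_info: Sequence[str], actions: Sequence[str]) -> Iterator[Dict[str, str]]:
    """Enumerate every deterministic prescription gamma: private_info -> actions for one agent."""
    for choice in product(actions, repeat=len(private_info)):
        yield dict(zip(private_info, choice))


def coordinator_action_count(private_sizes: Sequence[int], action_sizes: Sequence[int]) -> int:
    """Size of the coordinator's action space: prod_i |A_i| ** |P_i|."""
    return prod(a ** p for p, a in zip(private_sizes, action_sizes))


# Hypothetical two-agent example: each agent privately observes "low" or "high"
# and chooses between "wait" and "act".
gammas = list(prescriptions(["low", "high"], ["wait", "act"]))
joint = list(product(gammas, gammas))  # one prescription per agent
print(len(gammas), len(joint))                     # 4 16
print(coordinator_action_count([2, 2], [2, 2]))    # 16
print(coordinator_action_count([4, 4], [3, 3]))    # 6561
print(coordinator_action_count([6, 6, 6], [3, 3, 3]))  # 387420489
```

The count is exponential in the size of each agent's private information, which is why solvers that enumerate actions scale poorly on the coordinator's POMDP.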

arXiv:2209.03888 [pdf, ps, other]

Optimal Communication and Control Strategies for a Multi-Agent System in the Presence of an Adversary

Authors: Dhruva Kartik, Sagar Sudhakara, Rahul Jain, Ashutosh Nayyar

Abstract: We consider a multi-agent system in which a decentralized team of agents controls a stochastic system in the presence of an adversary. Instead of committing to a fixed information sharing protocol, the agents can strategically decide at each time whether to share their private information with each other or not. The agents incur a cost whenever they communicate with each other and the adversary may eavesdrop on their communication. Thus, the agents in the team must effectively coordinate with each other while being robust to the adversary's malicious actions. We model this interaction between the team and the adversary as a stochastic zero-sum game where the team aims to minimize a cost while the adversary aims to maximize it. Under some assumptions on the adversary's capabilities, we characterize a min-max control and communication strategy for the team. We supplement this characterization with several structural results that can make the computation of the min-max strategy more tractable.

Submitted 8 September, 2022; originally announced September 2022.

Comments: In Proceedings of the Conference on Decision and Control (2022)

arXiv:2203.09038 [pdf, other]

Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Authors: Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2108.08502 [pdf, ps, other]

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems

Authors: Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract: We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time-horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $T$.

Submitted 19 September, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Journal ref: Proc 2022 IEEE Conference on Decision and Control

arXiv:2108.07970 [pdf, other]

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

Authors: Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract: We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.

Submitted 18 August, 2021; originally announced August 2021.

Comments: 12 pages

arXiv:2102.05838 [pdf, ps, other]

Common Information Belief based Dynamic Programs for Stochastic Zero-sum Games with Competing Teams

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: Decentralized team problems where players have asymmetric information about the state of the underlying stochastic system have been actively studied, but games between such teams are less understood. We consider a general model of zero-sum stochastic games between two competing teams. This model subsumes many previously considered team and zero-sum game models. For this general model, we provide bounds on the upper (min-max) and lower (max-min) values of the game. Furthermore, if the upper and lower values of the game are identical (i.e., if the game has a value), our bounds coincide with the value of the game. Our bounds are obtained using two dynamic programs based on a sufficient statistic known as the common information belief (CIB). We also identify certain information structures in which only the minimizing team controls the evolution of the CIB. In these cases, we show that one of our CIB based dynamic programs can be used to find the min-max strategy (in addition to the min-max value). We propose an approximate dynamic programming approach for computing the values (and the strategy when applicable) and illustrate our results with the help of an example.

Submitted 27 September, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:1909.01445

arXiv:2011.04686 [pdf, other]

Thompson sampling for linear quadratic mean-field teams

Authors: Mukul Gagrani, Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract: We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}} \big( |M|^{1.5} \sqrt{T} \big)$ irrespective of the total number of agents, where the $\tilde{\mathcal{O}}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.

Submitted 9 November, 2020; originally announced November 2020.

Comments: Submitted to AISTATS 2021
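
The abstract does not spell out the model equations, so the following Python sketch assumes a hypothetical scalar mean-field LQ system with a single agent type ($|M| = 1$): $x_i' = A x_i + B u_i + D \bar{x} + E \bar{u} + w_i$, where $\bar{x}$ and $\bar{u}$ are the empirical means. Averaging these equations over agents shows that the mean-field obeys its own one-dimensional recursion $\bar{x}' = (A + D)\bar{x} + (B + E)\bar{u} + \bar{w}$, whose size does not grow with the number of agents; the check below confirms this numerically. The coefficients are illustrative, and no Thompson sampling is performed.

```python
import random
from statistics import fmean
from typing import List

# Hypothetical scalar coefficients for a single agent type (|M| = 1).
A, B, D, E = 0.9, 0.5, 0.05, 0.1


def step(x: List[float], u: List[float], w: List[float]) -> List[float]:
    """x_i' = A x_i + B u_i + D xbar + E ubar + w_i: agents interact only via the empirical means."""
    xbar, ubar = fmean(x), fmean(u)
    return [A * xi + B * ui + D * xbar + E * ubar + wi for xi, ui, wi in zip(x, u, w)]


def mean_field_step(xbar: float, ubar: float, wbar: float) -> float:
    """Averaging the agent equations gives a one-dimensional recursion for the mean-field."""
    return (A + D) * xbar + (B + E) * ubar + wbar


rng = random.Random(0)
n = 1000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [rng.gauss(0.0, 0.1) for _ in range(n)]
x_next = step(x, u, w)
print(abs(fmean(x_next) - mean_field_step(fmean(x), fmean(u), fmean(w))) < 1e-9)  # True
```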

arXiv:2007.03007 [pdf, ps, other]

Optimal Dynamic Mechanism Design with Stochastic Supply and Flexible Consumers

Authors: Shiva Navabi, Ashutosh Nayyar

Abstract: We consider the problem of designing an expected-revenue maximizing mechanism for allocating multiple non-perishable goods of $k$ varieties to flexible consumers over $T$ time steps. In our model, a random number of goods of each variety may become available to the seller at each time and a random number of consumers may enter the market at each time. Each consumer is present in the market for one time step and wants to consume one good of one of its desired varieties. Each consumer is associated with a flexibility level that indicates the varieties of the goods it is equally interested in. A consumer's flexibility level and the utility it gets from consuming a good of its desired varieties are its private information. We characterize the allocation rule for a Bayesian incentive compatible, individually rational and expected revenue maximizing mechanism in terms of the solution to a dynamic program. The corresponding payment function is also specified in terms of the optimal allocation function. We leverage the structure of the consumers' flexibility model to simplify the dynamic program and provide an alternative description of the optimal mechanism in terms of thresholds computed by the dynamic program.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:1911.06912 [pdf, ps, other]

Fixed-horizon Active Hypothesis Testing

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: Two active hypothesis testing problems are formulated. In these problems, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The first problem is an asymmetric formulation in which the objective is to minimize the probability of incorrectly declaring a particular hypothesis to be true while ensuring that the probability of correctly declaring that hypothesis is moderately high. This formulation can be seen as a generalization of the formulation in the classical Chernoff-Stein lemma to an active setting. The second problem is a symmetric formulation in which the objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For these problems, lower and upper bounds on the optimal misclassification probabilities are derived and these bounds are shown to be asymptotically tight. Classical approaches for experiment selection suggest the use of randomized and, in some cases, open-loop strategies. As opposed to these classical approaches, fully deterministic and adaptive experiment selection strategies are provided. It is shown that these strategies are asymptotically optimal and further, using numerical experiments, it is demonstrated that these novel experiment selection strategies (coupled with appropriate inference strategies) perform significantly better in the non-asymptotic regime.

Submitted 15 November, 2019; originally announced November 2019.

Comments: Submitted to IEEE Transactions on Automatic Control
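
The fixed-budget setting with an "inconclusive" option can be illustrated with a small simulation. The Python sketch below is not the paper's strategy: it assumes binary experiment outcomes with hypothetical probabilities, uses an illustrative deterministic and adaptive selection rule (pick the experiment whose worst-case Kullback-Leibler separation between the current most likely hypothesis and any alternative is largest), and an assumed inference rule that declares the most likely hypothesis only if its posterior reaches a threshold, otherwise returning `None` for "inconclusive".

```python
import math
import random
from typing import Optional, Sequence


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence D(Ber(p) || Ber(q)) in nats; assumes 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def select_experiment(post: Sequence[float], probs: Sequence[Sequence[float]]) -> int:
    """Illustrative deterministic, adaptive rule: pick the experiment that best separates
    the current most likely hypothesis from its closest alternative."""
    h_hat = max(range(len(post)), key=lambda h: post[h])

    def separation(e: int) -> float:
        return min(bernoulli_kl(probs[e][h_hat], probs[e][h]) for h in range(len(post)) if h != h_hat)

    return max(range(len(probs)), key=separation)


def bayes_update(post, probs, e, y):
    """Posterior over hypotheses after observing outcome y in {0, 1} of experiment e."""
    lik = [probs[e][h] if y == 1 else 1.0 - probs[e][h] for h in range(len(post))]
    unnorm = [p * l for p, l in zip(post, lik)]
    z = sum(unnorm)
    return [v / z for v in unnorm]


def run_test(true_h: int, probs, budget: int, threshold: float, rng: random.Random) -> Optional[int]:
    """Perform exactly `budget` experiments, then declare a hypothesis, or None if inconclusive."""
    n_h = len(probs[0])
    post = [1.0 / n_h] * n_h
    for _ in range(budget):
        e = select_experiment(post, probs)
        y = 1 if rng.random() < probs[e][true_h] else 0
        post = bayes_update(post, probs, e, y)
    h_hat = max(range(n_h), key=lambda h: post[h])
    return h_hat if post[h_hat] >= threshold else None


# Hypothetical model: probs[e][h] = P(outcome 1 | experiment e, hypothesis h).
probs = [[0.2, 0.8, 0.6],
         [0.5, 0.3, 0.8]]

rng = random.Random(0)
results = [run_test(1, probs, budget=30, threshold=0.95, rng=rng) for _ in range(200)]
print("correct:", results.count(1),
      "wrong:", sum(r not in (1, None) for r in results),
      "inconclusive:", results.count(None))
```

Raising `threshold` trades misclassifications for more inconclusive declarations, which mirrors the tension between the objective and the constraint in the two formulations above.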

arXiv:1909.01445 [pdf, ps, other]

Zero-sum Stochastic Games with Asymmetric Information

Authors: Dhruva Kartik, Ashutosh Nayyar

Abstract: A general model for zero-sum stochastic games with asymmetric information is considered. In this model, each player's information at each time can be divided into a common information part and a private information part. Under certain conditions on the evolution of the common and private information, a dynamic programming characterization of the value of the game (if it exists) is presented. If the value of the zero-sum game does not exist, then the dynamic program provides bounds on the upper and lower values of the game. This dynamic program is then used for a class of zero-sum stochastic games with complete information on one side and partial information on the other, that is, games where one player has complete information about the state, actions and observation history while the other player may only have partial information about the state and action history. For such games, it is shown that the value exists and can be characterized using the dynamic program. It is further shown that for this class of games, the dynamic program can be used to compute an equilibrium strategy for the more informed player in which the player selects its action using its private information and the common information belief.

Submitted 24 December, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted for presentation at the 58th Conference on Decision and Control (CDC), 2019. Submitted to Dynamic Games and Applications

arXiv:1908.06070 [pdf, other]

Optimal scheduling strategy for networked estimation with energy harvesting

Authors: Marcos M. Vasconcelos, Mukul Gagrani, Ashutosh Nayyar, Urbashi Mitra

Abstract: Joint optimization of scheduling and estimation policies is considered for a system with two sensors and two non-collocated estimators. Each sensor produces an independent and identically distributed sequence of random variables, and each estimator forms estimates of the corresponding sequence in the mean-squared error sense. The data generated by the sensors are transmitted to the corresponding estimators over a bandwidth-constrained wireless network that can support a single packet per time slot. Access to the limited communication resources is determined by a scheduler that decides which sensor measurement to transmit based on both observations. The scheduler has an energy-harvesting battery of limited capacity, which couples the decision-making problem in time. Despite the overall lack of convexity of the team decision problem, it is shown that this system admits globally optimal scheduling and estimation strategies under the assumption that the distributions of the random variables at the sensors are symmetric and unimodal. Additionally, the optimal scheduling policy has a structure characterized by a threshold function that depends on the time index and energy level. A recursive algorithm for threshold computation is provided.

Submitted 16 August, 2019; originally announced August 2019.

Comments: 25 pages, 9 figures
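
A threshold-structured scheduler of the kind described above can be simulated in a few lines. The Python sketch below is a hypothetical illustration, not the paper's optimal policy: it assumes zero-mean Gaussian sensors (symmetric and unimodal), a scheduler that sends the larger-magnitude measurement when it exceeds a threshold $\tau(t, e)$ depending on time and stored energy, estimators that fall back to the source mean when nothing is received, and Bernoulli energy arrivals. The threshold function itself is made up; the paper computes its thresholds recursively.

```python
import random
from typing import Callable, Optional, Tuple

Threshold = Callable[[int, int], float]  # tau(t, energy)


def schedule(x1: float, x2: float, t: int, energy: int, tau: Threshold) -> Optional[int]:
    """Hypothetical threshold rule: send the larger-magnitude measurement if it exceeds tau(t, energy)."""
    if energy == 0:
        return None  # empty battery: nothing can be sent
    i, xi = (1, x1) if abs(x1) >= abs(x2) else (2, x2)
    return i if abs(xi) > tau(t, energy) else None


def estimates(sent: Optional[int], x1: float, x2: float,
              mean1: float = 0.0, mean2: float = 0.0) -> Tuple[float, float]:
    """Each estimator uses the received measurement, else the mean of its symmetric source."""
    return (x1 if sent == 1 else mean1, x2 if sent == 2 else mean2)


def battery_update(energy: int, sent: Optional[int], harvested: int, capacity: int) -> int:
    """Spend one unit per transmission, add harvested energy, and clip at capacity."""
    return min(energy - (1 if sent is not None else 0) + harvested, capacity)


T = 100


def tau(t: int, energy: int) -> float:
    """Hypothetical thresholds: lower with more stored energy and near the end of the horizon."""
    return (1.5 / energy) * (1.0 if t < T - 10 else 0.5)


rng = random.Random(0)
energy, capacity, total_sq_err = 2, 3, 0.0
for t in range(T):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sent = schedule(x1, x2, t, energy, tau)
    e1, e2 = estimates(sent, x1, x2)
    total_sq_err += (x1 - e1) ** 2 + (x2 - e2) ** 2
    energy = battery_update(energy, sent, harvested=int(rng.random() < 0.3), capacity=capacity)
print("average squared error per step:", total_sq_err / T)
```

Lowering the threshold near the end of the horizon reflects the intuition that stored energy left unused at the final time is wasted; whether the optimal thresholds behave this way is not claimed here.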

arXiv:1902.03339 [pdf, other]

Worst-case Guarantees for Remote Estimation of an Uncertain Source

Authors: Mukul Gagrani, Yi Ouyang, Mohammad Rasouli, Ashutosh Nayyar

Abstract: Consider a remote estimation problem where a sensor wants to communicate the state of an uncertain source to a remote estimator over a finite time horizon. The uncertain source is modeled as an autoregressive process with bounded noise. Given that the sensor has a limited communication budget, the sensor must decide when to transmit the state to the estimator who has to produce real-time estimates of the source state. In this paper, we consider the problem of finding a scheduling strategy for the sensor and an estimation strategy for the estimator to jointly minimize the worst-case maximum instantaneous estimation error over the time horizon. This leads to a decentralized minimax decision-making problem. We obtain a complete characterization of optimal strategies for this decentralized minimax problem. In particular, we show that an open loop communication scheduling strategy is optimal and the optimal estimate depends only on the most recently received sensor observation.

Submitted 8 February, 2019; originally announced February 2019.
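
For a scalar source the worst-case error of an open-loop schedule has a closed form, which makes the structure of this result easy to explore. The Python sketch below assumes a hypothetical scalar AR(1) source $x_{t+1} = a x_t + w_t$ with $|w_t| \le W$ and an initial state $x_0$ known to the estimator. If the last observation $x_s$ was received at time $s \le t$, then $x_t$ lies in an interval centred at $a^{t-s} x_s$ with half-width $W \sum_{k=0}^{t-s-1} |a|^k$, so the centre is the minimax estimate and depends only on the most recent observation. The search over transmission times is a brute-force illustration for small horizons, not the paper's characterization.

```python
from itertools import combinations
from typing import Iterable, Tuple


def worst_case_error(gap: int, a: float, W: float) -> float:
    """Half-width of the set of states reachable in `gap` steps: W * sum_{k<gap} |a|^k."""
    return W * sum(abs(a) ** k for k in range(gap))


def estimate(last_x: float, gap: int, a: float) -> float:
    """Minimax estimate: centre of the reachable interval, a^gap * last_x."""
    return (a ** gap) * last_x


def max_error(schedule: Iterable[int], T: int, a: float, W: float) -> float:
    """Worst-case maximum instantaneous error over t = 0..T for an open-loop schedule."""
    times = set(schedule)
    last, worst = 0, 0.0  # x_0 is assumed known, so time 0 acts as a free observation
    for t in range(T + 1):
        if t in times:
            last = t
        worst = max(worst, worst_case_error(t - last, a, W))
    return worst


def best_schedule(T: int, budget: int, a: float, W: float) -> Tuple[int, ...]:
    """Exhaustive search over open-loop transmission times in 1..T (small T only)."""
    return min(combinations(range(1, T + 1), budget), key=lambda s: max_error(s, T, a, W))


T, budget, a, W = 6, 2, 1.0, 1.0
best = best_schedule(T, budget, a, W)
print(best, max_error(best, T, a, W))  # (1, 4) 2.0
```

With $a = 1$ the error grows linearly with the time since the last transmission, so good schedules spread the transmissions to keep the longest silent stretch short.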

arXiv:1806.06497 [pdf, other]

Optimal Infinite Horizon Decentralized Networked Controllers with Unreliable Communication

Authors: Yi Ouyang, Seyed Mohammad Asghari, Ashutosh Nayyar

Abstract: We consider a decentralized networked control system (DNCS) consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable uplink channel. The downlink channels from the remote controller to local controllers are assumed to be perfect. The objective of the local controllers and the remote controller is to cooperatively minimize the infinite horizon time average of expected quadratic cost. The finite horizon version of this problem was solved in our prior work [2]. The optimal strategies in the finite horizon case were shown to be characterized by coupled Riccati recursions. In this paper, we show that if the link failure probabilities are below certain critical thresholds, then the coupled Riccati recursions of the finite horizon solution reach a steady state and the corresponding decentralized strategies are optimal. Above these thresholds, we show that no strategy can achieve finite cost. We exploit a connection between our DNCS Riccati recursions and the coupled Riccati recursions of an auxiliary Markov jump linear system to obtain our results. Our main results in Theorems 1 and 2 explicitly identify the critical thresholds for the link failure probabilities and the optimal decentralized control strategies when all link failure probabilities are below their thresholds.

Submitted 18 June, 2018; originally announced June 2018.

Comments: 52 pages, Submitted to IEEE Transactions on Automatic Control

arXiv:1802.00538 [pdf, ps, other]

Decentralized Control of Stochastically Switched Linear System with Unreliable Communication

Authors: Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

Abstract: We consider a networked control system (NCS) consisting of two plants, a global plant and a local plant, and two controllers, a global controller and a local controller. The global (resp. local) plant follows discrete-time stochastically switched linear dynamics with a continuous global (resp. local) state and a discrete global (resp. local) mode. We assume that the state and mode of the global plant are observed by both controllers while the state and mode of the local plant are only observed by the local controller. The local controller can inform the global controller of the local plant's state and mode through an unreliable TCP-like communication channel where successful transmissions are acknowledged. The objective of the controllers is to cooperatively minimize a mode-dependent quadratic cost over a finite time horizon. Following the method developed in [1] and [2], we construct a dynamic program based on common information and a decomposition of strategies, and use it to obtain explicit optimal strategies for the controllers. In the optimal strategies, both controllers compute a common estimate of the local plant's state. The global controller's action is linear in the state of the global plant and the common estimated state, and the local controller's action is linear in the actual states of both plants and the common estimated state. Furthermore, the gain matrices for the global controller depend on the global mode and its observation about the local mode, while the gain matrices for the local controller depend on the actual modes of both plants and the global controller's observation about the local mode.

Submitted 1 February, 2018; originally announced February 2018.

Comments: [Extended Version] Accepted for presentation at the 2018 American Control Conference (ACC)

arXiv:1611.07175 [pdf, other]

Optimal Local and Remote Controllers with Unreliable Uplink Channels

Authors: Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar

Abstract: We consider a networked control system consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable uplink channel. We assume that the downlink channels from the remote controller to local controllers are perfect. The objective of the local controllers and the remote controller is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested problem, we obtain explicit optimal strategies for all controllers. In the optimal strategies, all controllers compute common estimates of the states of the plants based on the common information obtained from the communication network. The remote controller's action is linear in the common state estimates, and the action of each local controller is linear in both the actual state of its co-located plant and the common state estimates. We illustrate our results with numerical experiments using randomly generated models.

Submitted 16 June, 2018; v1 submitted 22 November, 2016; originally announced November 2016.

Comments: 43 pages, Accepted for publication in IEEE Transactions on Automatic Control

arXiv:1611.03592 [pdf, ps, other]

Dynamic Teams and Decentralized Control Problems with Substitutable Actions

Authors: Seyed Mohammad Asghari, Ashutosh Nayyar

Abstract: This paper considers two problems: a dynamic team problem and a decentralized control problem. The problems we consider do not belong to the known classes of "simpler" dynamic team/decentralized control problems such as partially nested or quadratically invariant problems. However, we show that our problems admit simple solutions under an assumption referred to as the substitutability assumption. Intuitively, substitutability in a team (resp. decentralized control) problem means that the effects of one team member's (resp. controller's) action on the cost function and the information (resp. state dynamics) can be achieved by an action of another member (resp. controller). For the non-partially-nested LQG dynamic team problem, it is shown that under certain conditions linear strategies are optimal. For the non-partially-nested decentralized LQG control problem, the state structure can be exploited to obtain optimal control strategies with recursively updatable sufficient statistics. These results suggest that substitutability can work as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.

Submitted 11 November, 2016; originally announced November 2016.

Comments: 25 pages, Accepted for publication in IEEE Transactions on Automatic Control

arXiv:1606.07215 [pdf, other]

Optimal Local and Remote Controllers with Unreliable Communication

Authors: Yi Ouyang, Seyed Mohammad Asghari, Ashutosh Nayyar

Abstract: We consider a decentralized optimal control problem for a linear plant controlled by two controllers, a local controller and a remote controller. The local controller directly observes the state of the plant and can inform the remote controller of the plant state through a packet-drop channel. We assume that the remote controller is able to send acknowledgments to the local controller to signal the successful receipt of transmitted packets. The objective of the two controllers is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested LQG problem, we obtain explicit optimal strategies for the two controllers. In the optimal strategies, both controllers compute a common estimate of the plant state based on the common information. The remote controller's action is linear in the common estimated state, and the local controller's action is linear in both the actual state and the common estimated state.

Submitted 23 June, 2016; originally announced June 2016.

arXiv:1601.02250 [pdf, ps, other]

Decentralized Control Problems with Substitutable Actions

Authors: Seyed Mohammad Asghari, Ashutosh Nayyar

Abstract: We consider a decentralized system with multiple controllers and define substitutability of one controller by another in open-loop strategies. We explore the implications of this property on the optimization of closed-loop strategies. In particular, we focus on the decentralized LQG problem with substitutable actions. Even though the problem we formulate does not belong to the known classes of "simpler" decentralized problems such as partially nested or quadratically invariant problems, our results show that, under the substitutability assumption, linear strategies are optimal and we provide a complete state space characterization of optimal strategies. We also identify a family of information structures that all give the same optimal cost as the centralized information structure under the substitutability assumption. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.

Submitted 10 January, 2016; originally announced January 2016.

arXiv:1409.7034 [pdf, ps, other]

Rate-constrained Energy Services: Allocation Policies and Market Decisions

Authors: Ashutosh Nayyar, Matias Negrete-Pincetic, Kameshwar Poolla, Pravin Varaiya

Abstract: The integration of renewable generation poses operational and economic challenges for the electricity grid. For the core problem of power balance, the legacy paradigm of tailoring supply to follow random demand may be inappropriate under deep penetration of uncertain and intermittent renewable generation. In this situation, there is an emerging consensus that the alternative approach of controlling demand to follow random supply offers compelling economic benefits in terms of reduced regulation costs. This approach exploits the flexibility of demand-side resources and requires sensing, actuation, and communication infrastructure; distributed control algorithms; and viable schemes to compensate participating loads. This paper considers rate-constrained energy services, which are a specific paradigm for flexible demand. These services are characterized by a specified delivery window, the total amount of energy that must be supplied over this window, and the maximum rate at which this energy may be delivered. We consider a forward market where rate-constrained energy services are traded. We explore allocation policies and market decisions of a supplier in this market. The supplier owns a generation mix that includes some uncertain renewable generation and may also purchase energy in day-ahead and real-time markets to meet customer demand. The supplier must optimally select the portfolio of rate-constrained services to sell, the amount of day-ahead energy to buy, and the policies for making real-time energy purchases and allocations to customers to maximize its expected profit. We offer solutions to the supplier's decision and control problems to economically provide rate-constrained energy services.

Submitted 24 September, 2014; originally announced September 2014.

arXiv:1408.5825 [pdf, other]

Duration-differentiated Energy Services with a Continuum of Loads

Authors: Ashutosh Nayyar, Matias Negrete-Pincetic, Kameshwar Poolla, Pravin Varaiya

Abstract: As the proportion of total power supplied by renewable sources increases, it becomes more costly to use reserve generation to compensate for the variability of renewables like solar and wind. Hence attention has been drawn to exploiting flexibility in demand as a substitute for reserve generation. Flexibility has different attributes. In this paper we consider loads requiring a constant power for a specified duration (within, say, one day), whose flexibility resides in the fact that power may be delivered at any time so long as the total duration of service equals the load's specified duration. We give conditions under which a variable power supply is adequate to meet these flexible loads, and describe how to allocate the power to the loads. We also characterize the additional power needed when the supply is inadequate. We study the problem of allocating the available power to loads to maximize welfare, and show that the welfare optimum can be sustained as a competitive equilibrium in a forward market in which electricity is sold as service contracts differentiated by the duration of service and power level. We compare this forward market with a spot market in their ability to capture the flexibility inherent in duration-differentiated loads.

Submitted 25 August, 2014; originally announced August 2014.

arXiv:1408.2551 [pdf, ps, other]

Optimal Control for LQG Systems on Graphs---Part I: Structural Results

Authors: Ashutosh Nayyar, Laurent Lessard

Abstract: In this two-part paper, we identify a broad class of decentralized output-feedback LQG systems for which the optimal control strategies have a simple intuitive estimation structure and can be computed efficiently. Roughly, we consider the class of systems for which the coupling of dynamics among subsystems and the inter-controller communication is characterized by the same directed graph. Furthermore, this graph is assumed to be a multitree, that is, its transitive reduction can have at most one directed path connecting each pair of nodes. In this first part, we derive sufficient statistics that may be used to aggregate each controller's growing available information. Each controller must estimate the states of the subsystems that it affects (its descendants) as well as the subsystems that it observes (its ancestors). The optimal control action for a controller is a linear function of the estimate it computes as well as the estimates computed by all of its ancestors. Moreover, these state estimates may be updated recursively, much like a Kalman filter.

Submitted 11 August, 2014; originally announced August 2014.
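The multitree condition stated above can be checked mechanically. The following is a minimal sketch, assuming the directed graph is acyclic and given as a dict of successor lists; the helpers `reachable`, `transitive_reduction`, and `is_multitree` are our own illustrative names, not code from the paper. It first drops every edge implied by a longer path and then counts directed paths between all pairs of nodes in what remains.

```python
from functools import lru_cache


def reachable(graph, src):
    """Nodes reachable from src by a directed path of length >= 1."""
    seen, stack = set(), list(graph.get(src, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def transitive_reduction(graph):
    """Drop each edge u->v implied by a longer path u->w->...->v (DAG assumed)."""
    reduced = {}
    for u, succs in graph.items():
        reduced[u] = [v for v in succs
                      if not any(v in reachable(graph, w) for w in succs if w != v)]
    return reduced


def is_multitree(graph):
    """True if the transitive reduction has at most one directed path between any two nodes."""
    reduced = transitive_reduction(graph)
    nodes = set(reduced) | {v for succs in reduced.values() for v in succs}

    @lru_cache(maxsize=None)
    def path_counts(u):
        # Number of directed paths from u to each node it reaches (u itself counts once).
        counts = {u: 1}
        for w in reduced.get(u, ()):
            for target, n in path_counts(w).items():
                counts[target] = counts.get(target, 0) + n
        return counts

    return all(n <= 1 for u in nodes for n in path_counts(u).values())
```

For example, `{1: [2, 3], 2: [3]}` is a multitree because the shortcut edge 1→3 disappears in the reduction, whereas the "diamond" `{1: [2, 3], 2: [4], 3: [4]}` is not, since two paths connect nodes 1 and 4.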

arXiv:1404.1112 [pdf, other]

Duration-Differentiated Services in Electricity

Authors: Ashutosh Nayyar, Matias Negrete-Pincetic, Kameshwar Poolla, Pravin Varaiya

Abstract: The integration of renewable sources poses challenges at the operational and economic levels of the power grid. In terms of keeping the balance between supply and demand, the usual scheme of supply following load may not be appropriate for large penetration levels of uncertain and intermittent renewable supply. In this paper, we focus on an alternative scheme in which the load follows the supply, exploiting the flexibility associated with the demand side. We consider a model of flexible loads that are to be serviced by zero-marginal cost renewable power together with conventional generation if necessary. Each load demands 1 kW for a specified number of time slots within an operational period. The flexibility of a load resides in the fact that the service may be delivered over any slots within the operational period. Loads therefore require flexible energy services that are differentiated by the demanded duration. We focus on two problems associated with duration-differentiated loads. The first problem deals with the operational decisions that a supplier has to make to serve a given set of duration-differentiated loads. The second problem focuses on a market implementation for duration-differentiated services. We give necessary and sufficient conditions under which the available power can service the loads, and we describe an algorithm that constructs an appropriate allocation. In the event the available supply is inadequate, we characterize the minimum amount of power that must be purchased to service the loads. Next we consider a forward market where consumers can purchase duration-differentiated energy services. We first characterize social welfare maximizing allocations in this forward market and then show the existence of an efficient competitive equilibrium.

Submitted 3 April, 2014; originally announced April 2014.
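The operational problem above (loads needing 1 kW for a given number of slots, served from a variable supply) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm or its adequacy conditions: supply is an integer number of 1-kW units per slot, `allocate_longest_first` serves in each slot the loads with the most slots still owed, and `supply_is_adequate` is our own test derived from a max-flow/min-cut argument on the load/slot bipartite graph. A standard exchange argument suggests the longest-remaining-first rule serves every load whenever any feasible allocation exists, and the two functions can be cross-checked on small instances.

```python
def allocate_longest_first(durations, supply):
    """Serve, in each slot, the loads with the most remaining slots still owed.

    durations[i]: number of slots in which load i must receive 1 kW (integer, assumed).
    supply[t]:    number of 1-kW units available in slot t (integer, assumed).
    Returns (schedule, unserved): schedule[t] lists the loads served in slot t,
    and unserved is the number of load-slots still owed at the end.
    """
    remaining = list(durations)
    schedule = []
    for units in supply:
        # Loads still waiting, sorted by remaining duration, largest first.
        waiting = sorted((i for i, r in enumerate(remaining) if r > 0),
                         key=lambda i: remaining[i], reverse=True)
        served = waiting[:units]  # at most one unit per load per slot
        for i in served:
            remaining[i] -= 1
        schedule.append(served)
    return schedule, sum(remaining)


def supply_is_adequate(durations, supply):
    """Min-cut test: for every j, the demand beyond j slots per load must fit
    into the T - j smallest per-slot supplies."""
    horizon = len(supply)
    ascending = sorted(supply)
    for j in range(horizon + 1):
        excess = sum(max(0, h - j) for h in durations)
        if excess > sum(ascending[:horizon - j]):
            return False
    return True
```

For instance, durations `[3, 1, 2]` with supply `[2, 2, 2]` can be fully served, while a single load of duration 3 with supply `[1, 0, 2]` cannot, even though total energy matches, because one slot has no supply at all.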

arXiv:1403.3126 [pdf, ps, other]

Signaling in sensor networks for sequential detection

Authors: Ashutosh Nayyar, Demosthenis Teneketzis

Abstract: Sequential detection problems in sensor networks are considered. The true state of nature/true hypothesis is modeled as a binary random variable $H$ with known prior distribution. There are $N$ sensors making noisy observations about the hypothesis; $\mathcal{N} =\{1,2,\ldots,N\}$ denotes the set of sensors. Sensor $i$ can receive messages from a subset $\mathcal{P}^i \subset \mathcal{N}$ of sensors and send a message to a subset $\mathcal{C}^i \subset \mathcal{N}$. Each sensor is faced with a stopping problem. At each time $t$, based on the observations it has taken so far and the messages it may have received, sensor $i$ can decide to stop and communicate a binary decision to the sensors in $\mathcal{C}^i$, or it can continue taking observations and receiving messages. After sensor $i$'s binary decision has been sent, it becomes inactive. Sensors incur operational costs (cost of taking observations, communication costs, etc.) while they are active. In addition, the system incurs a terminal cost that depends on the true hypothesis $H$, the sensors' binary decisions and their stopping times. The objective is to determine decision strategies for all sensors to minimize the total expected cost.

Submitted 12 March, 2014; originally announced March 2014.

Comments: 10 pages
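The per-sensor stopping problem can be pictured in its simplest form: one isolated sensor that receives no messages. The sketch below assumes Gaussian observations with mean `mean0` or `mean1` under H = 0 or H = 1 and a common standard deviation `std`, plus hypothetical posterior thresholds `lower` and `upper`; it is a textbook Bayesian threshold rule, not the networked strategies derived in the paper. The sensor updates P(H = 1 | observations) by Bayes' rule and stops, announcing its binary decision, as soon as the posterior leaves the interval (lower, upper).

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of N(mean, std^2) at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def sequential_test(observations, prior_h1, lower, upper,
                    mean0=0.0, mean1=1.0, std=1.0):
    """Return (decision, stopping_time), or (None, number_of_observations)
    if neither threshold is crossed."""
    posterior = prior_h1  # current belief P(H = 1 | observations so far)
    for t, y in enumerate(observations, start=1):
        joint1 = posterior * gaussian_pdf(y, mean1, std)
        joint0 = (1 - posterior) * gaussian_pdf(y, mean0, std)
        posterior = joint1 / (joint0 + joint1)  # Bayes' rule
        if posterior >= upper:
            return 1, t  # stop and declare H = 1
        if posterior <= lower:
            return 0, t  # stop and declare H = 0
    return None, len(observations)
```

With a prior of 0.5, thresholds 0.1 and 0.9, and every observation equal to 1.0, each sample multiplies the posterior odds by e^0.5, so the odds first exceed 9 at the fifth sample and the sensor declares H = 1 at t = 5.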

arXiv:1403.2739 [pdf, other]

Sufficient statistics for linear control strategies in decentralized systems with partial history sharing

Authors: Aditya Mahajan, Ashutosh Nayyar

Abstract: In decentralized control systems with linear dynamics, quadratic cost, and Gaussian disturbance (also called decentralized LQG systems), linear control strategies are not always optimal. Nonetheless, linear control strategies are appealing due to their analytic and implementation simplicity. In this paper, we investigate decentralized LQG systems with partial history sharing information structure and identify finite dimensional sufficient statistics for such systems. Unlike prior work on decentralized LQG systems, we do not assume partial nestedness or quadratic invariance. Our approach is based on the common information approach of Nayyar et al. (2013) and exploits the linearity of the system dynamics and control strategies. To illustrate our methodology, we identify sufficient statistics for linear strategies in decentralized systems where controllers communicate over a strongly connected graph with finite delays, and for decentralized systems consisting of coupled subsystems with control sharing or one-sided one step delay sharing information structures.

Submitted 11 March, 2014; originally announced March 2014.

arXiv:1401.4786 [pdf, ps, other]

Common Information based Markov Perfect Equilibria for Linear-Gaussian Games with Asymmetric Information

Authors: Abhishek Gupta, Ashutosh Nayyar, Cedric Langbort, Tamer Basar

Abstract: We consider a class of two-player dynamic stochastic nonzero-sum games where the state transition and observation equations are linear, and the primitive random variables are Gaussian. Each controller acquires possibly different dynamic information about the state process and the other controller's past actions and observations. This leads to a dynamic game of asymmetric information among the controllers. Building on our earlier work on finite games with asymmetric information, we devise an algorithm to compute a Nash equilibrium by using the common information among the controllers. We call such equilibria common information based Markov perfect equilibria of the game, which can be viewed as a refinement of Nash equilibrium in games with asymmetric information. If the players' cost functions are quadratic, then we show that under certain conditions a unique common information based Markov perfect equilibrium exists. Furthermore, this equilibrium can be computed by solving a sequence of linear equations. We also show through an example that there could be other Nash equilibria in a game of asymmetric information, not corresponding to common information based Markov perfect equilibria.

Submitted 19 January, 2014; originally announced January 2014.

Comments: Submitted to SIAM Journal on Control and Optimization

arXiv:1303.3256 [pdf, other]

Structural Results and Explicit Solution for Two-Player LQG Systems on a Finite Time Horizon

Authors: Laurent Lessard, Ashutosh Nayyar

Abstract: It is well-known that linear dynamical systems with Gaussian noise and quadratic cost (LQG) satisfy a separation principle. Finding the optimal controller amounts to solving separate dual problems: one for control and one for estimation. For the discrete-time finite-horizon case, each problem is a simple forward or backward recursion. In this paper, we consider a generalization of the LQG problem in which there are two controllers. Each controller is responsible for one of two system inputs, but has access to different subsets of the available measurements. Our paper has three main contributions. First, we prove a fundamental structural result: sufficient statistics for the controllers can be expressed as conditional means of the global state. Second, we give explicit state-space formulae for the optimal controller. These formulae are reminiscent of the classical LQG solution with dual forward and backward recursions, but with the important difference that they are intricately coupled. Lastly, we show how these recursions can be solved efficiently, with computational complexity comparable to that of the centralized problem.

Submitted 6 September, 2013; v1 submitted 13 March, 2013; originally announced March 2013.
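The "simple forward or backward recursion" for the classical centralized problem can be made concrete. This is a scalar sketch of the textbook case only, not the coupled two-player formulae of the paper; the model x_{t+1} = a x_t + b u_t + w_t, y_t = c x_t + v_t with cost weights q, r, q_final and noise variances w_var, v_var is our assumption for brevity. The backward Riccati recursion yields control gains K_t (with u_t = -K_t times the state estimate), and the forward variance recursion yields Kalman filter gains.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion; returns K_0..K_{T-1} with u_t = -K_t * x_t."""
    p = q_final  # cost-to-go weight at the final time
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # gains were computed from t = T-1 down to t = 0
    return gains


def kalman_gains(a, c, w_var, v_var, prior_var, horizon):
    """Forward recursion on the error variance; returns filter gains L_0..L_{T-1}."""
    sigma = prior_var  # variance of x_t given y_0..y_{t-1}
    gains = []
    for _ in range(horizon):
        gain = sigma * c / (c * c * sigma + v_var)
        posterior = (1 - gain * c) * sigma  # variance after using y_t
        sigma = a * a * posterior + w_var   # predict to time t+1
        gains.append(gain)
    return gains
```

With every parameter equal to 1 and a horizon of 2, `lqr_gains` returns approximately [0.6, 0.5] and `kalman_gains` returns approximately [0.5, 0.6]: the same numbers in reverse time order, a small illustration of the control/estimation duality.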

arXiv:1209.3549 [pdf, ps, other]

Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games

Authors: Ashutosh Nayyar, Abhishek Gupta, Cédric Langbort, Tamer Başar

Abstract: A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner are named common information based Markov perfect equilibria.

Submitted 17 September, 2012; originally announced September 2012.

arXiv:1209.1695 [pdf, other]

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Authors: Ashutosh Nayyar, Aditya Mahajan, Demosthenis Teneketzis

Abstract: A general model of decentralized stochastic control called partial history sharing information structure is presented. In this model, at each step the controllers share part of their observation and control history with each other. This general model subsumes several existing models of information sharing as special cases. Based on the information commonly known to all the controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides (a) structural results for optimal strategies, and (b) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad hoc approaches taken in the literature. In addition, the structural results on optimal control strategies obtained by the proposed approach cannot be obtained by the existing generic approach (the person-by-person approach) for obtaining structural results in decentralized problems; and the dynamic program obtained by the proposed approach is simpler than that obtained by the existing generic approach (the designer's approach) for obtaining dynamic programs in decentralized problems.

Submitted 8 September, 2012; originally announced September 2012.

Comments: 37 pages, 1 figure
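The coordinator's role can be illustrated on a toy one-stage problem. This is a brute-force sketch only, not the paper's dynamic program: we assume two controllers with finite local information and action sets, a hypothetical common-information belief `belief` over (state, local info of controller 1, local info of controller 2), and a hypothetical cost `cost(state, a1, a2)`. A prescription is a map from a controller's local information to its action; the coordinator enumerates all prescription pairs and keeps one with the smallest expected cost.

```python
from itertools import product


def all_prescriptions(local_infos, actions):
    """Every map from local information to actions, represented as a dict."""
    return [dict(zip(local_infos, choice))
            for choice in product(actions, repeat=len(local_infos))]


def best_prescription_pair(belief, cost, local_infos, actions):
    """belief: {(state, z1, z2): probability}; cost(state, a1, a2) -> number.

    Returns (expected_cost, prescription_1, prescription_2) minimizing expected cost.
    """
    best = None
    for g1 in all_prescriptions(local_infos, actions):
        for g2 in all_prescriptions(local_infos, actions):
            # Each controller applies its prescription to its own local information.
            expected = sum(p * cost(x, g1[z1], g2[z2])
                           for (x, z1, z2), p in belief.items())
            if best is None or expected < best[0]:
                best = (expected, g1, g2)
    return best
```

For example, if the state is 0 or 1 with equal probability, controller 1 observes it exactly, controller 2 observes nothing, and each controller pays 1 for an action that differs from the state, the coordinator finds expected cost 0.5 with controller 1 copying its observation. The enumeration grows as |actions| to the power |local_infos| per controller, which is one reason structural results matter.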

arXiv:1205.6018 [pdf, ps, other]

Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor

Authors: Ashutosh Nayyar, Tamer Basar, Demosthenis Teneketzis, Venugopal V. Veeravalli

Abstract: We consider a remote estimation problem with an energy harvesting sensor and a remote estimator. The sensor observes the state of a discrete-time source which may be a finite state Markov chain or a multi-dimensional linear Gaussian system. It harvests energy from its environment (for example, through a solar cell) and uses this energy for the purpose of communicating with the estimator. Due to the randomness of energy available for communication, the sensor may not be able to communicate all the time. The sensor may also want to save its energy for future communications. The estimator relies on messages communicated by the sensor to produce real-time estimates of the source state. We consider the problem of finding a communication scheduling strategy for the sensor and an estimation strategy for the estimator that jointly minimize an expected sum of communication and distortion costs over a finite time horizon. Our goal of joint optimization leads to a decentralized decision-making problem. By viewing the problem from the estimator's perspective, we obtain a dynamic programming characterization for the decentralized decision-making problem that involves optimization over functions. Under some symmetry assumptions on the source statistics and the distortion metric, we show that an optimal communication strategy is described by easily computable thresholds and that the optimal estimate is a simple function of the most recently received sensor observation.

Submitted 27 May, 2012; originally announced May 2012.

Comments: 32 pages
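The threshold structure described above can be pictured with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's optimal strategy: the battery holds an integer number of transmission units up to `battery_cap`, energy harvested in a slot is usable in that slot, the thresholds `thresholds[b]` (indexed by battery level b) are hypothetical hand-picked values rather than thresholds computed by dynamic programming, and the estimator's "simple function" of the last received observation is taken to be the identity. The sensor transmits only when it has energy and its observation is far enough from the current estimate.

```python
def simulate(states, harvest, thresholds, battery_cap, initial_battery=0):
    """Return (sent, estimates) for a hypothetical battery-dependent threshold rule.

    states[t]:     source value observed by the sensor at time t.
    harvest[t]:    energy units harvested at time t (assumed usable immediately).
    thresholds[b]: transmission threshold when the battery holds b units.
    """
    battery = initial_battery
    estimate = 0.0  # assumed common initial estimate
    sent, estimates = [], []
    for x, energy in zip(states, harvest):
        battery = min(battery_cap, battery + energy)
        if battery >= 1 and abs(x - estimate) > thresholds[battery]:
            battery -= 1      # one unit spent per transmission
            estimate = x      # estimator adopts the received observation
            sent.append(True)
        else:
            sent.append(False)  # estimator keeps the last received value
        estimates.append(estimate)
    return sent, estimates
```

Indexing the threshold by the battery level lets the sensor be more selective when energy is scarce; in this sketch a lower threshold at a fuller battery means it transmits more readily when it can afford to.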

Showing 1–33 of 33 results for author: Nayyar, A