-
Markov Potential Game with Final-time Reach-Avoid Objectives
Authors:
Sarah H. Q. Li,
Abraham P. Vinod
Abstract:
We formulate a Markov potential game with final-time reach-avoid objectives by integrating potential game theory with stochastic reach-avoid control. Our focus is on multi-player trajectory planning where players maximize the same multi-player reach-avoid objective: the probability of all participants reaching their designated target states by a specified time, while avoiding collisions with one another. Existing approaches require centralized computation of actions via a global policy, which may have prohibitively expensive communication costs. Instead, we focus on approximations of the global policy via local state feedback policies. First, we adapt the recursive single player reach-avoid value iteration to the multi-player framework with local policies, and show that the same recursion holds on the joint state space. To find each player's optimal local policy, the multi-player reach-avoid value function is projected from the joint state to the local state using the other players' occupancy measures. Then, we propose an iterative best response scheme for the multi-player value iteration to converge to a pure Nash equilibrium. We demonstrate the utility of our approach in finding collision-free policies for multi-player motion planning in simulation.
Submitted 23 October, 2024;
originally announced October 2024.
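For readers unfamiliar with the recursion the abstract refers to, the following is a minimal sketch of the standard single-player, final-time reach-avoid value iteration on a finite state space. It is not the paper's multi-player algorithm; the function name, the transition table `P`, and the three-state example are assumptions made for illustration only.

```python
def reach_avoid_values(P, safe, target, horizon):
    """Backward recursion for a single-player final-time reach-avoid problem.

    P[a][x][y] : probability of moving from state x to state y under action a
    safe       : set of states allowed at times 0..horizon-1
    target     : set of states that count as success at time `horizon`
    Returns V, where V[t][x] is the maximal probability of staying in `safe`
    at times t..horizon-1 and being in `target` at time `horizon`.
    """
    n_actions, n_states = len(P), len(P[0])
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    # Terminal condition: success if and only if the final state is a target.
    for x in range(n_states):
        V[horizon][x] = 1.0 if x in target else 0.0
    # Backward in time: unsafe states keep value 0, safe states take the best action.
    for t in range(horizon - 1, -1, -1):
        for x in range(n_states):
            if x not in safe:
                continue
            V[t][x] = max(
                sum(P[a][x][y] * V[t + 1][y] for y in range(n_states))
                for a in range(n_actions)
            )
    return V


# Hypothetical example: state 0 = start, state 1 = target, state 2 = obstacle.
P_example = [
    [[0.0, 0.8, 0.2], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 0: "go"
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # action 1: "stay"
]
V_example = reach_avoid_values(P_example, safe={0, 1}, target={1}, horizon=2)
```

In this toy instance the only risk is the single "go" move, so the success probability from the start state is the probability of that move reaching the target.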
-
Mitigating Transient Bullwhip Effects Under Imperfect Demand Forecasts
Authors:
Sarah H. Q. Li,
Florian Dörfler
Abstract:
Motivated by how forecast errors exacerbate order fluctuations in supply chains, we leverage robust feedback controller synthesis to characterize, compute, and minimize the worst-case order fluctuation experienced by an individual supply chain vendor. Assuming bounded forecast errors and demand fluctuations, we model forecast error and demand fluctuations as inputs to linear inventory dynamics, and use the $\ell_\infty$ gain to define a transient Bullwhip measure. In contrast to the existing Bullwhip measure, the transient Bullwhip measure explicitly depends on the forecast error. This enables us to separately quantify the transient Bullwhip measure's sensitivity to forecast error and demand fluctuations. To compute the controller that minimizes the worst-case peak gain, we formulate an optimization problem with bilinear matrix inequalities and show that it is equivalent to minimizing a quasi-convex function on a bounded domain. We simulate our model for vendors with non-zero perishable rates and order backlogging rates, and prove that the transient Bullwhip measure can be bounded by a monotonic quasi-convex function whose dependency on the product backlog rate and perishing rate is verified in simulation.
Submitted 12 September, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
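As background on the $\ell_\infty$ gain used in the abstract: for a stable discrete-time linear system it is the worst-case peak of the output over all inputs whose peak is at most one, and for a single-input single-output system it equals the absolute sum of the impulse response. The sketch below computes it for a scalar system; the system and its parameters are hypothetical and are not the inventory model from the paper.

```python
def peak_gain_scalar(a, b, c, d, terms=10_000):
    """Approximate the l-infinity (peak-to-peak) gain of the scalar system
        x[k+1] = a * x[k] + b * w[k],   y[k] = c * x[k] + d * w[k]
    by summing the absolute impulse response |d| + sum_k |c * a**k * b|.
    """
    if abs(a) >= 1.0:
        raise ValueError("the system must be stable (|a| < 1)")
    gain = abs(d)   # direct feedthrough term h[0] = d
    h = c * b       # first dynamic impulse-response sample h[1] = c * b
    for _ in range(terms):
        gain += abs(h)
        h *= a      # next sample: h[k+1] = a * h[k]
    return gain


# Hypothetical example: geometric impulse response 1, 0.5, 0.25, ...
example_gain = peak_gain_scalar(a=0.5, b=1.0, c=1.0, d=0.0)
```

For this example the geometric series sums to 1 / (1 - 0.5), so an input bounded by one in magnitude can drive the output peak arbitrarily close to two.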
-
A Coupled Optimization Framework for Correlated Equilibria in Normal-Form Game
Authors:
Sarah H. Q. Li,
Yue Yu,
Florian Dörfler,
John Lygeros
Abstract:
In competitive multi-player interactions, simultaneous optimality is a key requirement for establishing strategic equilibria. This property is explicit when the game-theoretic equilibrium is the simultaneously optimal solution of coupled optimization problems. However, no such optimization problems exist for the correlated equilibrium, a strategic equilibrium where the players can correlate their actions. We address the lack of a coupled optimization framework for the correlated equilibrium by introducing an unnormalized game, an extension of normal-form games in which the player strategies are lifted to unnormalized measures over the joint actions. We show that the set of fully mixed generalized Nash equilibria of this unnormalized game is a subset of the correlated equilibria of the normal-form game. Furthermore, we introduce an entropy regularization to the unnormalized game and prove that the entropy-regularized generalized Nash equilibrium is a sub-optimal correlated equilibrium of the normal-form game, where the degree of sub-optimality depends on the magnitude of regularization. We prove that the entropy-regularized unnormalized game has a closed-form solution, and empirically verify its computational efficacy at approximating the correlated equilibrium of normal-form games.
Submitted 3 April, 2024; v1 submitted 24 March, 2024;
originally announced March 2024.
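To make the equilibrium notion in the abstract concrete, the sketch below checks the standard correlated-equilibrium inequalities for a two-player normal-form game: no player can gain in expectation by deviating from a recommended action. It does not implement the paper's unnormalized game or its entropy regularization; the function name and the game of Chicken used as an example are assumptions for illustration.

```python
def is_correlated_equilibrium(U1, U2, mu, tol=1e-12):
    """Check the correlated-equilibrium inequalities for a two-player game.

    U1[i][j], U2[i][j] : payoffs to players 1 and 2 when they play (i, j)
    mu[i][j]           : joint probability of recommending the pair (i, j)
    """
    n1, n2 = len(U1), len(U1[0])
    # Player 1: following recommendation i must be at least as good as any i_dev.
    for i in range(n1):
        for i_dev in range(n1):
            gain = sum(mu[i][j] * (U1[i][j] - U1[i_dev][j]) for j in range(n2))
            if gain < -tol:
                return False
    # Player 2: following recommendation j must be at least as good as any j_dev.
    for j in range(n2):
        for j_dev in range(n2):
            gain = sum(mu[i][j] * (U2[i][j] - U2[i][j_dev]) for i in range(n1))
            if gain < -tol:
                return False
    return True


# Hypothetical example: the game of Chicken, actions (Dare, Chicken).
U1_chicken = [[0, 7], [2, 6]]
U2_chicken = [[0, 2], [7, 6]]
# Equal weight on every outcome except (Dare, Dare).
mu_mixed = [[0.0, 1 / 3], [1 / 3, 1 / 3]]
# All weight on (Dare, Dare), from which each player would rather deviate.
mu_crash = [[1.0, 0.0], [0.0, 0.0]]
```

The first distribution satisfies every inequality, while the second fails because a player told to dare against a daring opponent prefers to switch.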
-
Reducing Collision Risk in Multi-Agent Path Planning: Application to Air traffic Management
Authors:
Sarah H. Q. Li,
Avi Mittal,
Pierre-Loïc Garoche,
Behçet Açıkmeşe
Abstract:
To minimize collision risks in the multi-agent path planning problem with stochastic transition dynamics, we formulate a Markov decision process congestion game with a multi-linear congestion cost. Players within the game complete individual tasks while minimizing their own collision risks. We show that the set of Nash equilibria coincides with the first-order KKT points of a non-convex optimization problem. Our game is applied to a historical flight plan over France to reduce collision risks between commercial aircraft.
Submitted 10 December, 2022; v1 submitted 8 December, 2022;
originally announced December 2022.
-
Set-based value operators for non-stationary Markovian environments
Authors:
Sarah H. Q. Li,
Assalé Adjé,
Pierre-Loïc Garoche,
Behçet Açıkmeşe
Abstract:
This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. To this end, we generalize the Bellman and policy evaluation operators to contracting operators on the value function space and denote them as value operators. We lift these value operators to act on sets of value functions and denote them as set-based value operators. We prove that the set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition in classic robust MDP literature to a containment condition for all value operators, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contracting operators in dynamic programming. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own extrema elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function. Finally, we derive novel guarantees for probabilistic path-planning problems in planet exploration and stratospheric station-keeping.
Submitted 8 August, 2023; v1 submitted 14 July, 2022;
originally announced July 2022.
-
General sum stochastic games with networked information flows
Authors:
Sarah H. Q. Li,
Lillian J. Ratliff,
Peeyush Kumar
Abstract:
Inspired by applications such as supply chain management, epidemics, and social networks, we formulate a stochastic game model that addresses three key features common across these domains: 1) network-structured player interactions, 2) pair-wise mixed cooperation and competition among players, and 3) limited global information toward individual decision-making. In combination, these features pose significant challenges for black box approaches taken by deep learning-based multi-agent reinforcement learning (MARL) algorithms and deserve more detailed analysis. We formulate a networked stochastic game with pair-wise general sum objectives and asymmetrical information structure, and empirically explore the effects of information availability on the outcomes of different MARL paradigms such as individual learning and centralized learning decentralized execution.
Submitted 5 May, 2022;
originally announced May 2022.
-
Congestion-aware path coordination game with Markov decision process dynamics
Authors:
Sarah H. Q. Li,
Dan Calderone,
Behcet Acikmese
Abstract:
Inspired by the path coordination problem arising from robo-taxis, warehouse management, and mixed-vehicle routing problems, we model a group of heterogeneous players responding to stochastic demands as a congestion game under Markov decision process dynamics. Players share a common state-action space but have unique transition dynamics, and each player's unique cost is a function of the joint state-action probability distribution. For a class of player cost functions, we formulate the player-specific optimization problem, prove the equivalence between the Nash equilibrium and the solution of a potential minimization problem, and derive dynamic programming approaches to solve the Nash equilibrium. We apply this game to model multi-agent path coordination and introduce congestion-based cost functions that enable players to complete individual tasks while avoiding congestion with their opponents. Finally, we present a learning algorithm for finding the Nash equilibrium that has linear complexity in the number of players. We demonstrate our game model on a multi-robot warehouse path coordination problem, in which robots autonomously retrieve and deliver packages while avoiding congested paths.
Submitted 5 July, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Disturbance Decoupling for Gradient-based Multi-Agent Learning with Quadratic Costs
Authors:
Sarah H. Q. Li,
Lillian Ratliff,
Behçet Açıkmeşe
Abstract:
Motivated by applications of multi-agent learning in noisy environments, this paper studies the robustness of gradient-based learning dynamics with respect to disturbances. While disturbances injected along a coordinate corresponding to any individual player's actions can always affect the overall learning dynamics, a subset of players can be disturbance decoupled, i.e., such players' actions are completely unaffected by the injected disturbance. We provide necessary and sufficient conditions to guarantee this property for games with quadratic cost functions, which encompass quadratic one-shot continuous games, finite-horizon linear quadratic (LQ) dynamic games, and bilinear games. Specifically, disturbance decoupling is characterized by both algebraic and graph-theoretic conditions on the learning dynamics; the latter are obtained by constructing a game graph based on gradients of players' costs. For LQ games, we show that disturbance decoupling imposes constraints on the controllable and unobservable subspaces of players. For two-player bilinear games, we show that disturbance decoupling within a player's action coordinates imposes constraints on the payoff matrices. Illustrative numerical examples are provided.
Submitted 10 October, 2020; v1 submitted 14 July, 2020;
originally announced July 2020.
-
Bounding Fixed Points of Set-Based Bellman Operator and Nash Equilibria of Stochastic Games
Authors:
Sarah H. Q. Li,
Assalé Adjé,
Pierre-Loïc Garoche,
Behçet Açıkmeşe
Abstract:
Motivated by uncertain parameters encountered in Markov decision processes (MDPs) and stochastic games, we study the effect of parameter uncertainty on Bellman operator-based algorithms under a set-based framework. Specifically, we first consider a family of MDPs where the cost parameters are in a given compact set; we then define a Bellman operator acting on a set of value functions to produce a new set of value functions as the output under all possible variations in the cost parameter. We prove the existence of a fixed point of this set-based Bellman operator by showing that it is contractive on a complete metric space, and explore its relationship with the corresponding family of MDPs and stochastic games. Additionally, we show that given interval set bounded cost parameters, we can form exact bounds on the set of optimal value functions. Finally, we utilize our results to bound the value function trajectory of a player in a stochastic game.
Submitted 10 October, 2020; v1 submitted 22 January, 2020;
originally announced January 2020.
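The interval-cost claim in the abstract can be illustrated with a small sketch. It relies only on the standard fact that the discounted Bellman operator is monotone in the cost, so value iteration run on the elementwise lowest and highest costs brackets the optimal value function of every cost in between. This is not the paper's set-based operator; the function name, the two-state MDP, and the cost tables are hypothetical.

```python
def value_iteration(P, cost, gamma, iters=2000):
    """Discounted cost-minimizing value iteration on a finite MDP.

    P[a][s][y] : probability of moving from state s to state y under action a
    cost[s][a] : stage cost of taking action a in state s
    gamma      : discount factor in (0, 1)
    """
    n_actions, n_states = len(P), len(cost)
    V = [0.0] * n_states
    for _ in range(iters):
        # One application of the Bellman operator to the current estimate.
        V = [
            min(
                cost[s][a] + gamma * sum(P[a][s][y] * V[y] for y in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


# Hypothetical two-state, two-action MDP with interval-bounded costs.
P_mdp = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
cost_lo = [[1.0, 2.0], [0.0, 1.0]]   # elementwise lower bound on the costs
cost_hi = [[2.0, 3.0], [1.0, 2.0]]   # elementwise upper bound on the costs
cost_mid = [[1.5, 2.5], [0.5, 1.5]]  # one cost inside the interval

V_lo = value_iteration(P_mdp, cost_lo, gamma=0.9)
V_hi = value_iteration(P_mdp, cost_hi, gamma=0.9)
V_mid = value_iteration(P_mdp, cost_mid, gamma=0.9)
```

Any cost table chosen between `cost_lo` and `cost_hi` yields a value function that lies between `V_lo` and `V_hi` in every state.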
-
Fixed Points of the Set-Based Bellman Operator
Authors:
Sarah H. Q. Li,
Assalé Adjé,
Pierre-Loïc Garoche,
Behçet Açıkmeşe
Abstract:
Motivated by uncertain parameters encountered in Markov decision processes (MDPs), we study the effect of parameter uncertainty on Bellman operator-based methods. Specifically, we consider a family of MDPs where the cost parameters are from a given compact set. We then define a Bellman operator acting on an input set of value functions to produce a new set of value functions as the output under all possible variations in the cost parameters. Finally, we prove the existence of a fixed point of this set-based Bellman operator by showing that it is a contractive operator on a complete metric space.
Submitted 29 February, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Sensitivity Analysis for Markov Decision Process Congestion Games
Authors:
Sarah H. Q. Li,
Daniel Calderone,
Lillian Ratliff,
Behcet Acikmese
Abstract:
We consider a non-atomic congestion game where each decision maker performs selfish optimization over states of a common MDP. The decision makers optimize for their own expected costs, and influence each other through congestion effects on the state-action costs. We analyze the sensitivity of MDP congestion game equilibria to uncertainty and perturbations in the state-action costs by applying an implicit function type analysis. The occurrence of a stochastic Braess paradox is defined, analyzed based on the sensitivity of game equilibria, and demonstrated in simulation. We further analyze how the introduction of stochastic dynamics affects the magnitude of the Braess paradox in comparison to deterministic dynamics.
Submitted 12 September, 2019; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Adaptive Constraint Satisfaction for Markov Decision Process Congestion Games: Application to Transportation Networks
Authors:
Sarah H. Q. Li,
Yue Yu,
Nicolas Miguel,
Dan Calderone,
Lillian J. Ratliff,
Behcet Acikmese
Abstract:
Under the Markov decision process (MDP) congestion game framework, we study the problem of enforcing population distribution constraints on a population of players with stochastic dynamics and coupled congestion costs. Existing research demonstrates that the constraints on the players' population distribution can be satisfied by enforcing tolls. However, computing the minimum toll value for constraint satisfaction requires accurate modeling of the players' congestion costs. Motivated by settings where an accurate congestion cost model is unavailable (e.g., transportation networks), we consider an MDP congestion game with unknown congestion costs. We assume that a constraint-enforcing authority can repeatedly enforce tolls on a population of players that converges to an $ε$-optimal population distribution for any given toll. We then construct a myopic update algorithm to compute the minimum toll value while ensuring that the constraints are satisfied on average. We analyze how the players' sub-optimal responses to tolls impact the rates of convergence towards the minimum toll value and constraint satisfaction. Finally, we construct a congestion game model for Uber drivers in Manhattan, New York City (NYC) using data from the Taxi and Limousine Commission (TLC) to illustrate how to efficiently reduce congestion while minimizing the impact on driver earnings.
Submitted 14 August, 2022; v1 submitted 21 July, 2019;
originally announced July 2019.
-
Tolling for Constraint Satisfaction in Markov Decision Process Congestion Games
Authors:
Sarah H. Q. Li,
Yue Yu,
Daniel Calderone,
Lillian Ratliff,
Behcet Acikmese
Abstract:
The Markov decision process (MDP) congestion game is an extension of classic congestion games, where a continuous population of selfish agents solves Markov decision processes with congestion: the payoff of a strategy decreases as more of the population uses it. We draw parallels between key concepts from capacitated congestion games and MDPs. In particular, we show that population mass constraints in MDP congestion games are equivalent to imposing tolls/incentives on the reward function, which can be utilized by social planners to achieve auxiliary objectives. We demonstrate such methods in a simulated Seattle ride-share model, where tolls and incentives are enforced for two separate objectives: to guarantee minimum driver density in downtown Seattle, and to shift the game equilibrium towards a maximum social output.
Submitted 2 March, 2019;
originally announced March 2019.