-
Solving Zero-Sum Convex Markov Games
Authors:
Fivos Kalogiannis,
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Ian Gemp,
Georgios Piliouras
Abstract:
We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon.
We follow a two-step approach. First, leveraging properties of hidden-convex--hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Łojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
Submitted 19 June, 2025;
originally announced June 2025.
-
Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem
Authors:
Fivos Kalogiannis,
Jingming Yan,
Ioannis Panageas
Abstract:
We study the problem of learning a Nash equilibrium (NE) in Markov games, which is a cornerstone of multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs) in which agents that share a common reward function compete against a single opponent, the adversary. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023a) provided an efficient equilibrium computation algorithm for ATMGs which presumes knowledge of the reward and transition functions and has no sample complexity guarantees. We contribute a learning algorithm that utilizes MARL policy gradient methods with iteration and sample complexity that is polynomial in the approximation error $ε$ and the natural parameters of the ATMG, resolving the main caveats of the solution of Kalogiannis et al. (2023a). It is worth noting that previously, the existence of learning algorithms for NE was known for Markov two-player zero-sum and potential games but not for ATMGs.
Seen through the lens of min-max optimization, computing a NE in these games constitutes a nonconvex-nonconcave saddle-point problem. Min-max optimization has received extensive study. Nevertheless, the case of nonconvex-nonconcave landscapes remains elusive: in full generality, finding saddle-points is computationally intractable (Daskalakis et al., 2021). We circumvent the aforementioned intractability by developing techniques that exploit the hidden structure of the objective function via a nonconvex-concave reformulation. However, this introduces the challenge of a feasibility set with coupled constraints. We tackle these challenges by establishing novel techniques for optimizing weakly-smooth nonconvex functions, extending the framework of Devolder et al. (2014).
Submitted 8 October, 2024;
originally announced October 2024.
-
Computing Nash Equilibria in Potential Games with Private Uncoupled Constraints
Authors:
Nikolas Patris,
Stelios Stavroulakis,
Fivos Kalogiannis,
Rose Zhang,
Ioannis Panageas
Abstract:
We consider the problem of computing Nash equilibria in potential games where each player's strategy set is subject to private uncoupled constraints. This scenario is frequently encountered in real-world applications like road network congestion games where individual drivers adhere to personal budget and fuel limitations. Despite the plethora of algorithms that efficiently compute Nash equilibria (NE) in potential games, the domain of constrained potential games remains largely unexplored. We introduce an algorithm that leverages the Lagrangian formulation of NE. The algorithm is implemented independently by each player and runs in polynomial time with respect to the approximation error, the sum of the size of the action-spaces, and the game's inherent parameters.
Submitted 12 February, 2024;
originally announced February 2024.
-
Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria
Authors:
Fivos Kalogiannis,
Ioannis Panageas
Abstract:
The works of Daskalakis et al. (2009, 2022), Jin et al. (2022), and Deng et al. (2023) indicate that computing Nash equilibria in multi-player Markov games is a computationally hard task. This fact raises the question of whether or not computational intractability can be circumvented if one focuses on specific classes of Markov games. One such example is two-player zero-sum Markov games, in which efficient ways to compute a Nash equilibrium are known. Inspired by zero-sum polymatrix normal-form games (Cai et al., 2016), we define a class of zero-sum multi-agent Markov games in which there are only pairwise interactions described by a graph that changes per state. For this class of Markov games, we show that an $ε$-approximate Nash equilibrium can be found efficiently. To do so, we generalize the techniques of Cai et al. (2016) by showing that the set of coarse-correlated equilibria collapses to the set of Nash equilibria. Afterwards, it is possible to use any algorithm in the literature that computes approximate coarse-correlated equilibria Markovian policies to get an approximate Nash equilibrium.
Submitted 29 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Algorithms and Complexity for Computing Nash Equilibria in Adversarial Team Games
Authors:
Ioannis Anagnostides,
Fivos Kalogiannis,
Ioannis Panageas,
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Stephen McAleer
Abstract:
Adversarial team games model multiplayer strategic interactions in which a team of identically-interested players is competing against an adversarial player in a zero-sum game. Such games capture many well-studied settings in game theory, such as congestion games, but go well beyond to environments wherein the cooperation of one team -- in the absence of explicit communication -- is obstructed by competing entities; the latter setting remains poorly understood despite its numerous applications. Since the seminal work of Von Stengel and Koller (GEB `97), different solution concepts have received attention from an algorithmic standpoint. Yet, the complexity of the standard Nash equilibrium has remained open.
In this paper, we settle this question by showing that computing a Nash equilibrium in adversarial team games belongs to the class continuous local search (CLS), thereby establishing CLS-completeness by virtue of the recent CLS-hardness result of Rubinstein and Babichenko (STOC `21) in potential games. To do so, we leverage linear programming duality to prove that any $ε$-approximate stationary strategy for the team can be extended in polynomial time to an $O(ε)$-approximate Nash equilibrium, where the $O(\cdot)$ notation suppresses polynomial factors in the description of the game. As a consequence, we show that the Moreau envelope of a suitable best response function acts as a potential under certain natural gradient-based dynamics.
Submitted 30 May, 2023; v1 submitted 5 January, 2023;
originally announced January 2023.
-
Efficiently Computing Nash Equilibria in Adversarial Team Markov Games
Authors:
Fivos Kalogiannis,
Ioannis Anagnostides,
Ioannis Panageas,
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Vaggos Chatziafratis,
Stelios Stavroulakis
Abstract:
Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon \emph{adversarial team Markov games}, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $ε$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/ε$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB `97).
Submitted 3 August, 2022;
originally announced August 2022.
-
Towards convergence to Nash equilibria in two-team zero-sum games
Authors:
Fivos Kalogiannis,
Ioannis Panageas,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important and overlooked theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents, each experiencing a utility identical to that of their teammates and opposite to that of the opposing team. We focus on the solution concept of Nash equilibria (NE). We first show that computing NE for this class of games is $\textit{hard}$ for the complexity class ${\mathrm{CLS}}$. To further examine the capabilities of online learning algorithms in games with full-information feedback, we propose a benchmark of a simple -- yet nontrivial -- family of such games. These games do not enjoy the properties used to prove convergence for relevant algorithms. In particular, we use a dynamical systems perspective to demonstrate that gradient descent-ascent, its optimistic variant, optimistic multiplicative weights update, and extra gradient fail to converge (even locally) to a Nash equilibrium. On a brighter note, we propose a first-order method that leverages control theory techniques and under some conditions enjoys last-iterate local convergence to a Nash equilibrium. We also believe our proposed method is of independent interest for general min-max optimization.
Submitted 16 April, 2023; v1 submitted 7 November, 2021;
originally announced November 2021.