-
Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
Authors:
Ashkan Soleymani,
Georgios Piliouras,
Gabriele Farina
Abstract:
Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenome…
▽ More
Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenomenon. We introduce \emph{Cautious Optimism}, a framework for substantially faster regularized learning in general games. Cautious Optimism takes as input any instance of Follow-the-Regularized-Leader (FTRL) and outputs an accelerated no-regret learning algorithm by pacing the underlying FTRL with minimal computational overhead. Importantly, we retain uncoupledness (learners do not need to know other players' utilities). Cautious Optimistic FTRL achieves near-optimal $O_T(\log T)$ regret in diverse self-play (mixing-and-matching regularizers) while preserving the optimal $O(\sqrt{T})$ regret in adversarial scenarios. In contrast to prior works (e.g. Syrgkanis et al. [2015], Daskalakis et al. [2021]), our analysis does not rely on monotonic step-sizes, showcasing a novel route for fast learning in general games.
△ Less
Submitted 5 June, 2025;
originally announced June 2025.
-
Plasticity as the Mirror of Empowerment
Authors:
David Abel,
Michael Bowling,
André Barreto,
Will Dabney,
Shi Dong,
Steven Hansen,
Anna Harutyunyan,
Khimya Khetarpal,
Clare Lyle,
Razvan Pascanu,
Georgios Piliouras,
Doina Precup,
Jonathan Richens,
Mark Rowland,
Tom Schaul,
Satinder Singh
Abstract:
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observ…
▽ More
Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Our first finding is that plasticity is the mirror of empowerment: The agent's plasticity is identical to the empowerment of the environment, and vice versa. Our second finding establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Faster Rates for No-Regret Learning in General Games via Cautious Optimism
Authors:
Ashkan Soleymani,
Georgios Piliouras,
Gabriele Farina
Abstract:
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Reg…
▽ More
We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative.
△ Less
Submitted 31 March, 2025;
originally announced March 2025.
-
Passivity, No-Regret, and Convergent Learning in Contractive Games
Authors:
Hassan Abdelraouf,
Georgios Piliouras,
Jeff S. Shamma
Abstract:
We investigate the interplay between passivity, no-regret, and convergence in contractive games for various learning dynamic models and their higher-order variants. Our setting is continuous time. Building on prior work for replicator dynamics, we show that if learning dynamics satisfy a passivity condition between the payoff vector and the difference between its evolving strategy and any fixed st…
▽ More
We investigate the interplay between passivity, no-regret, and convergence in contractive games for various learning dynamic models and their higher-order variants. Our setting is continuous time. Building on prior work for replicator dynamics, we show that if learning dynamics satisfy a passivity condition between the payoff vector and the difference between its evolving strategy and any fixed strategy, then it achieves finite regret. We then establish that the passivity condition holds for various learning dynamics and their higher-order variants. Consequentially, the higher-order variants can achieve convergence to Nash equilibrium in cases where their standard order counterparts cannot, while maintaining a finite regret property. We provide numerical examples to illustrate the lack of finite regret of different evolutionary dynamic models that violate the passivity property. We also examine the fragility of the finite regret property in the case of perturbed learning dynamics. Continuing with passivity, we establish another connection between finite regret and passivity, but with the related equilibrium-independent passivity property. Finally, we present a passivity-based classification of dynamic models according to the various passivity notions they satisfy, namely, incremental passivity, $δ$-passivity, and equilibrium-independent passivity. This passivity-based classification provides a framework to analyze the convergence of learning dynamic models in contractive games.
△ Less
Submitted 28 March, 2025;
originally announced March 2025.
-
Re-evaluating Open-ended Evaluation of Large Language Models
Authors:
Siqi Liu,
Ian Gemp,
Luke Marris,
Georgios Piliouras,
Nicolas Heess,
Marc Lanctot
Abstract:
Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that the current Elo-based rating systems can be susc…
▽ More
Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that the current Elo-based rating systems can be susceptible to and even reinforce biases in data, intentional or accidental, due to their sensitivity to redundancies. To address this issue, we propose evaluation as a 3-player game, and introduce novel game-theoretic solution concepts to ensure robustness to redundancy. We show that our method leads to intuitive ratings and provide insights into the competitive landscape of LLM development.
△ Less
Submitted 8 May, 2025; v1 submitted 27 February, 2025;
originally announced February 2025.
-
Multi-Agent Risks from Advanced AI
Authors:
Lewis Hammond,
Alan Chan,
Jesse Clifton,
Jason Hoelscher-Obermaier,
Akbir Khan,
Euan McLean,
Chandler Smith,
Wolfram Barfuss,
Jakob Foerster,
Tomáš Gavenčiak,
The Anh Han,
Edward Hughes,
Vojtěch Kovařík,
Jan Kulveit,
Joel Z. Leibo,
Caspar Oesterheld,
Christian Schroeder de Witt,
Nisarg Shah,
Michael Wellman,
Paolo Bova,
Theodor Cimpeanu,
Carson Ezell,
Quentin Feuillade-Montixi,
Matija Franklin,
Esben Kran
, et al. (19 additional authors not shown)
Abstract:
The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, a…
▽ More
The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI.
△ Less
Submitted 19 February, 2025;
originally announced February 2025.
-
Deviation Ratings: A General, Clone-Invariant Rating Method
Authors:
Luke Marris,
Siqi Liu,
Ian Gemp,
Georgios Piliouras,
Marc Lanctot
Abstract:
Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These strategic interactions may be agentic (e.g. players trying to win), fundamental (e.g. cost vs quality), or complementary (e.g. niche finding and specialization). In such a formulation, it is the stra…
▽ More
Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These strategic interactions may be agentic (e.g. players trying to win), fundamental (e.g. cost vs quality), or complementary (e.g. niche finding and specialization). In such a formulation, it is the strategies (actions, policies, agents, models, tasks, prompts, etc.) that are rated. However, the rating problem is complicated by redundancy and complexity of N-player strategic interactions. Repeated or similar strategies can distort ratings for those that counter or complement them. Previous work proposed ``clone invariant'' ratings to handle such redundancies, but this was limited to two-player zero-sum (i.e. strictly competitive) interactions. This work introduces the first N-player general-sum clone invariant rating, called deviation ratings, based on coarse correlated equilibria. The rating is explored on several domains including LLMs evaluation.
△ Less
Submitted 17 February, 2025;
originally announced February 2025.
-
Agency Is Frame-Dependent
Authors:
David Abel,
André Barreto,
Michael Bowling,
Will Dabney,
Shi Dong,
Steven Hansen,
Anna Harutyunyan,
Khimya Khetarpal,
Clare Lyle,
Razvan Pascanu,
Georgios Piliouras,
Doina Precup,
Jonathan Richens,
Mark Rowland,
Tom Schaul,
Satinder Singh
Abstract:
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess age…
▽ More
Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.
△ Less
Submitted 6 February, 2025;
originally announced February 2025.
-
No-regret learning in harmonic games: Extrapolation in the face of conflicting interests
Authors:
Davide Legacci,
Panayotis Mertikopoulos,
Christos H. Papadimitriou,
Georgios Piliouras,
Bary S. R. Pradelski
Abstract:
The long-run behavior of multi-agent learning - and, in particular, no-regret learning - is relatively well-understood in potential games, where players have aligned interests. By contrast, in harmonic games - the strategic counterpart of potential games, where players have conflicting interests - very little is known outside the narrow subclass of 2-player zero-sum games with a fully-mixed equili…
▽ More
The long-run behavior of multi-agent learning - and, in particular, no-regret learning - is relatively well-understood in potential games, where players have aligned interests. By contrast, in harmonic games - the strategic counterpart of potential games, where players have conflicting interests - very little is known outside the narrow subclass of 2-player zero-sum games with a fully-mixed equilibrium. Our paper seeks to partially fill this gap by focusing on the full class of (generalized) harmonic games and examining the convergence properties of follow-the-regularized-leader (FTRL), the most widely studied class of no-regret learning schemes. As a first result, we show that the continuous-time dynamics of FTRL are Poincaré recurrent, that is, they return arbitrarily close to their starting point infinitely often, and hence fail to converge. In discrete time, the standard, "vanilla" implementation of FTRL may lead to even worse outcomes, eventually trapping the players in a perpetual cycle of best-responses. However, if FTRL is augmented with a suitable extrapolation step - which includes as special cases the optimistic and mirror-prox variants of FTRL - we show that learning converges to a Nash equilibrium from any initial condition, and all players are guaranteed at most O(1) regret. These results provide an in-depth understanding of no-regret learning in harmonic games, nesting prior work on 2-player zero-sum games, and showing at a high level that harmonic games are the canonical complement of potential games, not only from a strategic, but also from a dynamic viewpoint.
△ Less
Submitted 28 December, 2024;
originally announced December 2024.
-
A theory of appropriateness with applications to generative artificial intelligence
Authors:
Joel Z. Leibo,
Alexander Sasha Vezhnevets,
Manfred Diaz,
John P. Agapiou,
William A. Cunningham,
Peter Sunehag,
Julia Haas,
Raphael Koster,
Edgar A. Duéñez-Guzmán,
William S. Isaac,
Georgios Piliouras,
Stanley M. Bileschi,
Iyad Rahwan,
Simon Osindero
Abstract:
What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which action…
▽ More
What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which actions are appropriate in which contexts? And what causes these standards to change over time? Since all judgments of AI appropriateness are ultimately made by humans, we need to understand how appropriateness guides human decision making in order to properly evaluate AI decision making and improve it. This paper presents a theory of appropriateness: how it functions in human society, how it may be implemented in the brain, and what it means for responsible deployment of generative AI technology.
△ Less
Submitted 25 December, 2024;
originally announced December 2024.
-
Charting the Shapes of Stories with Game Theory
Authors:
Constantinos Daskalakis,
Ian Gemp,
Yanchen Jiang,
Renato Paes Leme,
Christos Papadimitriou,
Georgios Piliouras
Abstract:
Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one dimensional charts and dynamic social networks; however, modern AI offers the possibility of iden…
▽ More
Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one dimensional charts and dynamic social networks; however, modern AI offers the possibility of identifying more fully the plot structure, character incentives, and, importantly, counterfactual plot lines that the story could have taken but did not take. In this work, we use AI to model the structure of stories as game-theoretic objects, amenable to quantitative analysis. This allows us to not only interrogate each character's decision making, but also possibly peer into the original author's conception of the characters' world. We demonstrate our proposed technique on Shakespeare's famous Romeo and Juliet. We conclude with a discussion of how our analysis could be replicated in broader contexts, including real-life scenarios.
△ Less
Submitted 7 December, 2024;
originally announced December 2024.
-
Interval maps mimicking circle rotations
Authors:
Jakub Bielawski,
Thiparat Chotibut,
Fryderyk Falniowski,
Michał Misiurewicz,
Georgios Piliouras
Abstract:
We investigate the dynamics of maps of the real line whose behavior on an invariant interval is close to a rational rotation on the circle. We concentrate on a specific two-parameter family, describing the dynamics arising from models in game theory, mathematical biology and machine learning. If one parameter is a rational number, $k/n$, with $k,n$ coprime, and the second one is large enough, we p…
▽ More
We investigate the dynamics of maps of the real line whose behavior on an invariant interval is close to a rational rotation on the circle. We concentrate on a specific two-parameter family, describing the dynamics arising from models in game theory, mathematical biology and machine learning. If one parameter is a rational number, $k/n$, with $k,n$ coprime, and the second one is large enough, we prove that there is a periodic orbit of period $n$. It behaves like an orbit of the circle rotation by an angle $2πk/n$ and attracts trajectories of Lebesgue almost all starting points. We also discover numerically other interesting phenomena. While we do not give rigorous proofs for them, we provide convincing explanations.
△ Less
Submitted 3 November, 2024;
originally announced November 2024.
-
Convex Markov Games: A Framework for Creativity, Imitation, Fairness, and Safety in Multiagent Learning
Authors:
Ian Gemp,
Andreas Haupt,
Luke Marris,
Siqi Liu,
Georgios Piliouras
Abstract:
Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibri…
▽ More
Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner's dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable.
△ Less
Submitted 16 January, 2025; v1 submitted 21 October, 2024;
originally announced October 2024.
-
Swim till You Sink: Computing the Limit of a Game
Authors:
Rashida Hakim,
Jason Milionis,
Christos Papadimitriou,
Georgios Piliouras
Abstract:
During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computatio…
▽ More
During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computational theory of the meaning of the game that is based on game dynamics. Given any game in normal form, and any prior distribution of play, we study the problem of computing the asymptotic behavior of a class of natural dynamics called the noisy replicator dynamics as a limit distribution over the sink equilibria of the game. When the prior distribution has pure strategy support, we prove this distribution can be computed efficiently, in near-linear time to the size of the best-response graph. When the distribution can be sampled -- for example, if it is the uniform distribution over all mixed strategy profiles -- we show through experiments that the limit distribution of reasonably large games can be estimated quite accurately through sampling and simulation.
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
Auto-bidding and Auctions in Online Advertising: A Survey
Authors:
Gagan Aggarwal,
Ashwinkumar Badanidiyuru,
Santiago R. Balseiro,
Kshipra Bhawalkar,
Yuan Deng,
Zhe Feng,
Gagan Goel,
Christopher Liaw,
Haihao Lu,
Mohammad Mahdian,
Jieming Mao,
Aranyak Mehta,
Vahab Mirrokni,
Renato Paes Leme,
Andres Perlroth,
Georgios Piliouras,
Jon Schneider,
Ariel Schvartzman,
Balasubramanian Sivan,
Kelly Spendlove,
Yifeng Teng,
Di Wang,
Hanrui Zhang,
Mingfei Zhao,
Wennan Zhu
, et al. (1 additional authors not shown)
Abstract:
In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction d…
▽ More
In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction design.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Complex Dynamics in Autobidding Systems
Authors:
Renato Paes Leme,
Georgios Piliouras,
Jon Schneider,
Kelly Spendlove,
Song Zuo
Abstract:
It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple system…
▽ More
It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple systems would typically converge to the equilibria of their underlying auctions, we provide a plethora of results that show the emergence of complex behavior, such as bi-stability, periodic orbits and quasi periodicity. We empirically observe how the market structure (expressed as motifs) qualitatively affects the behavior of the dynamics. We complement it with theoretical results showing that autobidding systems can simulate both linear dynamical systems as well logical boolean gates.
△ Less
Submitted 1 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg
Authors:
Yi Feng,
Georgios Piliouras,
Xiao Wang
Abstract:
We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mecha…
▽ More
We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mechanics has long been a classic subject of investigation since the discovery of the Heisenberg Uncertainty Principle. This principle employs covariance and standard deviation of particle states to measure prediction accuracy. In this study, we bring these two approaches together to analyze the Follow-the-Regularized-Leader (FTRL) algorithm in two-player zero-sum games. We provide growth rates of covariance information for continuous-time FTRL, as well as its two canonical discretization methods (Euler and Symplectic). A Heisenberg-type inequality is established for FTRL. Our analysis and experiments also show that employing Symplectic discretization enhances the accuracy of prediction in learning dynamics.
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Learning and steering game dynamics towards desirable outcomes
Authors:
Ilayda Canyakmaz,
Iosif Sakos,
Wayne Lin,
Antonios Varvitsiotis,
Georgios Piliouras
Abstract:
Game dynamics, which describe how agents' strategies evolve over time based on past interactions, can exhibit a variety of undesirable behaviours including convergence to suboptimal equilibria, cycling, and chaos. While central planners can employ incentives to mitigate such behaviors and steer game dynamics towards desirable outcomes, the effectiveness of such interventions critically relies on a…
▽ More
Game dynamics, which describe how agents' strategies evolve over time based on past interactions, can exhibit a variety of undesirable behaviours including convergence to suboptimal equilibria, cycling, and chaos. While central planners can employ incentives to mitigate such behaviors and steer game dynamics towards desirable outcomes, the effectiveness of such interventions critically relies on accurately predicting agents' responses to these incentives -- a task made particularly challenging when the underlying dynamics are unknown and observations are limited. To address this challenge, this work introduces the Side Information Assisted Regression with Model Predictive Control (SIAR-MPC) framework. We extend the recently introduced SIAR method to incorporate the effect of control, enabling it to utilize side-information constraints inherent to game-theoretic applications to model agents' responses to incentives from scarce data. MPC then leverages this model to implement dynamic incentive adjustments. Our experiments demonstrate the effectiveness of SIAR-MPC in guiding systems towards socially optimal equilibria, stabilizing chaotic and cycling behaviors. Notably, it achieves these results in data-scarce settings of few learning samples, where well-known system identification methods paired with MPC show less effective results.
△ Less
Submitted 10 December, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
On the Stability of Learning in Network Games with Many Players
Authors:
Aamal Hussain,
Dan Leonte,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players.
To make progress towards addressing this chall…
▽ More
Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players.
To make progress towards addressing this challenge we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games.
Next, we compare the learned QRE to the underlying NE of the game, by showing that any QRE is an $ε$-approximate Nash Equilibrium. We first provide tight bounds on $ε$ and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
Visualizing 2x2 Normal-Form Games: twoxtwogame LaTeX Package
Authors:
Luke Marris,
Ian Gemp,
Siqi Liu,
Joel Z. Leibo,
Georgios Piliouras
Abstract:
Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, there lacks tools for describing and visualizing such games. This work introduces a LaTeX package for visualizing 2x2 games. This work…
▽ More
Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, there lacks tools for describing and visualizing such games. This work introduces a LaTeX package for visualizing 2x2 games. This work has two goals: first, to provide high-quality tools and vector graphic visualizations, suitable for scientific publications. And second, to help promote standardization of names and representations of 2x2 games. The LaTeX package, twoxtwogame, is maintained on GitHub and mirrored on CTAN, and is available under a permissive Apache 2 license.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
MEV Sharing with Dynamic Extraction Rates
Authors:
Pedro Braga,
Georgios Chionas,
Piotr Krysta,
Stefanos Leonardos,
Georgios Piliouras,
Carmine Ventre
Abstract:
Maximal Extractable Value (MEV) has emerged as a new frontier in the design of blockchain systems. In this paper, we propose making the MEV extraction rate as part of the protocol design space. Our aim is to leverage this parameter to maintain a healthy balance between block producers (who need to be compensated) and users (who need to feel encouraged to transact). We follow the approach introduce…
▽ More
Maximal Extractable Value (MEV) has emerged as a new frontier in the design of blockchain systems. In this paper, we propose making the MEV extraction rate as part of the protocol design space. Our aim is to leverage this parameter to maintain a healthy balance between block producers (who need to be compensated) and users (who need to feel encouraged to transact). We follow the approach introduced by EIP-1559 and design a similar mechanism to dynamically update the MEV extraction rate with the goal of stabilizing it at a target value. We study the properties of this dynamic mechanism and show that, while convergence to the target can be guaranteed for certain parameters, instability, and even chaos, can occur in other cases. Despite these complexities, under general conditions, the system concentrates in a neighborhood of the target equilibrium implying high long-term performance. Our work establishes, the first to our knowledge, dynamic framework for the integral problem of MEV sharing between extractors and users.
△ Less
Submitted 30 September, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
NfgTransformer: Equivariant Representation Learning for Normal-form Games
Authors:
Siqi Liu,
Luke Marris,
Georgios Piliouras,
Ian Gemp,
Nicolas Heess
Abstract:
Normal-form games (NFGs) are the fundamental model of strategic interaction. We study their representation using neural networks. We describe the inherent equivariance of NFGs -- any permutation of strategies describes an equivalent game -- as well as the challenges this poses for representation learning. We then propose the NfgTransformer architecture that leverages this equivariance, leading to…
▽ More
Normal-form games (NFGs) are the fundamental model of strategic interaction. We study their representation using neural networks. We describe the inherent equivariance of NFGs -- any permutation of strategies describes an equivalent game -- as well as the challenges this poses for representation learning. We then propose the NfgTransformer architecture that leverages this equivariance, leading to state-of-the-art performance in a range of game-theoretic tasks including equilibrium-solving, deviation gain estimation and ranking, with a common approach to NFG representation. We show that the resulting model is interpretable and versatile, paving the way towards deep learning systems capable of game-theoretic reasoning when interacting with humans and with each other.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
Approximating the Core via Iterative Coalition Sampling
Authors:
Ian Gemp,
Marc Lanctot,
Luke Marris,
Yiran Mao,
Edgar Duéñez-Guzmán,
Sarah Perrin,
Andras Gyorgy,
Romuald Elie,
Georgios Piliouras,
Michael Kaisers,
Daniel Hennes,
Kalesha Bullard,
Kate Larson,
Yoram Bachrach
Abstract:
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, a…
▽ More
The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
Steering Language Models with Game-Theoretic Solvers
Authors:
Ian Gemp,
Roma Patel,
Yoram Bachrach,
Marc Lanctot,
Vibhavari Dasagi,
Luke Marris,
Georgios Piliouras,
Siqi Liu,
Karl Tuyls
Abstract:
Mathematical models of interactions among rational agents have long been studied in game theory. However these interactions are often over a small set of discrete game actions which is very different from how humans communicate in natural language. To bridge this gap, we introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large lan…
▽ More
Mathematical models of interactions among rational agents have long been studied in game theory. However these interactions are often over a small set of discrete game actions which is very different from how humans communicate in natural language. To bridge this gap, we introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large language models (LLMs). Specifically, by modelling the players, strategies and payoffs in a "game" of dialogue, we create a binding from natural language interactions to the conventional symbolic logic of game theory. Given this binding, we can ask existing game-theoretic algorithms to provide us with strategic solutions (e.g., what string an LLM should generate to maximize payoff in the face of strategic partners or opponents), giving us predictors of stable, rational conversational strategies. We focus on three domains that require different negotiation strategies: scheduling meetings, trading fruit and debate, and evaluate an LLM's generated language when guided by solvers. We see that LLMs that follow game-theory solvers result in dialogue generations that are less exploitable than the control (no guidance from solvers), and the language generated results in higher rewards, in all negotiation domains. We discuss future implications of this work, and how game-theoretic solvers that can leverage the expressivity of natural language can open up a new avenue of guiding language research.
△ Less
Submitted 16 December, 2024; v1 submitted 24 January, 2024;
originally announced February 2024.
-
Neural Population Learning beyond Symmetric Zero-sum Games
Authors:
Siqi Liu,
Luke Marris,
Marc Lanctot,
Georgios Piliouras,
Joel Z. Leibo,
Nicolas Heess
Abstract:
We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Corre…
▽ More
We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Exploiting hidden structures in non-convex games for convergence to Nash equilibrium
Authors:
Iosif Sakos,
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Panayotis Mertikopoulos,
Georgios Piliouras
Abstract:
A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence…
▽ More
A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method - which we call preconditioned hidden gradient descent (PHGD) - hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments.
△ Less
Submitted 27 December, 2023;
originally announced December 2023.
-
Scalable AI Safety via Doubly-Efficient Debate
Authors:
Jonah Brown-Cohen,
Geoffrey Irving,
Georgios Piliouras
Abstract:
The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of iden…
▽ More
The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games
Authors:
Francisca Vasconcelos,
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Panayotis Mertikopoulos,
Georgios Piliouras,
Michael I. Jordan
Abstract:
Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed th…
▽ More
Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/ε^2)$ iterations to $ε$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/ε)$ iterations to $ε$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $ε$-Nash equilibria in quantum zero-sum games.
△ Less
Submitted 2 May, 2025; v1 submitted 17 November, 2023;
originally announced November 2023.
-
No-Regret Learning and Equilibrium Computation in Quantum Games
Authors:
Wayne Lin,
Georgios Piliouras,
Ryann Sim,
Antonios Varvitsiotis
Abstract:
As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts of strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited i…
▽ More
As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts of strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited information. This paper delves into the dynamics of quantum-enabled agents within decentralized systems that employ no-regret algorithms to update their behaviors over time. Specifically, we investigate two-player quantum zero-sum games and polymatrix quantum zero-sum games, showing that no-regret algorithms converge to separable quantum Nash equilibria in time-average. In the case of general multi-player quantum games, our work leads to a novel solution concept, that of the {separable} quantum coarse correlated equilibria (QCCE), as the convergent outcome of the time-averaged behavior no-regret algorithms, offering a natural solution concept for decentralized quantum systems. Finally, we show that computing QCCEs can be formulated as a semidefinite program and establish the existence of entangled (i.e., non-separable) QCCEs, which are unlearnable via the current paradigm of no-regret learning.
△ Less
Submitted 1 December, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization
Authors:
Ian Gemp,
Luke Marris,
Georgios Piliouras
Abstract:
We propose the first loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that…
▽ More
We propose the first loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that stochastic gradient descent can outperform previous state-of-the-art approaches.
△ Less
Submitted 15 April, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics
Authors:
Aamal Hussain,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum…
▽ More
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption.
Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Stability of Multi-Agent Learning: Convergence in Network Games with Many Players
Authors:
Aamal Hussain,
Dan Leonte,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the…
▽ More
The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Data-Scarce Identification of Game Dynamics via Sum-of-Squares Optimization
Authors:
Iosif Sakos,
Antonios Varvitsiotis,
Georgios Piliouras
Abstract:
Understanding how players adjust their strategies in games, based on their experience, is a crucial tool for policymakers. It enables them to forecast the system's eventual behavior, exert control over the system, and evaluate counterfactual scenarios. The task becomes increasingly difficult when only a limited number of observations are available or difficult to acquire. In this work, we introduc…
▽ More
Understanding how players adjust their strategies in games, based on their experience, is a crucial tool for policymakers. It enables them to forecast the system's eventual behavior, exert control over the system, and evaluate counterfactual scenarios. The task becomes increasingly difficult when only a limited number of observations are available or difficult to acquire. In this work, we introduce the Side-Information Assisted Regression (SIAR) framework, designed to identify game dynamics in multiplayer normal-form games only using data from a short run of a single system trajectory. To enhance system recovery in the face of scarce data, we integrate side-information constraints into SIAR, which restrict the set of feasible solutions to those satisfying game-theoretic properties and common assumptions about strategic interactions. SIAR is solved using sum-of-squares (SOS) optimization, resulting in a hierarchy of approximations that provably converge to the true dynamics of the system. We showcase that the SIAR framework accurately predicts player behavior across a spectrum of normal-form games, widely-known families of game dynamics, and strong benchmarks, even if the unknown system is chaotic.
△ Less
Submitted 11 October, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Multiplicative Updates for Online Convex Optimization over Symmetric Cones
Authors:
Ilayda Canyakmaz,
Wayne Lin,
Georgios Piliouras,
Antonios Varvitsiotis
Abstract:
We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan A…
▽ More
We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.
△ Less
Submitted 6 July, 2023;
originally announced July 2023.
-
Chaos persists in large-scale multi-agent learning despite adaptive learning rates
Authors:
Emmanouil-Vasileios Vlatakis-Gkaragkounis,
Lampros Flokas,
Georgios Piliouras
Abstract:
Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved…
▽ More
Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard as recent work has established that learning with fixed rates will become chaotic given large enough populations.In this work, we show that chaos persists in large population congestion games despite using adaptive learning rates even for the ubiquitous Multiplicative Weight Updates algorithm, even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional period-three techniques Li-Yorke by studying fundamental properties of the dynamics including invariant sets, volume expansion and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games
Authors:
Luke Marris,
Ian Gemp,
Georgios Piliouras
Abstract:
Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equil…
▽ More
Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
A stochastic variant of replicator dynamics in zero-sum games and its invariant measures
Authors:
Maximilian Engel,
Georgios Piliouras
Abstract:
We study the behavior of a stochastic variant of replicator dynamics in two-agent zero-sum games. We characterize the statistics of such systems by their invariant measures which can be shown to be entirely supported on the boundary of the space of mixed strategies. Depending on the noise strength we can furthermore characterize these invariant measures by finding accumulation of mass at specific…
▽ More
We study the behavior of a stochastic variant of replicator dynamics in two-agent zero-sum games. We characterize the statistics of such systems by their invariant measures which can be shown to be entirely supported on the boundary of the space of mixed strategies. Depending on the noise strength we can furthermore characterize these invariant measures by finding accumulation of mass at specific parts of the boundary. In particular, regardless of the magnitude of noise, we show that any invariant probability measure is a convex combination of Dirac measures on pure strategy profiles, which correspond to vertices/corners of the agents' simplices. Thus, in the presence of stochastic perturbations, even in the most classic zero-sum settings, such as Matching Pennies, we observe a stark disagreement between the axiomatic prediction of Nash equilibrium and the evolutionary emergent behavior derived by an assumption of stochastically adaptive, learning agents.
△ Less
Submitted 25 October, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Generative Adversarial Equilibrium Solvers
Authors:
Denizalp Goktas,
David C. Parkes,
Ian Gemp,
Luke Marris,
Georgios Piliouras,
Romuald Elie,
Guy Lever,
Andrea Tacchetti
Abstract:
We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other playe…
▽ More
We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other players but also their feasible action spaces. Although the computation of GNE and CE is intractable in the worst-case, i.e., PPAD-hard, in practice, many applications only require solutions with high accuracy in expectation over a distribution of problem instances. We introduce Generative Adversarial Equilibrium Solvers (GAES): a family of generative adversarial neural networks that can learn GNE and CE from only a sample of problem instances. We provide computational and sample complexity bounds, and apply the framework to finding Nash equilibria in normal-form games, CE in Arrow-Debreu competitive economies, and GNE in an environmental economic model of the Kyoto mechanism.
△ Less
Submitted 20 February, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Learning in Quantum Common-Interest Games and the Separability Problem
Authors:
Wayne Lin,
Georgios Piliouras,
Ryann Sim,
Antonios Varvitsiotis
Abstract:
Learning in games has emerged as a powerful tool for machine learning with numerous applications. Quantum games model interactions between strategic players who have access to quantum resources, and several recent works have studied {learning in} the competitive regime of quantum zero-sum games. Going beyond this setting, we introduce quantum common-interest games (CIGs) where players have density…
▽ More
Learning in games has emerged as a powerful tool for machine learning with numerous applications. Quantum games model interactions between strategic players who have access to quantum resources, and several recent works have studied {learning in} the competitive regime of quantum zero-sum games. Going beyond this setting, we introduce quantum common-interest games (CIGs) where players have density matrices as strategies and their interests are perfectly aligned. We bridge the gap between optimization and game theory by establishing the equivalence between KKT (first-order stationary) points of an instance of the Best Separable State (BSS) problem and the Nash equilibria of its corresponding quantum CIG. This allows learning dynamics for the quantum CIG to be seen as decentralized algorithms for the BSS problem. Taking the perspective of learning in games, we then introduce non-commutative extensions of the continuous-time replicator dynamics and the discrete-time best response dynamics/linear multiplicative weights update for learning in quantum CIGs. We prove analogues of classical convergence results of the dynamics and explore differences which arise in the quantum setting. Finally, we corroborate our theoretical findings through extensive experiments.
△ Less
Submitted 30 March, 2025; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics
Authors:
Aamal Abbas Hussain,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted.
To make progress in resolving this problem, we study the dynamics of smooth Q…
▽ More
Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted.
To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted Potential games and weighted zero sum polymatrix games.
Finally, we examine the performance of the Q-Learning dynamic as measured by the Time Averaged Social Welfare, and comparing this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamic will outperform the equilibrium even if the dynamics do not converge.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
Heterogeneous Beliefs and Multi-Population Learning in Network Games
Authors:
Shuyue Hu,
Harold Soh,
Georgios Piliouras
Abstract:
The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from being well-understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP).We show that the system state -- a probability distribution over…
▽ More
The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from being well-understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP).We show that the system state -- a probability distribution over beliefs -- evolves according to a system of partial differential equations akin to the continuity equations that commonly desccribe transport phenomena in physical systems. We establish the convergence of SFP to Quantal Response Equilibria in different classes of games capturing both network competition as well as network coordination. We also prove that the beliefs will eventually homogenize in all network games. Although the initial belief heterogeneity disappears in the limit, we show that it plays a crucial role for equilibrium selection in the case of coordination games as it helps select highly desirable equilibria. Contrary, in the case of network competition, the resulting limit behavior is independent of the initialization of beliefs, even when the underlying game has many distinct Nash equilibria.
△ Less
Submitted 12 January, 2023;
originally announced January 2023.
-
Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps
Authors:
Volkan Cevher,
Georgios Piliouras,
Ryann Sim,
Stratis Skoulakis
Abstract:
In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accur…
▽ More
In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accuracy $ε$ with only $O(\log 1/ε)$ additional gradient-calls through the iterations of a contraction map. Then combining the analysis of (PP) method with an error-propagation analysis we establish that the resulting first order method, called Clairvoyant Extra Gradient, admits near-optimal time-average convergence for general domains and last-iterate convergence in the unconstrained case.
△ Less
Submitted 16 January, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
Optimality Despite Chaos in Fee Markets
Authors:
Stefanos Leonardos,
Daniël Reijsbergen,
Barnabé Monnot,
Georgios Piliouras
Abstract:
Transaction fee markets are essential components of blockchain economies, as they resolve the inherent scarcity in the number of transactions that can be added to each block. In early blockchain protocols, this scarcity was resolved through a first-price auction in which users were forced to guess appropriate bids from recent blockchain data. Ethereum's EIP-1559 fee market reform streamlines this…
▽ More
Transaction fee markets are essential components of blockchain economies, as they resolve the inherent scarcity in the number of transactions that can be added to each block. In early blockchain protocols, this scarcity was resolved through a first-price auction in which users were forced to guess appropriate bids from recent blockchain data. Ethereum's EIP-1559 fee market reform streamlines this process through the use of a base fee that is increased (or decreased) whenever a block exceeds (or fails to meet) a specified target block size. Previous work has found that the EIP-1559 mechanism may lead to a base fee process that is inherently chaotic, in which case the base fee does not converge to a fixed point even under ideal conditions. However, the impact of this chaotic behavior on the fee market's main design goal -- blocks whose long-term average size equals the target -- has not previously been explored. As our main contribution, we derive near-optimal upper and lower bounds for the time-average block size in the EIP-1559 mechanism despite its possibly chaotic evolution. Our lower bound is equal to the target utilization level whereas our upper bound is approximately 6% higher than optimal. Empirical evidence is shown in great agreement with these theoretical predictions. Specifically, the historical average was approximately 2.9% larger than the target rage under Proof-of-Work and decreased to approximately 2.0% after Ethereum's transition to Proof-of-Stake. We also find that an approximate version of EIP-1559 achieves optimality even in the absence of convergence.
△ Less
Submitted 15 December, 2022; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence
Authors:
Rahul Jain,
Georgios Piliouras,
Ryann Sim
Abstract:
Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update…
▽ More
Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update method) and its continuous analogue, Quantum Replicator Dynamics. When each player selects their state according to quantum replicator dynamics, we show that the system exhibits conservation laws in a quantum-information theoretic sense. Moreover, we show that the system exhibits Poincare recurrence, meaning that almost all orbits return arbitrarily close to their initial conditions infinitely often. Our analysis generalizes previous results in the case of classical games.
△ Less
Submitted 26 April, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Learning Correlated Equilibria in Mean-Field Games
Authors:
Paul Muller,
Romuald Elie,
Mark Rowland,
Mathieu Lauriere,
Julien Perolat,
Sarah Perrin,
Matthieu Geist,
Georgios Piliouras,
Olivier Pietquin,
Karl Tuyls
Abstract:
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-…
▽ More
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
△ Less
Submitted 22 August, 2022;
originally announced August 2022.
-
Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games
Authors:
Georgios Piliouras,
Lillian Ratliff,
Ryann Sim,
Stratis Skoulakis
Abstract:
The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum propert…
▽ More
The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum property of agent payoffs, the efficient representation of graphical games as well the expressive power of EFGs. We examine the convergence properties of Optimistic Gradient Ascent (OGA) in these games. We prove that the time-average behavior of such online learning dynamics exhibits $O(1/T)$ rate convergence to the set of Nash Equilibria. Moreover, we show that the day-to-day behavior also converges to Nash with rate $O(c^{-t})$ for some game-dependent constant $c>0$.
△ Less
Submitted 18 July, 2022;
originally announced July 2022.
-
Alternating Mirror Descent for Constrained Min-Max Games
Authors:
Andre Wibisono,
Molei Tao,
Georgios Piliouras
Abstract:
In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrai…
▽ More
In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior than the simultaneous version of mirror descent algorithm, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of alternating gradient descent algorithm for zero-sum games which was studied in (Bailey et al., COLT 2020).
△ Less
Submitted 8 June, 2022;
originally announced June 2022.
-
Nash, Conley, and Computation: Impossibility and Incompleteness in Game Dynamics
Authors:
Jason Milionis,
Christos Papadimitriou,
Georgios Piliouras,
Kelly Spendlove
Abstract:
Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attra…
▽ More
Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attractors, and Conley index, to prove a general impossibility result: there exist games for which any dynamics is bound to have starting points that do not end up at a Nash equilibrium. We also prove a stronger result for $ε$-approximate Nash equilibria: there are games such that no game dynamics can converge (in an appropriate sense) to $ε$-Nash equilibria, and in fact the set of such games has positive measure. Further numerical results demonstrate that this holds for any $ε$ between zero and $0.09$. Our results establish that, although the notions of Nash equilibria (and its computation-inspired approximations) are universally applicable in all games, they are also fundamentally incomplete as predictors of long term behavior, regardless of the choice of dynamics.
△ Less
Submitted 26 March, 2022;
originally announced March 2022.
-
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Authors:
Mathieu Laurière,
Sarah Perrin,
Sertan Girgin,
Paul Muller,
Ayush Jain,
Theophile Cabannes,
Georgios Piliouras,
Julien Pérolat,
Romuald Élie,
Olivier Pietquin,
Matthieu Geist
Abstract:
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quant…
▽ More
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from being trivial in the case of non-linear function approximation that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.
△ Less
Submitted 17 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
No-Regret Learning in Games is Turing Complete
Authors:
Gabriel P. Andrade,
Rafael Frongillo,
Georgios Piliouras
Abstract:
Games are natural models for multi-agent machine learning settings, such as generative adversarial networks (GANs). The desirable outcomes from algorithmic interactions in these games are encoded as game theoretic equilibrium concepts, e.g. Nash and coarse correlated equilibria. As directly computing an equilibrium is typically impractical, one often aims to design learning algorithms that iterati…
▽ More
Games are natural models for multi-agent machine learning settings, such as generative adversarial networks (GANs). The desirable outcomes from algorithmic interactions in these games are encoded as game theoretic equilibrium concepts, e.g. Nash and coarse correlated equilibria. As directly computing an equilibrium is typically impractical, one often aims to design learning algorithms that iteratively converge to equilibria. A growing body of negative results casts doubt on this goal, from non-convergence to chaotic and even arbitrary behaviour. In this paper we add a strong negative result to this list: learning in games is Turing complete. Specifically, we prove Turing completeness of the replicator dynamic on matrix games, one of the simplest possible settings. Our results imply the undecicability of reachability problems for learning algorithms in games, a special case of which is determining equilibrium convergence.
△ Less
Submitted 23 February, 2022;
originally announced February 2022.