Search | arXiv e-print repository

Avoiding Obfuscation with Prover-Estimator Debate

Authors: Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras

Abstract: Training powerful AI systems to exhibit desired behaviors hinges on the ability to provide accurate human supervision on increasingly complex tasks. A promising approach to this problem is to amplify human judgement by leveraging the power of two competing AIs in a debate about the correct solution to a given problem. Prior theoretical work has provided a complexity-theoretic formalization of AI d… ▽ More Training powerful AI systems to exhibit desired behaviors hinges on the ability to provide accurate human supervision on increasingly complex tasks. A promising approach to this problem is to amplify human judgement by leveraging the power of two competing AIs in a debate about the correct solution to a given problem. Prior theoretical work has provided a complexity-theoretic formalization of AI debate, and posed the problem of designing protocols for AI debate that guarantee the correctness of human judgements for as complex a class of problems as possible. Recursive debates, in which debaters decompose a complex problem into simpler subproblems, hold promise for growing the class of problems that can be accurately judged in a debate. However, existing protocols for recursive debate run into the obfuscated arguments problem: a dishonest debater can use a computationally efficient strategy that forces an honest opponent to solve a computationally intractable problem to win. We mitigate this problem with a new recursive debate protocol that, under certain stability assumptions, ensures that an honest debater can win with a strategy requiring computational efficiency comparable to their opponent. △ Less

Submitted 16 June, 2025; originally announced June 2025.

arXiv:2506.13086 [pdf, ps, other]

Fast and Furious Symmetric Learning in Zero-Sum Games: Gradient Descent as Fictitious Play

Authors: John Lazarsfeld, Georgios Piliouras, Ryann Sim, Andre Wibisono

Abstract: This paper investigates the sublinear regret guarantees of two non-no-regret algorithms in zero-sum games: Fictitious Play, and Online Gradient Descent with constant stepsizes. In general adversarial online learning settings, both algorithms may exhibit instability and linear regret due to no regularization (Fictitious Play) or small amounts of regularization (Gradient Descent). However, their abi… ▽ More This paper investigates the sublinear regret guarantees of two non-no-regret algorithms in zero-sum games: Fictitious Play, and Online Gradient Descent with constant stepsizes. In general adversarial online learning settings, both algorithms may exhibit instability and linear regret due to no regularization (Fictitious Play) or small amounts of regularization (Gradient Descent). However, their ability to obtain tighter regret bounds in two-player zero-sum games is less understood. In this work, we obtain strong new regret guarantees for both algorithms on a class of symmetric zero-sum games that generalize the classic three-strategy Rock-Paper-Scissors to a weighted, n-dimensional regime. Under symmetric initializations of the players' strategies, we prove that Fictitious Play with any tiebreaking rule has $O(\sqrt{T})$ regret, establishing a new class of games for which Karlin's Fictitious Play conjecture holds. Moreover, by leveraging a connection between the geometry of the iterates of Fictitious Play and Gradient Descent in the dual space of payoff vectors, we prove that Gradient Descent, for almost all symmetric initializations, obtains a similar $O(\sqrt{T})$ regret bound when its stepsize is a sufficiently large constant. For Gradient Descent, this establishes the first "fast and furious" behavior (i.e., sublinear regret without time-vanishing stepsizes) for zero-sum games larger than 2x2. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: COLT 2025

arXiv:2506.05005 [pdf, ps, other]

Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games

Authors: Ashkan Soleymani, Georgios Piliouras, Gabriele Farina

Abstract: Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenome… ▽ More Recent work [Soleymani et al., 2025] introduced a variant of Optimistic Multiplicative Weights Updates (OMWU) that adaptively controls the learning pace in a dynamic, non-monotone manner, achieving new state-of-the-art regret minimization guarantees in general games. In this work, we demonstrate that no-regret learning acceleration through adaptive pacing of the learners is not an isolated phenomenon. We introduce \emph{Cautious Optimism}, a framework for substantially faster regularized learning in general games. Cautious Optimism takes as input any instance of Follow-the-Regularized-Leader (FTRL) and outputs an accelerated no-regret learning algorithm by pacing the underlying FTRL with minimal computational overhead. Importantly, we retain uncoupledness (learners do not need to know other players' utilities). Cautious Optimistic FTRL achieves near-optimal $O_T(\log T)$ regret in diverse self-play (mixing-and-matching regularizers) while preserving the optimal $O(\sqrt{T})$ regret in adversarial scenarios. In contrast to prior works (e.g. Syrgkanis et al. [2015], Daskalakis et al. [2021]), our analysis does not rely on monotonic step-sizes, showcasing a novel route for fast learning in general games. △ Less

Submitted 5 June, 2025; originally announced June 2025.

Comments: Extended abstract appeared at Twenty-Sixth ACM Conference on Economics and Computation (EC), 2025

arXiv:2505.10361 [pdf, other]

Plasticity as the Mirror of Empowerment

Authors: David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh

Abstract: Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observ… ▽ More Agents are minimally entities that are influenced by their past observations and act to influence future observations. This latter capacity is captured by empowerment, which has served as a vital framing concept across artificial intelligence and cognitive science. This former capacity, however, is equally foundational: In what ways, and to what extent, can an agent be influenced by what it observes? In this paper, we ground this concept in a universal agent-centric measure that we refer to as plasticity, and reveal a fundamental connection to empowerment. Following a set of desiderata on a suitable definition, we define plasticity using a new information-theoretic quantity we call the generalized directed information. We show that this new quantity strictly generalizes the directed information introduced by Massey (1990) while preserving all of its desirable properties. Our first finding is that plasticity is the mirror of empowerment: The agent's plasticity is identical to the empowerment of the environment, and vice versa. Our second finding establishes a tension between the plasticity and empowerment of an agent, suggesting that agent design needs to be mindful of both characteristics. We explore the implications of these findings, and suggest that plasticity, empowerment, and their relationship are essential to understanding agency. △ Less

Submitted 15 May, 2025; originally announced May 2025.

arXiv:2503.24340 [pdf, ps, other]

doi 10.1145/3717823.3718242

Faster Rates for No-Regret Learning in General Games via Cautious Optimism

Authors: Ashkan Soleymani, Georgios Piliouras, Gabriele Farina

Abstract: We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Reg… ▽ More We establish the first uncoupled learning algorithm that attains $O(n \log^2 d \log T)$ per-player regret in multi-player general-sum games, where $n$ is the number of players, $d$ is the number of actions available to each player, and $T$ is the number of repetitions of the game. Our results exponentially improve the dependence on $d$ compared to the $O(n\, d \log T)$ regret attainable by Log-Regularized Lifted Optimistic FTRL [Far+22c], and also reduce the dependence on the number of iterations $T$ from $\log^4 T$ to $\log T$ compared to Optimistic Hedge, the previously well-studied algorithm with $O(n \log d \log^4 T)$ regret [DFG21]. Our algorithm is obtained by combining the classic Optimistic Multiplicative Weights Update (OMWU) with an adaptive, non-monotonic learning rate that paces the learning process of the players, making them more cautious when their regret becomes too negative. △ Less

Submitted 31 March, 2025; originally announced March 2025.

Comments: Appeared at STOC 2025

arXiv:2503.22850 [pdf, other]

Passivity, No-Regret, and Convergent Learning in Contractive Games

Authors: Hassan Abdelraouf, Georgios Piliouras, Jeff S. Shamma

Abstract: We investigate the interplay between passivity, no-regret, and convergence in contractive games for various learning dynamic models and their higher-order variants. Our setting is continuous time. Building on prior work for replicator dynamics, we show that if learning dynamics satisfy a passivity condition between the payoff vector and the difference between its evolving strategy and any fixed st… ▽ More We investigate the interplay between passivity, no-regret, and convergence in contractive games for various learning dynamic models and their higher-order variants. Our setting is continuous time. Building on prior work for replicator dynamics, we show that if learning dynamics satisfy a passivity condition between the payoff vector and the difference between its evolving strategy and any fixed strategy, then it achieves finite regret. We then establish that the passivity condition holds for various learning dynamics and their higher-order variants. Consequentially, the higher-order variants can achieve convergence to Nash equilibrium in cases where their standard order counterparts cannot, while maintaining a finite regret property. We provide numerical examples to illustrate the lack of finite regret of different evolutionary dynamic models that violate the passivity property. We also examine the fragility of the finite regret property in the case of perturbed learning dynamics. Continuing with passivity, we establish another connection between finite regret and passivity, but with the related equilibrium-independent passivity property. Finally, we present a passivity-based classification of dynamic models according to the various passivity notions they satisfy, namely, incremental passivity, $δ$-passivity, and equilibrium-independent passivity. This passivity-based classification provides a framework to analyze the convergence of learning dynamic models in contractive games. △ Less

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2502.20170 [pdf, other]

Re-evaluating Open-ended Evaluation of Large Language Models

Authors: Siqi Liu, Ian Gemp, Luke Marris, Georgios Piliouras, Nicolas Heess, Marc Lanctot

Abstract: Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that the current Elo-based rating systems can be susc… ▽ More Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that the current Elo-based rating systems can be susceptible to and even reinforce biases in data, intentional or accidental, due to their sensitivity to redundancies. To address this issue, we propose evaluation as a 3-player game, and introduce novel game-theoretic solution concepts to ensure robustness to redundancy. We show that our method leads to intuitive ratings and provide insights into the competitive landscape of LLM development. △ Less

Submitted 8 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

Comments: Published at ICLR 2025

arXiv:2502.14143 [pdf, other]

Multi-Agent Risks from Advanced AI

Authors: Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Quentin Feuillade-Montixi, Matija Franklin, Esben Kran , et al. (19 additional authors not shown)

Abstract: The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, a… ▽ More The rapid development of advanced AI agents and the imminent deployment of many instances of these agents will give rise to multi-agent systems of unprecedented complexity. These systems pose novel and under-explored risks. In this report, we provide a structured taxonomy of these risks by identifying three key failure modes (miscoordination, conflict, and collusion) based on agents' incentives, as well as seven key risk factors (information asymmetries, network effects, selection pressures, destabilising dynamics, commitment problems, emergent agency, and multi-agent security) that can underpin them. We highlight several important instances of each risk, as well as promising directions to help mitigate them. By anchoring our analysis in a range of real-world examples and experimental evidence, we illustrate the distinct challenges posed by multi-agent systems and their implications for the safety, governance, and ethics of advanced AI. △ Less

Submitted 19 February, 2025; originally announced February 2025.

Comments: Cooperative AI Foundation, Technical Report #1

arXiv:2502.11645 [pdf, other]

Deviation Ratings: A General, Clone-Invariant Rating Method

Authors: Luke Marris, Siqi Liu, Ian Gemp, Georgios Piliouras, Marc Lanctot

Abstract: Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These strategic interactions may be agentic (e.g. players trying to win), fundamental (e.g. cost vs quality), or complementary (e.g. niche finding and specialization). In such a formulation, it is the stra… ▽ More Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These strategic interactions may be agentic (e.g. players trying to win), fundamental (e.g. cost vs quality), or complementary (e.g. niche finding and specialization). In such a formulation, it is the strategies (actions, policies, agents, models, tasks, prompts, etc.) that are rated. However, the rating problem is complicated by redundancy and complexity of N-player strategic interactions. Repeated or similar strategies can distort ratings for those that counter or complement them. Previous work proposed ``clone invariant'' ratings to handle such redundancies, but this was limited to two-player zero-sum (i.e. strictly competitive) interactions. This work introduces the first N-player general-sum clone invariant rating, called deviation ratings, based on coarse correlated equilibria. The rating is explored on several domains including LLMs evaluation. △ Less

Submitted 17 February, 2025; originally announced February 2025.

arXiv:2502.04403 [pdf, other]

Agency Is Frame-Dependent

Authors: David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh

Abstract: Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess age… ▽ More Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. We here address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: Any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) are themselves frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning. △ Less

Submitted 6 February, 2025; originally announced February 2025.

arXiv:2412.20203 [pdf, other]

No-regret learning in harmonic games: Extrapolation in the face of conflicting interests

Authors: Davide Legacci, Panayotis Mertikopoulos, Christos H. Papadimitriou, Georgios Piliouras, Bary S. R. Pradelski

Abstract: The long-run behavior of multi-agent learning - and, in particular, no-regret learning - is relatively well-understood in potential games, where players have aligned interests. By contrast, in harmonic games - the strategic counterpart of potential games, where players have conflicting interests - very little is known outside the narrow subclass of 2-player zero-sum games with a fully-mixed equili… ▽ More The long-run behavior of multi-agent learning - and, in particular, no-regret learning - is relatively well-understood in potential games, where players have aligned interests. By contrast, in harmonic games - the strategic counterpart of potential games, where players have conflicting interests - very little is known outside the narrow subclass of 2-player zero-sum games with a fully-mixed equilibrium. Our paper seeks to partially fill this gap by focusing on the full class of (generalized) harmonic games and examining the convergence properties of follow-the-regularized-leader (FTRL), the most widely studied class of no-regret learning schemes. As a first result, we show that the continuous-time dynamics of FTRL are Poincaré recurrent, that is, they return arbitrarily close to their starting point infinitely often, and hence fail to converge. In discrete time, the standard, "vanilla" implementation of FTRL may lead to even worse outcomes, eventually trapping the players in a perpetual cycle of best-responses. However, if FTRL is augmented with a suitable extrapolation step - which includes as special cases the optimistic and mirror-prox variants of FTRL - we show that learning converges to a Nash equilibrium from any initial condition, and all players are guaranteed at most O(1) regret. These results provide an in-depth understanding of no-regret learning in harmonic games, nesting prior work on 2-player zero-sum games, and showing at a high level that harmonic games are the canonical complement of potential games, not only from a strategic, but also from a dynamic viewpoint. △ Less

Submitted 28 December, 2024; originally announced December 2024.

Comments: 36 pages, 5 figures

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

arXiv:2412.19010 [pdf, other]

A theory of appropriateness with applications to generative artificial intelligence

Authors: Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero

Abstract: What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which action… ▽ More What is appropriateness? Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations. We act one way with our friends, another with our family, and yet another in the office. Likewise for AI, appropriate behavior for a comedy-writing assistant is not the same as appropriate behavior for a customer-service representative. What determines which actions are appropriate in which contexts? And what causes these standards to change over time? Since all judgments of AI appropriateness are ultimately made by humans, we need to understand how appropriateness guides human decision making in order to properly evaluate AI decision making and improve it. This paper presents a theory of appropriateness: how it functions in human society, how it may be implemented in the brain, and what it means for responsible deployment of generative AI technology. △ Less

Submitted 25 December, 2024; originally announced December 2024.

Comments: 115 pages, 2 figures

arXiv:2412.05747 [pdf, other]

Charting the Shapes of Stories with Game Theory

Authors: Constantinos Daskalakis, Ian Gemp, Yanchen Jiang, Renato Paes Leme, Christos Papadimitriou, Georgios Piliouras

Abstract: Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one dimensional charts and dynamic social networks; however, modern AI offers the possibility of iden… ▽ More Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to extract structure from stories and insights from structure. Historically, these tools have been restricted to one dimensional charts and dynamic social networks; however, modern AI offers the possibility of identifying more fully the plot structure, character incentives, and, importantly, counterfactual plot lines that the story could have taken but did not take. In this work, we use AI to model the structure of stories as game-theoretic objects, amenable to quantitative analysis. This allows us to not only interrogate each character's decision making, but also possibly peer into the original author's conception of the characters' world. We demonstrate our proposed technique on Shakespeare's famous Romeo and Juliet. We conclude with a discussion of how our analysis could be replicated in broader contexts, including real-life scenarios. △ Less

Submitted 7 December, 2024; originally announced December 2024.

Comments: NeurIPS 2024 Creative AI Track

arXiv:2411.01495 [pdf, other]

Interval maps mimicking circle rotations

Authors: Jakub Bielawski, Thiparat Chotibut, Fryderyk Falniowski, Michał Misiurewicz, Georgios Piliouras

Abstract: We investigate the dynamics of maps of the real line whose behavior on an invariant interval is close to a rational rotation on the circle. We concentrate on a specific two-parameter family, describing the dynamics arising from models in game theory, mathematical biology and machine learning. If one parameter is a rational number, $k/n$, with $k,n$ coprime, and the second one is large enough, we p… ▽ More We investigate the dynamics of maps of the real line whose behavior on an invariant interval is close to a rational rotation on the circle. We concentrate on a specific two-parameter family, describing the dynamics arising from models in game theory, mathematical biology and machine learning. If one parameter is a rational number, $k/n$, with $k,n$ coprime, and the second one is large enough, we prove that there is a periodic orbit of period $n$. It behaves like an orbit of the circle rotation by an angle $2πk/n$ and attracts trajectories of Lebesgue almost all starting points. We also discover numerically other interesting phenomena. While we do not give rigorous proofs for them, we provide convincing explanations. △ Less

Submitted 3 November, 2024; originally announced November 2024.

MSC Class: 37E05

arXiv:2410.16600 [pdf, ps, other]

Convex Markov Games: A New Frontier for Multi-Agent Reinforcement Learning

Authors: Ian Gemp, Andreas Haupt, Luke Marris, Siqi Liu, Georgios Piliouras

Abstract: Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibri… ▽ More Behavioral diversity, expert imitation, fairness, safety goals and others give rise to preferences in sequential decision making domains that do not decompose additively across time. We introduce the class of convex Markov games that allow general convex preferences over occupancy measures. Despite infinite time horizon and strictly higher generality than Markov games, pure strategy Nash equilibria exist. Furthermore, equilibria can be approximated empirically by performing gradient descent on an upper bound of exploitability. Our experiments reveal novel solutions to classic repeated normal-form games, find fair solutions in a repeated asymmetric coordination game, and prioritize safe long-term behavior in a robot warehouse environment. In the prisoner's dilemma, our algorithm leverages transient imitation to find a policy profile that deviates from observed human play only slightly, yet achieves higher per-player utility while also being three orders of magnitude less exploitable. △ Less

Submitted 16 June, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: Published at ICML 2025

arXiv:2408.11146 [pdf, other]

Swim till You Sink: Computing the Limit of a Game

Authors: Rashida Hakim, Jason Milionis, Christos Papadimitriou, Georgios Piliouras

Abstract: During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computatio… ▽ More During 2023, two interesting results were proven about the limit behavior of game dynamics: First, it was shown that there is a game for which no dynamics converges to the Nash equilibria. Second, it was shown that the sink equilibria of a game adequately capture the limit behavior of natural game dynamics. These two results have created a need and opportunity to articulate a principled computational theory of the meaning of the game that is based on game dynamics. Given any game in normal form, and any prior distribution of play, we study the problem of computing the asymptotic behavior of a class of natural dynamics called the noisy replicator dynamics as a limit distribution over the sink equilibria of the game. When the prior distribution has pure strategy support, we prove this distribution can be computed efficiently, in near-linear time to the size of the best-response graph. When the distribution can be sampled -- for example, if it is the uniform distribution over all mixed strategy profiles -- we show through experiments that the limit distribution of reasonably large games can be estimated quite accurately through sampling and simulation. △ Less

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.07685 [pdf, ps, other]

Auto-bidding and Auctions in Online Advertising: A Survey

Authors: Gagan Aggarwal, Ashwinkumar Badanidiyuru, Santiago R. Balseiro, Kshipra Bhawalkar, Yuan Deng, Zhe Feng, Gagan Goel, Christopher Liaw, Haihao Lu, Mohammad Mahdian, Jieming Mao, Aranyak Mehta, Vahab Mirrokni, Renato Paes Leme, Andres Perlroth, Georgios Piliouras, Jon Schneider, Ariel Schvartzman, Balasubramanian Sivan, Kelly Spendlove, Yifeng Teng, Di Wang, Hanrui Zhang, Mingfei Zhao, Wennan Zhu , et al. (1 additional authors not shown)

Abstract: In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction d… ▽ More In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction design. △ Less

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2406.19350 [pdf, other]

Complex Dynamics in Autobidding Systems

Authors: Renato Paes Leme, Georgios Piliouras, Jon Schneider, Kelly Spendlove, Song Zuo

Abstract: It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple system… ▽ More It has become the default in markets such as ad auctions for participants to bid in an auction through automated bidding agents (autobidders) which adjust bids over time to satisfy return-over-spend constraints. Despite the prominence of such systems for the internet economy, their resulting dynamical behavior is still not well understood. Although one might hope that such relatively simple systems would typically converge to the equilibria of their underlying auctions, we provide a plethora of results that show the emergence of complex behavior, such as bi-stability, periodic orbits and quasi periodicity. We empirically observe how the market structure (expressed as motifs) qualitatively affects the behavior of the dynamics. We complement it with theoretical results showing that autobidding systems can simulate both linear dynamical systems as well logical boolean gates. △ Less

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.10603 [pdf, other]

Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg

Authors: Yi Feng, Georgios Piliouras, Xiao Wang

Abstract: We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mecha… ▽ More We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mechanics has long been a classic subject of investigation since the discovery of the Heisenberg Uncertainty Principle. This principle employs covariance and standard deviation of particle states to measure prediction accuracy. In this study, we bring these two approaches together to analyze the Follow-the-Regularized-Leader (FTRL) algorithm in two-player zero-sum games. We provide growth rates of covariance information for continuous-time FTRL, as well as its two canonical discretization methods (Euler and Symplectic). A Heisenberg-type inequality is established for FTRL. Our analysis and experiments also show that employing Symplectic discretization enhances the accuracy of prediction in learning dynamics. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: Accepted for ICML 2024

arXiv:2404.01066 [pdf, ps, other]

Learning and steering game dynamics towards desirable outcomes

Authors: Ilayda Canyakmaz, Iosif Sakos, Wayne Lin, Antonios Varvitsiotis, Georgios Piliouras

Abstract: Game dynamics, which describe how agents' strategies evolve over time based on past interactions, can exhibit a variety of undesirable behaviours including convergence to suboptimal equilibria, cycling, and chaos. While central planners can employ incentives to mitigate such behaviors and steer game dynamics towards desirable outcomes, the effectiveness of such interventions critically relies on a… ▽ More Game dynamics, which describe how agents' strategies evolve over time based on past interactions, can exhibit a variety of undesirable behaviours including convergence to suboptimal equilibria, cycling, and chaos. While central planners can employ incentives to mitigate such behaviors and steer game dynamics towards desirable outcomes, the effectiveness of such interventions critically relies on accurately predicting agents' responses to these incentives -- a task made particularly challenging when the underlying dynamics are unknown and observations are limited. To address this challenge, this work introduces the Side Information Assisted Regression with Model Predictive Control (SIAR-MPC) framework. We extend the recently introduced SIAR method to incorporate the effect of control, enabling it to utilize side-information constraints inherent to game-theoretic applications to model agents' responses to incentives from scarce data. MPC then leverages this model to implement dynamic incentive adjustments. Our experiments demonstrate the effectiveness of SIAR-MPC in guiding systems towards socially optimal equilibria, stabilizing chaotic and cycling behaviors. Notably, it achieves these results in data-scarce settings of few learning samples, where well-known system identification methods paired with MPC show less effective results. △ Less

Submitted 10 December, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.15848 [pdf, other]

On the Stability of Learning in Network Games with Many Players

Authors: Aamal Hussain, Dan Leonte, Francesco Belardinelli, Georgios Piliouras

Abstract: Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players. To make progress towards addressing this chall… ▽ More Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players. To make progress towards addressing this challenge we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games. Next, we compare the learned QRE to the underlying NE of the game, by showing that any QRE is an $ε$-approximate Nash Equilibrium. We first provide tight bounds on $ε$ and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: AAMAS 2024. arXiv admin note: text overlap with arXiv:2307.13922

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2402.16985 [pdf, other]

Visualizing 2x2 Normal-Form Games: twoxtwogame LaTeX Package

Authors: Luke Marris, Ian Gemp, Siqi Liu, Joel Z. Leibo, Georgios Piliouras

Abstract: Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, there lacks tools for describing and visualizing such games. This work introduces a LaTeX package for visualizing 2x2 games. This work… ▽ More Normal-form games with two players, each with two strategies, are the most studied class of games. These so-called 2x2 games are used to model a variety of strategic interactions. They appear in game theory, economics, and artificial intelligence research. However, there lacks tools for describing and visualizing such games. This work introduces a LaTeX package for visualizing 2x2 games. This work has two goals: first, to provide high-quality tools and vector graphic visualizations, suitable for scientific publications. And second, to help promote standardization of names and representations of 2x2 games. The LaTeX package, twoxtwogame, is maintained on GitHub and mirrored on CTAN, and is available under a permissive Apache 2 license. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15849 [pdf, other]

MEV Sharing with Dynamic Extraction Rates

Authors: Pedro Braga, Georgios Chionas, Piotr Krysta, Stefanos Leonardos, Georgios Piliouras, Carmine Ventre

Abstract: Maximal Extractable Value (MEV) has emerged as a new frontier in the design of blockchain systems. In this paper, we propose making the MEV extraction rate as part of the protocol design space. Our aim is to leverage this parameter to maintain a healthy balance between block producers (who need to be compensated) and users (who need to feel encouraged to transact). We follow the approach introduce… ▽ More Maximal Extractable Value (MEV) has emerged as a new frontier in the design of blockchain systems. In this paper, we propose making the MEV extraction rate as part of the protocol design space. Our aim is to leverage this parameter to maintain a healthy balance between block producers (who need to be compensated) and users (who need to feel encouraged to transact). We follow the approach introduced by EIP-1559 and design a similar mechanism to dynamically update the MEV extraction rate with the goal of stabilizing it at a target value. We study the properties of this dynamic mechanism and show that, while convergence to the target can be guaranteed for certain parameters, instability, and even chaos, can occur in other cases. Despite these complexities, under general conditions, the system concentrates in a neighborhood of the target equilibrium implying high long-term performance. Our work establishes, the first to our knowledge, dynamic framework for the integral problem of MEV sharing between extractors and users. △ Less

Submitted 30 September, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: Extended abstract in the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024)

MSC Class: 37; 65P20; 91; 93A14; 93A16 ACM Class: C.2.4; F.2; I.2.11; J.2; J.4

arXiv:2402.08393 [pdf, other]

NfgTransformer: Equivariant Representation Learning for Normal-form Games

Authors: Siqi Liu, Luke Marris, Georgios Piliouras, Ian Gemp, Nicolas Heess

Abstract: Normal-form games (NFGs) are the fundamental model of strategic interaction. We study their representation using neural networks. We describe the inherent equivariance of NFGs -- any permutation of strategies describes an equivalent game -- as well as the challenges this poses for representation learning. We then propose the NfgTransformer architecture that leverages this equivariance, leading to… ▽ More Normal-form games (NFGs) are the fundamental model of strategic interaction. We study their representation using neural networks. We describe the inherent equivariance of NFGs -- any permutation of strategies describes an equivalent game -- as well as the challenges this poses for representation learning. We then propose the NfgTransformer architecture that leverages this equivariance, leading to state-of-the-art performance in a range of game-theoretic tasks including equilibrium-solving, deviation gain estimation and ranking, with a common approach to NFG representation. We show that the resulting model is interpretable and versatile, paving the way towards deep learning systems capable of game-theoretic reasoning when interacting with humans and with each other. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Published at ICLR 2024. Open-sourced at https://github.com/google-deepmind/nfg_transformer

arXiv:2402.03928 [pdf, other]

Approximating the Core via Iterative Coalition Sampling

Authors: Ian Gemp, Marc Lanctot, Luke Marris, Yiran Mao, Edgar Duéñez-Guzmán, Sarah Perrin, Andras Gyorgy, Romuald Elie, Georgios Piliouras, Michael Kaisers, Daniel Hennes, Kalesha Bullard, Kate Larson, Yoram Bachrach

Abstract: The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, a… ▽ More The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, and to fully embrace cooperative game theory contributions in domains such as explainable AI (XAI), where the core can complement the Shapley values to identify influential features or instances supporting predictions by black-box models. We propose novel iterative algorithms for computing variants of the core, which avoid the computational bottleneck of many other approaches; namely solving large linear programs. As such, they scale better to very large problems as we demonstrate across different classes of cooperative games, including weighted voting games, induced subgraph games, and marginal contribution networks. We also explore our algorithms in the context of XAI, providing further evidence of the power of the core for such applications. △ Less

Submitted 6 February, 2024; originally announced February 2024.

Comments: Published in AAMAS 2024

arXiv:2402.01704 [pdf, other]

Steering Language Models with Game-Theoretic Solvers

Authors: Ian Gemp, Roma Patel, Yoram Bachrach, Marc Lanctot, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, Siqi Liu, Karl Tuyls

Abstract: Mathematical models of interactions among rational agents have long been studied in game theory. However these interactions are often over a small set of discrete game actions which is very different from how humans communicate in natural language. To bridge this gap, we introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large lan… ▽ More Mathematical models of interactions among rational agents have long been studied in game theory. However these interactions are often over a small set of discrete game actions which is very different from how humans communicate in natural language. To bridge this gap, we introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large language models (LLMs). Specifically, by modelling the players, strategies and payoffs in a "game" of dialogue, we create a binding from natural language interactions to the conventional symbolic logic of game theory. Given this binding, we can ask existing game-theoretic algorithms to provide us with strategic solutions (e.g., what string an LLM should generate to maximize payoff in the face of strategic partners or opponents), giving us predictors of stable, rational conversational strategies. We focus on three domains that require different negotiation strategies: scheduling meetings, trading fruit and debate, and evaluate an LLM's generated language when guided by solvers. We see that LLMs that follow game-theory solvers result in dialogue generations that are less exploitable than the control (no guidance from solvers), and the language generated results in higher rewards, in all negotiation domains. We discuss future implications of this work, and how game-theoretic solvers that can leverage the expressivity of natural language can open up a new avenue of guiding language research. △ Less

Submitted 16 December, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

Comments: Code available @ https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/games/chat_game.py

arXiv:2401.05133 [pdf, other]

Neural Population Learning beyond Symmetric Zero-sum Games

Authors: Siqi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Corre… ▽ More We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives. △ Less

Submitted 10 January, 2024; originally announced January 2024.

arXiv:2312.16609 [pdf, other]

Exploiting hidden structures in non-convex games for convergence to Nash equilibrium

Authors: Iosif Sakos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras

Abstract: A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence… ▽ More A wide array of modern machine learning applications - from adversarial models to multi-agent reinforcement learning - can be formulated as non-cooperative games whose Nash equilibria represent the system's desired operational states. Despite having a highly non-convex loss landscape, many cases of interest possess a latent convex structure that could potentially be leveraged to yield convergence to equilibrium. Driven by this observation, our paper proposes a flexible first-order method that successfully exploits such "hidden structures" and achieves convergence under minimal assumptions for the transformation connecting the players' control variables to the game's latent, convex-structured layer. The proposed method - which we call preconditioned hidden gradient descent (PHGD) - hinges on a judiciously chosen gradient preconditioning scheme related to natural gradient methods. Importantly, we make no separability assumptions for the game's hidden structure, and we provide explicit convergence rate guarantees for both deterministic and stochastic environments. △ Less

Submitted 27 December, 2023; originally announced December 2023.

Comments: 32 pages, 18 figures

MSC Class: Primary 91A10; 91A26; secondary 68Q32

arXiv:2311.14125 [pdf, other]

Scalable AI Safety via Doubly-Efficient Debate

Authors: Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras

Abstract: The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of iden… ▽ More The emergence of pre-trained AI systems with powerful capabilities across a diverse and ever-increasing set of complex domains has raised a critical challenge for AI safety as tasks can become too complicated for humans to judge directly. Irving et al. [2018] proposed a debate method in this direction with the goal of pitting the power of such AI models against each other until the problem of identifying (mis)-alignment is broken down into a manageable subtask. While the promise of this approach is clear, the original framework was based on the assumption that the honest strategy is able to simulate deterministic AI systems for an exponential number of steps, limiting its applicability. In this paper, we show how to address these challenges by designing a new set of debate protocols where the honest strategy can always succeed using a simulation of a polynomial number of steps, whilst being able to verify the alignment of stochastic AI systems, even when the dishonest strategy is allowed to use exponentially many simulation steps. △ Less

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.10859 [pdf, other]

doi 10.22331/q-2025-05-06-1737

A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games

Authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan

Abstract: Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed th… ▽ More Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/ε^2)$ iterations to $ε$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/ε)$ iterations to $ε$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $ε$-Nash equilibria in quantum zero-sum games. △ Less

Submitted 2 May, 2025; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 53 pages, 7 figures, QTML 2023 (Long Talk), Quantum Journal 2025

MSC Class: primary 91A05; 81Q93; secondary 68Q32; 91A26; 37N40;

Journal ref: Quantum 9, 1737 (2025)

arXiv:2310.08473 [pdf, other]

doi 10.22331/q-2024-12-17-1569

No-Regret Learning and Equilibrium Computation in Quantum Games

Authors: Wayne Lin, Georgios Piliouras, Ryann Sim, Antonios Varvitsiotis

Abstract: As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts of strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited i… ▽ More As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts of strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited information. This paper delves into the dynamics of quantum-enabled agents within decentralized systems that employ no-regret algorithms to update their behaviors over time. Specifically, we investigate two-player quantum zero-sum games and polymatrix quantum zero-sum games, showing that no-regret algorithms converge to separable quantum Nash equilibria in time-average. In the case of general multi-player quantum games, our work leads to a novel solution concept, that of the {separable} quantum coarse correlated equilibria (QCCE), as the convergent outcome of the time-averaged behavior no-regret algorithms, offering a natural solution concept for decentralized quantum systems. Finally, we show that computing QCCEs can be formulated as a semidefinite program and establish the existence of entangled (i.e., non-separable) QCCEs, which are unlearnable via the current paradigm of no-regret learning. △ Less

Submitted 1 December, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Journal ref: Quantum 8, 1569 (2024)

arXiv:2310.06689 [pdf, other]

Approximating Nash Equilibria in Normal-Form Games via Stochastic Optimization

Authors: Ian Gemp, Luke Marris, Georgios Piliouras

Abstract: We propose the first loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that… ▽ More We propose the first loss function for approximate Nash equilibria of normal-form games that is amenable to unbiased Monte Carlo estimation. This construction allows us to deploy standard non-convex stochastic optimization techniques for approximating Nash equilibria, resulting in novel algorithms with provable guarantees. We complement our theoretical analysis with experiments demonstrating that stochastic gradient descent can outperform previous state-of-the-art approaches. △ Less

Submitted 15 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Published at ICLR 2024

arXiv:2307.13928 [pdf, other]

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Authors: Aamal Hussain, Francesco Belardinelli, Georgios Piliouras

Abstract: The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum… ▽ More The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Presented at IJCAI 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2307.13922 [pdf, other]

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Authors: Aamal Hussain, Dan Leonte, Francesco Belardinelli, Georgios Piliouras

Abstract: The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the… ▽ More The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents. △ Less

Submitted 25 July, 2023; originally announced July 2023.

Comments: Presented at the Workshop on New Frontiers in Learning, Control, and Dynamical Systems at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2307.06640 [pdf, other]

Data-Scarce Identification of Game Dynamics via Sum-of-Squares Optimization

Authors: Iosif Sakos, Antonios Varvitsiotis, Georgios Piliouras

Abstract: Understanding how players adjust their strategies in games, based on their experience, is a crucial tool for policymakers. It enables them to forecast the system's eventual behavior, exert control over the system, and evaluate counterfactual scenarios. The task becomes increasingly difficult when only a limited number of observations are available or difficult to acquire. In this work, we introduc… ▽ More Understanding how players adjust their strategies in games, based on their experience, is a crucial tool for policymakers. It enables them to forecast the system's eventual behavior, exert control over the system, and evaluate counterfactual scenarios. The task becomes increasingly difficult when only a limited number of observations are available or difficult to acquire. In this work, we introduce the Side-Information Assisted Regression (SIAR) framework, designed to identify game dynamics in multiplayer normal-form games only using data from a short run of a single system trajectory. To enhance system recovery in the face of scarce data, we integrate side-information constraints into SIAR, which restrict the set of feasible solutions to those satisfying game-theoretic properties and common assumptions about strategic interactions. SIAR is solved using sum-of-squares (SOS) optimization, resulting in a hierarchy of approximations that provably converge to the true dynamics of the system. We showcase that the SIAR framework accurately predicts player behavior across a spectrum of normal-form games, widely-known families of game dynamics, and strong benchmarks, even if the unknown system is chaotic. △ Less

Submitted 11 October, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.03136 [pdf, other]

Multiplicative Updates for Online Convex Optimization over Symmetric Cones

Authors: Ilayda Canyakmaz, Wayne Lin, Georgios Piliouras, Antonios Varvitsiotis

Abstract: We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan A… ▽ More We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices. △ Less

Submitted 6 July, 2023; originally announced July 2023.

Comments: 27 pages, 7 figures, 2 tables

arXiv:2306.01032 [pdf, other]

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Authors: Emmanouil-Vasileios Vlatakis-Gkaragkounis, Lampros Flokas, Georgios Piliouras

Abstract: Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved… ▽ More Multi-agent learning is intrinsically harder, more unstable and unpredictable than single agent optimization. For this reason, numerous specialized heuristics and techniques have been designed towards the goal of achieving convergence to equilibria in self-play. One such celebrated approach is the use of dynamically adaptive learning rates. Although such techniques are known to allow for improved convergence guarantees in small games, it has been much harder to analyze them in more relevant settings with large populations of agents. These settings are particularly hard as recent work has established that learning with fixed rates will become chaotic given large enough populations.In this work, we show that chaos persists in large population congestion games despite using adaptive learning rates even for the ubiquitous Multiplicative Weight Updates algorithm, even in the presence of only two strategies. At a technical level, due to the non-autonomous nature of the system, our approach goes beyond conventional period-three techniques Li-Yorke by studying fundamental properties of the dynamics including invariant sets, volume expansion and turbulent sets. We complement our theoretical insights with experiments showcasing that slight variations to system parameters lead to a wide variety of unpredictable behaviors. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 30 pages, 6 figures

arXiv:2304.09978 [pdf, other]

Equilibrium-Invariant Embedding, Metric Space, and Fundamental Set of $2\times2$ Normal-Form Games

Authors: Luke Marris, Ian Gemp, Georgios Piliouras

Abstract: Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equil… ▽ More Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study $2\times2$ games. The equilibrium-invariant embedding of $2\times2$ games has an efficient two variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of $2\times2$ games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in $2\times2$ games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for $2\times2$ games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: 42 pages

arXiv:2302.06969 [pdf, other]

doi 10.1016/j.physd.2023.133940

A stochastic variant of replicator dynamics in zero-sum games and its invariant measures

Authors: Maximilian Engel, Georgios Piliouras

Abstract: We study the behavior of a stochastic variant of replicator dynamics in two-agent zero-sum games. We characterize the statistics of such systems by their invariant measures which can be shown to be entirely supported on the boundary of the space of mixed strategies. Depending on the noise strength we can furthermore characterize these invariant measures by finding accumulation of mass at specific… ▽ More We study the behavior of a stochastic variant of replicator dynamics in two-agent zero-sum games. We characterize the statistics of such systems by their invariant measures which can be shown to be entirely supported on the boundary of the space of mixed strategies. Depending on the noise strength we can furthermore characterize these invariant measures by finding accumulation of mass at specific parts of the boundary. In particular, regardless of the magnitude of noise, we show that any invariant probability measure is a convex combination of Dirac measures on pure strategy profiles, which correspond to vertices/corners of the agents' simplices. Thus, in the presence of stochastic perturbations, even in the most classic zero-sum settings, such as Matching Pennies, we observe a stark disagreement between the axiomatic prediction of Nash equilibrium and the evolutionary emergent behavior derived by an assumption of stochastically adaptive, learning agents. △ Less

Submitted 25 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

MSC Class: 60H10; 60J70; 91A15; 91A22; 91A25; 91A68

arXiv:2302.06607 [pdf, other]

Generative Adversarial Equilibrium Solvers

Authors: Denizalp Goktas, David C. Parkes, Ian Gemp, Luke Marris, Georgios Piliouras, Romuald Elie, Guy Lever, Andrea Tacchetti

Abstract: We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other playe… ▽ More We introduce the use of generative adversarial learning to compute equilibria in general game-theoretic settings, specifically the generalized Nash equilibrium (GNE) in pseudo-games, and its specific instantiation as the competitive equilibrium (CE) in Arrow-Debreu competitive economies. Pseudo-games are a generalization of games in which players' actions affect not only the payoffs of other players but also their feasible action spaces. Although the computation of GNE and CE is intractable in the worst-case, i.e., PPAD-hard, in practice, many applications only require solutions with high accuracy in expectation over a distribution of problem instances. We introduce Generative Adversarial Equilibrium Solvers (GAES): a family of generative adversarial neural networks that can learn GNE and CE from only a sample of problem instances. We provide computational and sample complexity bounds, and apply the framework to finding Nash equilibria in normal-form games, CE in Arrow-Debreu competitive economies, and GNE in an environmental economic model of the Kyoto mechanism. △ Less

Submitted 20 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 41 pages, 13 figures

arXiv:2302.04789 [pdf, other]

doi 10.22331/q-2025-04-03-1689

Learning in Quantum Common-Interest Games and the Separability Problem

Authors: Wayne Lin, Georgios Piliouras, Ryann Sim, Antonios Varvitsiotis

Abstract: Learning in games has emerged as a powerful tool for machine learning with numerous applications. Quantum games model interactions between strategic players who have access to quantum resources, and several recent works have studied {learning in} the competitive regime of quantum zero-sum games. Going beyond this setting, we introduce quantum common-interest games (CIGs) where players have density… ▽ More Learning in games has emerged as a powerful tool for machine learning with numerous applications. Quantum games model interactions between strategic players who have access to quantum resources, and several recent works have studied {learning in} the competitive regime of quantum zero-sum games. Going beyond this setting, we introduce quantum common-interest games (CIGs) where players have density matrices as strategies and their interests are perfectly aligned. We bridge the gap between optimization and game theory by establishing the equivalence between KKT (first-order stationary) points of an instance of the Best Separable State (BSS) problem and the Nash equilibria of its corresponding quantum CIG. This allows learning dynamics for the quantum CIG to be seen as decentralized algorithms for the BSS problem. Taking the perspective of learning in games, we then introduce non-commutative extensions of the continuous-time replicator dynamics and the discrete-time best response dynamics/linear multiplicative weights update for learning in quantum CIGs. We prove analogues of classical convergence results of the dynamics and explore differences which arise in the quantum setting. Finally, we corroborate our theoretical findings through extensive experiments. △ Less

Submitted 30 March, 2025; v1 submitted 9 February, 2023; originally announced February 2023.

Journal ref: Quantum 9, 1689 (2025)

arXiv:2301.09619 [pdf, other]

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Authors: Aamal Abbas Hussain, Francesco Belardinelli, Georgios Piliouras

Abstract: Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q… ▽ More Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted Potential games and weighted zero sum polymatrix games. Finally, we examine the performance of the Q-Learning dynamic as measured by the Time Averaged Social Welfare, and comparing this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamic will outperform the equilibrium even if the dynamics do not converge. △ Less

Submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted in AAMAS 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2301.04929 [pdf, other]

Heterogeneous Beliefs and Multi-Population Learning in Network Games

Authors: Shuyue Hu, Harold Soh, Georgios Piliouras

Abstract: The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from being well-understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP).We show that the system state -- a probability distribution over… ▽ More The effect of population heterogeneity in multi-agent learning is practically relevant but remains far from being well-understood. Motivated by this, we introduce a model of multi-population learning that allows for heterogeneous beliefs within each population and where agents respond to their beliefs via smooth fictitious play (SFP).We show that the system state -- a probability distribution over beliefs -- evolves according to a system of partial differential equations akin to the continuity equations that commonly desccribe transport phenomena in physical systems. We establish the convergence of SFP to Quantal Response Equilibria in different classes of games capturing both network competition as well as network coordination. We also prove that the beliefs will eventually homogenize in all network games. Although the initial belief heterogeneity disappears in the limit, we show that it plays a crucial role for equilibrium selection in the case of coordination games as it helps select highly desirable equilibria. Contrary, in the case of network competition, the resulting limit behavior is independent of the initialization of beliefs, even when the underlying game has many distinct Nash equilibria. △ Less

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2301.03931 [pdf, ps, other]

Min-Max Optimization Made Simple: Approximating the Proximal Point Method via Contraction Maps

Authors: Volkan Cevher, Georgios Piliouras, Ryann Sim, Stratis Skoulakis

Abstract: In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accur… ▽ More In this paper we present a first-order method that admits near-optimal convergence rates for convex/concave min-max problems while requiring a simple and intuitive analysis. Similarly to the seminal work of Nemirovski and the recent approach of Piliouras et al. in normal form games, our work is based on the fact that the update rule of the Proximal Point method (PP) can be approximated up to accuracy $ε$ with only $O(\log 1/ε)$ additional gradient-calls through the iterations of a contraction map. Then combining the analysis of (PP) method with an error-propagation analysis we establish that the resulting first order method, called Clairvoyant Extra Gradient, admits near-optimal time-average convergence for general domains and last-iterate convergence in the unconstrained case. △ Less

Submitted 16 January, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: To appear in SOSA23

arXiv:2212.07175 [pdf, other]

Optimality Despite Chaos in Fee Markets

Authors: Stefanos Leonardos, Daniël Reijsbergen, Barnabé Monnot, Georgios Piliouras

Abstract: Transaction fee markets are essential components of blockchain economies, as they resolve the inherent scarcity in the number of transactions that can be added to each block. In early blockchain protocols, this scarcity was resolved through a first-price auction in which users were forced to guess appropriate bids from recent blockchain data. Ethereum's EIP-1559 fee market reform streamlines this… ▽ More Transaction fee markets are essential components of blockchain economies, as they resolve the inherent scarcity in the number of transactions that can be added to each block. In early blockchain protocols, this scarcity was resolved through a first-price auction in which users were forced to guess appropriate bids from recent blockchain data. Ethereum's EIP-1559 fee market reform streamlines this process through the use of a base fee that is increased (or decreased) whenever a block exceeds (or fails to meet) a specified target block size. Previous work has found that the EIP-1559 mechanism may lead to a base fee process that is inherently chaotic, in which case the base fee does not converge to a fixed point even under ideal conditions. However, the impact of this chaotic behavior on the fee market's main design goal -- blocks whose long-term average size equals the target -- has not previously been explored. As our main contribution, we derive near-optimal upper and lower bounds for the time-average block size in the EIP-1559 mechanism despite its possibly chaotic evolution. Our lower bound is equal to the target utilization level whereas our upper bound is approximately 6% higher than optimal. Empirical evidence is shown in great agreement with these theoretical predictions. Specifically, the historical average was approximately 2.9% larger than the target rage under Proof-of-Work and decreased to approximately 2.0% after Ethereum's transition to Proof-of-Stake. We also find that an approximate version of EIP-1559 achieves optimality even in the absence of convergence. △ Less

Submitted 15 December, 2022; v1 submitted 14 December, 2022; originally announced December 2022.

MSC Class: 91A80; 91-10; 91B26

arXiv:2211.01681 [pdf, other]

Matrix Multiplicative Weights Updates in Quantum Zero-Sum Games: Conservation Laws & Recurrence

Authors: Rahul Jain, Georgios Piliouras, Ryann Sim

Abstract: Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update… ▽ More Recent advances in quantum computing and in particular, the introduction of quantum GANs, have led to increased interest in quantum zero-sum game theory, extending the scope of learning algorithms for classical games into the quantum realm. In this paper, we focus on learning in quantum zero-sum games under Matrix Multiplicative Weights Update (a generalization of the multiplicative weights update method) and its continuous analogue, Quantum Replicator Dynamics. When each player selects their state according to quantum replicator dynamics, we show that the system exhibits conservation laws in a quantum-information theoretic sense. Moreover, we show that the system exhibits Poincare recurrence, meaning that almost all orbits return arbitrarily close to their initial conditions infinitely often. Our analysis generalizes previous results in the case of classical games. △ Less

Submitted 26 April, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022

arXiv:2208.10138 [pdf, other]

Learning Correlated Equilibria in Mean-Field Games

Authors: Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls

Abstract: The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-… ▽ More The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games. △ Less

Submitted 22 August, 2022; originally announced August 2022.

arXiv:2207.08426 [pdf, other]

Fast Convergence of Optimistic Gradient Ascent in Network Zero-Sum Extensive Form Games

Authors: Georgios Piliouras, Lillian Ratliff, Ryann Sim, Stratis Skoulakis

Abstract: The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum propert… ▽ More The study of learning in games has thus far focused primarily on normal form games. In contrast, our understanding of learning in extensive form games (EFGs) and particularly in EFGs with many agents lags far behind, despite them being closer in nature to many real world applications. We consider the natural class of Network Zero-Sum Extensive Form Games, which combines the global zero-sum property of agent payoffs, the efficient representation of graphical games as well the expressive power of EFGs. We examine the convergence properties of Optimistic Gradient Ascent (OGA) in these games. We prove that the time-average behavior of such online learning dynamics exhibits $O(1/T)$ rate convergence to the set of Nash Equilibria. Moreover, we show that the day-to-day behavior also converges to Nash with rate $O(c^{-t})$ for some game-dependent constant $c>0$. △ Less

Submitted 18 July, 2022; originally announced July 2022.

Comments: To appear in SAGT 2022

arXiv:2206.04160 [pdf, other]

Alternating Mirror Descent for Constrained Min-Max Games

Authors: Andre Wibisono, Molei Tao, Georgios Piliouras

Abstract: In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrai… ▽ More In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. An instance of natural occurrences of such constraints is when mixed strategies are used, which correspond to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which each player takes turns to take action following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization and modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantitatively verifies the algorithm's better behavior than the simultaneous version of mirror descent algorithm, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of alternating gradient descent algorithm for zero-sum games which was studied in (Bailey et al., COLT 2020). △ Less

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2203.14129 [pdf, other]

Nash, Conley, and Computation: Impossibility and Incompleteness in Game Dynamics

Authors: Jason Milionis, Christos Papadimitriou, Georgios Piliouras, Kelly Spendlove

Abstract: Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attra… ▽ More Under what conditions do the behaviors of players, who play a game repeatedly, converge to a Nash equilibrium? If one assumes that the players' behavior is a discrete-time or continuous-time rule whereby the current mixed strategy profile is mapped to the next, this becomes a problem in the theory of dynamical systems. We apply this theory, and in particular the concepts of chain recurrence, attractors, and Conley index, to prove a general impossibility result: there exist games for which any dynamics is bound to have starting points that do not end up at a Nash equilibrium. We also prove a stronger result for $ε$-approximate Nash equilibria: there are games such that no game dynamics can converge (in an appropriate sense) to $ε$-Nash equilibria, and in fact the set of such games has positive measure. Further numerical results demonstrate that this holds for any $ε$ between zero and $0.09$. Our results establish that, although the notions of Nash equilibria (and its computation-inspired approximations) are universally applicable in all games, they are also fundamentally incomplete as predictors of long term behavior, regardless of the choice of dynamics. △ Less

Submitted 26 March, 2022; originally announced March 2022.

Comments: 25 pages

Showing 1–50 of 127 results for author: Piliouras, G