Showing 1–50 of 64 results for author: Lanctot, M

  1. arXiv:2502.20170  [pdf, other]

    cs.GT cs.CL cs.LG stat.ML

    Re-evaluating Open-ended Evaluation of Large Language Models

    Authors: Siqi Liu, Ian Gemp, Luke Marris, Georgios Piliouras, Nicolas Heess, Marc Lanctot

    Abstract: Evaluation has traditionally focused on ranking candidates for a specific skill. Modern generalist models, such as Large Language Models (LLMs), decidedly outpace this paradigm. Open-ended evaluation systems, where candidate models are compared on user-submitted prompts, have emerged as a popular solution. Despite their many advantages, we show that the current Elo-based rating systems can be susc…

    Submitted 8 May, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: Published at ICLR 2025

  2. arXiv:2502.11645  [pdf, other]

    cs.GT cs.CL cs.MA stat.OT

    Deviation Ratings: A General, Clone-Invariant Rating Method

    Authors: Luke Marris, Siqi Liu, Ian Gemp, Georgios Piliouras, Marc Lanctot

    Abstract: Many real-world multi-agent or multi-task evaluation scenarios can be naturally modelled as normal-form games due to inherent strategic (adversarial, cooperative, and mixed motive) interactions. These strategic interactions may be agentic (e.g. players trying to win), fundamental (e.g. cost vs quality), or complementary (e.g. niche finding and specialization). In such a formulation, it is the stra…

    Submitted 17 February, 2025; originally announced February 2025.

  3. arXiv:2501.19266  [pdf, other]

    cs.AI cs.LG econ.TH

    Jackpot! Alignment as a Maximal Lottery

    Authors: Roberto-Rafael Maura-Rivero, Marc Lanctot, Francesco Visin, Kate Larson

    Abstract: Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the majority \cite{ge2024axioms}. To overcome these issues, we propose the use of a probabilistic Social Choice rule called maximal lotteries as a replacement…

    Submitted 31 January, 2025; originally announced January 2025.

  4. arXiv:2412.12119  [pdf, other]

    cs.AI cs.CL cs.LG

    Mastering Board Games by External and Internal Planning with Language Models

    Authors: John Schultz, Jakub Adamek, Matej Jusup, Marc Lanctot, Michael Kaisers, Sarah Perrin, Daniel Hennes, Jeremy Shar, Cannada Lewis, Anian Ruoss, Tom Zahavy, Petar Veličković, Laurel Prince, Satinder Singh, Eric Malmi, Nenad Tomašev

    Abstract: Advancing planning and reasoning capabilities of Large Language Models (LLMs) is one of the key prerequisites towards unlocking their potential for performing reliably in complex and impactful domains. In this paper, we aim to demonstrate this across board games (Chess, Fischer Random / Chess960, Connect Four, and Hex), and we show that search-based planning can yield significant improvements in L…

    Submitted 22 May, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Comments: 70 pages, 10 figures

  5. arXiv:2411.00119  [pdf, other]

    cs.MA cs.LG

    Soft Condorcet Optimization for Ranking of General Agents

    Authors: Marc Lanctot, Kate Larson, Michael Kaisers, Quentin Berthet, Ian Gemp, Manfred Diaz, Roberto-Rafael Maura-Rivero, Yoram Bachrach, Anna Koop, Doina Precup

    Abstract: Driving progress of AI models and agents requires comparing their performance on standardized benchmarks; for general agents, individual performances must be aggregated across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the o…

    Submitted 20 February, 2025; v1 submitted 31 October, 2024; originally announced November 2024.

    Journal ref: AAMAS 2025

  6. arXiv:2409.03875  [pdf, other]

    cs.GT

    Learning in Games with Progressive Hiding

    Authors: Benjamin Heymann, Marc Lanctot

    Abstract: When learning to play an imperfect information game, it is often easier to first start with the basic mechanics of the game rules. For example, one can play several example rounds with private cards revealed to all players to better understand the basic actions and their effects. Building on this intuition, this paper introduces progressive hiding, an algorithm that balances learning the bas…

    Submitted 26 May, 2025; v1 submitted 5 September, 2024; originally announced September 2024.

  7. arXiv:2402.11835  [pdf, other]

    cs.LG cs.GT cs.MA

    Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

    Authors: Luca D'Amico-Wong, Hugh Zhang, Marc Lanctot, David C. Parkes

    Abstract: We propose ABCs (Adaptive Branching through Child stationarity), a best-of-both-worlds algorithm combining Boltzmann Q-learning (BQL), a classic reinforcement learning algorithm for single-agent domains, and counterfactual regret minimization (CFR), a central algorithm for learning in multi-agent domains. ABCs adaptively chooses what fraction of the environment to explore each iteration by measuri…

    Submitted 18 February, 2024; originally announced February 2024.

  8. arXiv:2402.03928  [pdf, other]

    cs.GT cs.MA

    Approximating the Core via Iterative Coalition Sampling

    Authors: Ian Gemp, Marc Lanctot, Luke Marris, Yiran Mao, Edgar Duéñez-Guzmán, Sarah Perrin, Andras Gyorgy, Romuald Elie, Georgios Piliouras, Michael Kaisers, Daniel Hennes, Kalesha Bullard, Kate Larson, Yoram Bachrach

    Abstract: The core is a central solution concept in cooperative game theory, defined as the set of feasible allocations or payments such that no subset of agents has incentive to break away and form their own subgroup or coalition. However, it has long been known that the core (and approximations, such as the least-core) are hard to compute. This limits our ability to analyze cooperative games in general, a…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Published in AAMAS 2024

  9. arXiv:2402.01704  [pdf, other]

    cs.CL cs.AI cs.GT

    Steering Language Models with Game-Theoretic Solvers

    Authors: Ian Gemp, Roma Patel, Yoram Bachrach, Marc Lanctot, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, Siqi Liu, Karl Tuyls

    Abstract: Mathematical models of interactions among rational agents have long been studied in game theory. However, these interactions are often over a small set of discrete game actions, which is very different from how humans communicate in natural language. To bridge this gap, we introduce a framework that allows equilibrium solvers to work over the space of natural language dialogue generated by large lan…

    Submitted 16 December, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

    Comments: Code available @ https://github.com/google-deepmind/open_spiel/blob/master/open_spiel/python/games/chat_game.py

  10. arXiv:2401.05133  [pdf, other]

    cs.AI cs.MA

    Neural Population Learning beyond Symmetric Zero-sum Games

    Authors: Siqi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

    Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Corre…

    Submitted 10 January, 2024; originally announced January 2024.

  11. arXiv:2312.03121  [pdf, other]

    cs.AI cs.GT cs.MA

    Evaluating Agents using Social Choice Theory

    Authors: Marc Lanctot, Kate Larson, Yoram Bachrach, Luke Marris, Zun Li, Avishkar Bhoopchand, Thomas Anthony, Brian Tanner, Anna Koop

    Abstract: We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evalua…

    Submitted 20 January, 2025; v1 submitted 5 December, 2023; originally announced December 2023.

  12. arXiv:2303.03196  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro…

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  13. arXiv:2303.01074  [pdf, other]

    cs.GT cs.LG

    Learning not to Regret

    Authors: David Sychrovský, Michal Šustr, Elnaz Davoodi, Michael Bowling, Marc Lanctot, Martin Schmid

    Abstract: The literature on game-theoretic equilibrium finding predominantly focuses on single games or their repeated play. Nevertheless, numerous real-world scenarios feature playing a game sampled from a distribution of similar, but not identical games, such as playing poker with different public cards or trading correlated assets on the stock market. As these similar games feature similar equilibria, we…

    Submitted 19 February, 2024; v1 submitted 2 March, 2023; originally announced March 2023.

  14. arXiv:2302.00797  [pdf, other]

    cs.AI cs.GT cs.LG cs.MA

    Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

    Authors: Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

    Abstract: Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampli…

    Submitted 1 February, 2023; originally announced February 2023.

  15. arXiv:2210.02205  [pdf, other]

    cs.GT cs.LG cs.MA

    Game Theoretic Rating in N-player general-sum games with Equilibria

    Authors: Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel

    Abstract: Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting. Traditionally, only transitive dependencies between strategies have been used to rate strategies (e.g. Elo); however, recent work has expanded ratings to utilize game theoretic solutions to better rate strategies in non-tra…

    Submitted 5 October, 2022; originally announced October 2022.

  16. arXiv:2209.10958  [pdf, ps, other]

    cs.MA cs.AI

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, et al. (2 additional authors not shown)

    Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Published in AI Communications 2022

  17. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  18. arXiv:2206.05825  [pdf, other]

    cs.LG cs.AI cs.GT

    A Unified Approach to Reinforcement Learning, Quantal Response Equilibria, and Two-Player Zero-Sum Games

    Authors: Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, Nicolas Loizou, Marc Lanctot, Ioannis Mitliagkas, Noam Brown, Christian Kroer

    Abstract: This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent as both an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) Being the first quantal response equili…

    Submitted 11 April, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

  19. arXiv:2206.04122  [pdf, other]

    cs.GT cs.AI cs.LG stat.ML

    ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

    Authors: Stephen McAleer, Gabriele Farina, Marc Lanctot, Tuomas Sandholm

    Abstract: Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies). One promising line of research uses neural networks to approximate counterfactual regret minimization (CFR) or its modern variants. DREAM, the only current CFR-based neural method that is model free and therefore scalable to very large games, trains…

    Submitted 11 October, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

  20. arXiv:2205.15879  [pdf, other]

    cs.AI cs.GT cs.LG

    Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games

    Authors: Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess

    Abstract: Learning to play optimally against any mixture over a diverse set of strategies is of important practical interest in competitive games. In this paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learn best-responses to any mixture o…

    Submitted 23 December, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning (ICML 2022)

  21. arXiv:2205.12031   

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy R. Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 1 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Please see version 4 of arXiv:2102.06973 (arXiv:2102.06973v4). This submission was a version of that paper with highlighted corrections. After submitting, I figured out that it would be better to submit this report as another version of arXiv:2102.06973

  22. arXiv:2201.07700  [pdf, other]

    cs.GT cs.LG cs.MA

    Anytime PSRO for Two-Player Zero-Sum Games

    Authors: Stephen McAleer, Kevin Wang, John Lanier, Marc Lanctot, Pierre Baldi, Tuomas Sandholm, Roy Fox

    Abstract: Policy space response oracles (PSRO) is a multi-agent reinforcement learning algorithm that has achieved state-of-the-art performance in very large two-player zero-sum games. PSRO is based on the tabular double oracle (DO) method, an algorithm that is guaranteed to converge to a Nash equilibrium, but may increase exploitability from one iteration to the next. We propose anytime double oracle (ADO)…

    Submitted 28 January, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: Published in AAAI Reinforcement Learning in Games Workshop

  23. arXiv:2112.03178  [pdf, other]

    cs.AI cs.GT cs.LG

    Student of Games: A unified learning algorithm for both perfect and imperfect information games

    Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling

    Abstract: Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies p…

    Submitted 15 November, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Published in Science Advances

    Journal ref: Science Advances 9, eadg3256 (2023)

  24. arXiv:2110.14241  [pdf, other]

    cs.LG cs.AI cs.CL cs.MA stat.ML

    Dynamic population-based meta-learning for multi-agent communication with natural language

    Authors: Abhinav Gupta, Marc Lanctot, Angeliki Lazaridou

    Abstract: In this work, our goal is to train agents that can coordinate with seen, unseen as well as human partners in a multi-agent communication environment involving natural language. Previous work using a single set of agents has shown great progress in generalizing to known partners; however, it struggles when coordinating with unfamiliar agents. To mitigate that, recent work explored the use of populat…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  25. arXiv:2106.09435  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

    Authors: Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

    Abstract: Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel…

    Submitted 18 April, 2024; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: ICML 2021, 9 pages, coded implementation available in https://github.com/deepmind/open_spiel/ (jpsro.py in examples)

  26. arXiv:2106.01285  [pdf, other]

    cs.GT

    Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

    Authors: Ian Gemp, Rahul Savani, Marc Lanctot, Yoram Bachrach, Thomas Anthony, Richard Everett, Andrea Tacchetti, Tom Eccles, János Kramár

    Abstract: Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously establishe…

    Submitted 4 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Published in AAMAS 2022 (code available as part of open_spiel on github -- search ADIDAS in repo)

  27. arXiv:2102.06973  [pdf, other]

    cs.GT cs.AI

    Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

    Authors: Dustin Morrill, Ryan D'Orazio, Marc Lanctot, James R. Wright, Michael Bowling, Amy Greenwald

    Abstract: Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of…

    Submitted 22 June, 2022; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-eighth International Conference on Machine Learning (ICML 2021), virtual. Compared to v5, this version removes the version indicator from an arXiv reference. 43 pages and 6 figures

  28. arXiv:2101.04237  [pdf, other]

    cs.AI cs.LG

    Solving Common-Payoff Games with Approximate Policy Iteration

    Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

    Abstract: For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- h…

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: AAAI 2021

  29. arXiv:2012.05874  [pdf, other]

    cs.GT cs.AI

    Hindsight and Sequential Rationality of Correlated Play

    Authors: Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R. Wright, Amy Greenwald, Michael Bowling

    Abstract: Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to c…

    Submitted 22 June, 2022; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: Corrected technical report for the paper with the same title in the proceedings of the thirty-fifth AAAI Conference on Artificial Intelligence (AAAI-21), February 2-9, 2021, Virtual. Compared to v5, this version fixes the realized terminal history indicators in the diagram describing MacQueen's counterexample. 27 pages and 16 figures

  30. arXiv:2010.10380  [pdf, other]

    cs.LG cs.AI cs.MA

    Negotiating Team Formation Using Deep Reinforcement Learning

    Authors: Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z. Leibo, Marc Lanctot, Michael Johanson, Wojciech M. Czarnecki, Thore Graepel

    Abstract: When autonomous agents interact in the same environment, they must often cooperate to achieve their goals. One way for agents to cooperate effectively is to form a team, make a binding agreement on a joint plan, and execute it. However, when agents are self-interested, the gains from team formation must be allocated appropriately to incentivize agreement. Various approaches for multi-agent negotia…

    Submitted 20 October, 2020; originally announced October 2020.

    ACM Class: I.2.6

    Journal ref: Artificial Intelligence 288 (2020): 103356

  31. arXiv:2008.12234  [pdf, other]

    cs.AI cs.LG

    The Advantage Regret-Matching Actor-Critic

    Authors: Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls

    Abstract: Regret minimization has played a key role in online learning, equilibrium computation in games, and reinforcement learning (RL). In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior. We propose a model-free RL algorithm, the Advantage Regret-Matching Actor-Critic (ARMAC): rather than saving past state-action data, ARMAC…

    Submitted 27 August, 2020; originally announced August 2020.

  32. arXiv:2006.08740  [pdf, other]

    cs.GT

    Sound Algorithms in Imperfect Information Games

    Authors: Michal Šustr, Martin Schmid, Matej Moravčík, Neil Burch, Marc Lanctot, Michael Bowling

    Abstract: Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question of what is a sound online algorithm in an imperfect information setting of two…

    Submitted 2 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted to AAMAS 2021 as extended abstract (Ref. numbers not available yet)

  33. arXiv:2006.04635  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Learning to Play No-Press Diplomacy with Best Response Policy Iteration

    Authors: Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach

    Abstract: Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and Starcraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects.…

    Submitted 4 January, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

  34. arXiv:2004.09677  [pdf, other]

    cs.LG stat.ML

    Approximate exploitability: Learning a best response in large games

    Authors: Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling

    Abstract: Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically…

    Submitted 3 November, 2022; v1 submitted 20 April, 2020; originally announced April 2020.

  35. arXiv:2002.08456  [pdf, other]

    cs.GT cs.LG stat.ML

    From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

    Authors: Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

    Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergen…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 43 pages

  36. arXiv:1909.12823  [pdf, other]

    cs.MA cs.AI cs.LG

    A Generalized Training Approach for Multiagent Learning

    Authors: Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Siqi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos

    Abstract: This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO). PSRO is general in the sense that it (1) encompasses well-known algorithms such as fictitious play and double oracle as special cases, and (2) in principle applies to general-sum, many-player games. Despite this, prior studies of PSRO have been focused on two-…

    Submitted 14 February, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

  37. arXiv:1908.09453  [pdf, other]

    cs.LG cs.AI cs.GT cs.MA

    OpenSpiel: A Framework for Reinforcement Learning in Games

    Authors: Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, et al. (2 additional authors not shown)

    Abstract: OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partia…

    Submitted 26 September, 2020; v1 submitted 25 August, 2019; originally announced August 2019.

  38. arXiv:1906.00190  [pdf, other]

    cs.LG cs.AI stat.ML

    Neural Replicator Dynamics

    Authors: Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls

    Abstract: Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstati…

    Submitted 26 February, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

  39. arXiv:1903.05614

    cs.AI cs.GT cs.LG

    Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent

    Authors: Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls

    Abstract: In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents. We prove that when following this optimization, the exploitability of a player's strategy converges asymptotically to zero, and hence when both players employ this opti…

    Submitted 12 June, 2020; v1 submitted 13 March, 2019; originally announced March 2019.

    Comments: IJCAI 2019, 11 pages, 1 figure

  40. arXiv:1903.01373

    cs.MA cs.GT

    $α$-Rank: Multi-Agent Evaluation by Evolution

    Authors: Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos

    Abstract: We introduce $α$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs). The approach leverages continuous- and discrete-time evolutionary dynamical systems applied to empirical games, and scales tractably in the number of…

    Submitted 4 October, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

  41. arXiv:1903.00742

    cs.AI cs.GT cs.MA cs.NE q-bio.NC

    Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research

    Authors: Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel

    Abstract: Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics arising from competition and cooperation that provide a naturally eme…

    Submitted 11 March, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

    Comments: 16 pages, 2 figures

  42. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains…

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  43. arXiv:1810.09026

    cs.LG cs.AI cs.GT cs.MA stat.ML

    Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

    Authors: Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling

    Abstract: Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially-observable multiagent environments.…

    Submitted 12 June, 2020; v1 submitted 21 October, 2018; originally announced October 2018.

    Comments: NeurIPS 2018

  44. arXiv:1809.03057

    cs.GT cs.AI

    Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

    Authors: Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

    Abstract: Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, p…

    Submitted 9 September, 2018; originally announced September 2018.

  45. arXiv:1804.03980

    cs.AI cs.CL cs.LG cs.MA

    Emergent Communication through Negotiation

    Authors: Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z Leibo, Karl Tuyls, Stephen Clark

    Abstract: Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is a pri…

    Submitted 11 April, 2018; originally announced April 2018.

    Comments: Published as a conference paper at ICLR 2018

  46. arXiv:1803.06376

    cs.GT cs.MA

    A Generalised Method for Empirical Game Theoretic Analysis

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Joel Z Leibo, Thore Graepel

    Abstract: This paper provides theoretical bounds for empirical game theoretical analysis of complex multi-agent interactions. We provide insights into the empirical meta-game, showing that a Nash equilibrium of the meta-game is an approximate Nash equilibrium of the true underlying game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game. Ad…

    Submitted 16 March, 2018; originally announced March 2018.

    Comments: will appear at AAMAS'18

  47. Feedforward and feedback control of locked mode phase and rotation in DIII-D with application to modulated ECCD experiments

    Authors: Wilkie Choi, R. J. La Haye, M. J. Lanctot, K. E. J. Olofsson, E. J. Strait, R. Sweeney, F. A. Volpe

    Abstract: The toroidal phase and rotation of otherwise locked magnetic islands of toroidal mode number n=1 are controlled in the DIII-D tokamak by means of applied magnetic perturbations of n=1. Pre-emptive perturbations were applied in feedforward to "catch" the mode as it slowed down and entrain it to the rotating field before complete locking, thus avoiding the associated major confinement degradation. A…

    Submitted 15 January, 2018; originally announced January 2018.

  48. arXiv:1712.01815

    cs.AI cs.LG

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    Authors: David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

    Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game…

    Submitted 5 December, 2017; originally announced December 2017.

  49. arXiv:1711.05074

    cs.GT cs.MA

    Symmetric Decomposition of Asymmetric Games

    Authors: Karl Tuyls, Julien Perolat, Marc Lanctot, Georg Ostrovski, Rahul Savani, Joel Leibo, Toby Ord, Thore Graepel, Shane Legg

    Abstract: We introduce new theoretical insights into two-population asymmetric games allowing for an elegant symmetric decomposition into two single population symmetric games. Specifically, we show how an asymmetric bimatrix game (A,B) can be decomposed into its symmetric counterparts by envisioning and investigating the payoff tables (A and B) that constitute the asymmetric game, as two independent, singl…

    Submitted 17 January, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: Paper is published in Scientific Reports; https://www.nature.com/articles/s41598-018-19194-4, 2018

  50. arXiv:1711.00832

    cs.AI cs.GT cs.LG cs.MA

    A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning

    Authors: Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel

    Abstract: To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to t…

    Submitted 7 November, 2017; v1 submitted 2 November, 2017; originally announced November 2017.

    Comments: Camera-ready copy of NIPS 2017 paper, including appendix