-
Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
Authors:
Antonio Ocello,
Daniil Tiapkin,
Lorenzo Mancini,
Mathieu Laurière,
Eric Moulines
Abstract:
We introduce Mean-Field Trust Region Policy Optimization (MF-TRPO), a novel algorithm designed to compute approximate Nash equilibria for ergodic Mean-Field Games (MFG) in finite state-action spaces. Building on the well-established performance of TRPO in the reinforcement learning (RL) setting, we extend its methodology to the MFG framework, leveraging its stability and robustness in policy optimization. Under standard assumptions in the MFG literature, we provide a rigorous analysis of MF-TRPO, establishing theoretical guarantees on its convergence. Our results cover both the exact formulation of the algorithm and its sample-based counterpart, where we derive high-probability guarantees and finite sample complexity. This work advances MFG optimization by bridging RL techniques with mean-field decision-making, offering a theoretically grounded approach to solving complex multi-agent problems.
Submitted 28 May, 2025;
originally announced May 2025.
-
Learning to Stop: Deep Learning for Mean Field Optimal Stopping
Authors:
Lorenzo Magnino,
Yuchen Zhu,
Mathieu Laurière
Abstract:
Optimal stopping is a fundamental problem in optimization with applications in risk management, finance, robotics, and machine learning. We extend the standard framework to a multi-agent setting, named multi-agent optimal stopping (MAOS), where agents cooperate to make optimal stopping decisions in a finite-space, discrete-time environment. Since solving MAOS becomes computationally prohibitive when the number of agents is very large, we study the mean-field optimal stopping (MFOS) problem, obtained as the number of agents tends to infinity. We establish that MFOS provides a good approximation to MAOS and prove a dynamic programming principle (DPP) based on mean-field control theory. We then propose two deep learning approaches: one that learns optimal stopping decisions by simulating full trajectories and another that leverages the DPP to compute the value function and to learn the optimal stopping rule using backward induction. Both methods train neural networks to approximate optimal stopping policies. We demonstrate the effectiveness and the scalability of our work through numerical experiments on 6 different problems in spatial dimension up to 300. To the best of our knowledge, this is the first work to formalize and computationally solve MFOS in discrete time and finite space, opening new directions for scalable MAOS methods.
Submitted 9 June, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control
Authors:
Mengjian Hua,
Mathieu Laurière,
Eric Vanden-Eijnden
Abstract:
We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger-Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
Submitted 12 May, 2025; v1 submitted 7 October, 2024;
originally announced October 2024.
-
Reinforcement Learning for Finite Space Mean-Field Type Games
Authors:
Kai Shao,
Jiacheng Shen,
Chijie An,
Mathieu Laurière
Abstract:
Mean field type games (MFTGs) describe Nash equilibria between large coalitions: each coalition consists of a continuum of cooperative agents who maximize the average reward of their coalition while interacting non-cooperatively with a finite number of other coalitions. Although the theory has been extensively developed, we are still lacking efficient and scalable computational methods. Here, we develop reinforcement learning methods for such games in a finite space setting with general dynamics and reward functions. We start by proving that the MFTG solution yields approximate Nash equilibria in finite-size coalition games. We then propose two algorithms. The first is based on quantization of mean-field spaces and Nash Q-learning. We provide convergence and stability analysis. We then propose a deep reinforcement learning algorithm, which can scale to larger spaces. Numerical experiments in 5 environments with mean-field distributions of dimension up to $200$ show the scalability and efficiency of the proposed method.
Submitted 4 December, 2024; v1 submitted 25 September, 2024;
originally announced September 2024.
-
Robust Cooperative Multi-Agent Reinforcement Learning: A Mean-Field Type Game Perspective
Authors:
Muhammad Aneeq uz Zaman,
Mathieu Laurière,
Alec Koppel,
Tamer Başar
Abstract:
In this paper, we study the problem of robust cooperative multi-agent reinforcement learning (RL) where a large number of cooperative agents with distributed information aim to learn policies in the presence of \emph{stochastic} and \emph{non-stochastic} uncertainties whose distributions are respectively known and unknown. Focusing on policy optimization that accounts for both types of uncertainties, we formulate the problem in a worst-case (minimax) framework, which is intractable in general. Thus, we focus on the Linear Quadratic setting to derive benchmark solutions. First, since no standard theory exists for this problem due to the distributed information structure, we utilize the Mean-Field Type Game (MFTG) paradigm to establish guarantees on the solution quality in the sense of achieved Nash equilibrium of the MFTG. This in turn allows us to compare the performance against the corresponding original robust multi-agent control problem. Then, we propose a Receding-horizon Gradient Descent Ascent RL algorithm to find the MFTG Nash equilibrium and we prove a non-asymptotic rate of convergence. Finally, we provide numerical experiments to demonstrate the efficacy of our approach relative to a baseline algorithm.
Submitted 12 June, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Global Solutions to Master Equations for Continuous Time Heterogeneous Agent Macroeconomic Models
Authors:
Zhouzhou Gu,
Mathieu Laurière,
Sebastian Merkel,
Jonathan Payne
Abstract:
We propose and compare new global solution algorithms for continuous time heterogeneous agent economies with aggregate shocks. First, we approximate the agent distribution so that equilibrium in the economy can be characterized by a high, but finite, dimensional non-linear partial differential equation. We consider different approximations: discretizing the number of agents, discretizing the agent state variables, and projecting the distribution onto a finite set of basis functions. Second, we represent the value function using a neural network and train it to solve the differential equation using deep learning tools. We refer to the solution as an Economic Model Informed Neural Network (EMINN). The main advantage of this technique is that it allows us to find global solutions to high dimensional, non-linear problems. We demonstrate our algorithm by solving important models in the macroeconomics and spatial literatures (e.g. Krusell and Smith (1998), Khan and Thomas (2007), Bilal (2023)).
Submitted 19 June, 2024;
originally announced June 2024.
-
Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games
Authors:
Andrea Angiuli,
Jean-Pierre Fouque,
Mathieu Laurière,
Mengrui Zhang
Abstract:
Mean Field Control Games (MFCG), introduced in [Angiuli et al., 2022a], represent competitive games between a large number of large collaborative groups of agents in the infinite limit of number and size of groups. In this paper, we prove the convergence of a three-timescale Reinforcement Q-Learning (RL) algorithm to solve MFCG in a model-free approach from the point of view of representative agents. Our analysis uses a Q-table for finite state and action spaces updated at each discrete time-step over an infinite horizon. In [Angiuli et al., 2023], we proved convergence of two-timescale algorithms for MFG and MFC separately, highlighting the need to follow multiple population distributions in the MFC case. Here, we integrate this feature for MFCG as well as three rates of update decreasing to zero in the proper ratios. Our technique of proof uses a generalization to three timescales of the two-timescale analysis in [Borkar, 1997]. We give a simple example satisfying the various hypotheses made in the proof of convergence and illustrating the performance of the algorithm.
Submitted 3 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Independent RL for Cooperative-Competitive Agents: A Mean-Field Perspective
Authors:
Muhammad Aneeq uz Zaman,
Alec Koppel,
Mathieu Laurière,
Tamer Başar
Abstract:
We address in this paper Reinforcement Learning (RL) among agents that are grouped into teams such that there is cooperation within each team but general-sum (non-zero sum) competition across different teams. To develop an RL method that provably achieves a Nash equilibrium, we focus on a linear-quadratic structure. Moreover, to tackle the non-stationarity induced by multi-agent interactions in the finite population setting, we consider the case where the number of agents within each team is infinite, i.e., the mean-field setting. This results in a General-Sum LQ Mean-Field Type Game (GS-MFTG). We characterize the Nash equilibrium (NE) of the GS-MFTG, under a standard invertibility condition. This MFTG NE is then shown to be $O(1/M)$-NE for the finite population game where $M$ is a lower bound on the number of agents in each team. These structural results motivate an algorithm called Multi-player Receding-horizon Natural Policy Gradient (MRNPG), where each team minimizes its cumulative cost \emph{independently} in a receding-horizon manner. Despite the non-convexity of the problem, we establish that the resulting algorithm converges to a global NE through a novel problem decomposition into sub-problems using backward recursive discrete-time Hamilton-Jacobi-Isaacs (HJI) equations, in which \emph{independent natural policy gradient} is shown to exhibit linear convergence under time-independent diagonal dominance. Numerical studies included corroborate the theoretical results.
Submitted 8 February, 2025; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Population-aware Online Mirror Descent for Mean-Field Games by Deep Reinforcement Learning
Authors:
Zida Wu,
Mathieu Lauriere,
Samuel Jia Cong Chua,
Matthieu Geist,
Olivier Pietquin,
Ankur Mehta
Abstract:
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-dependent Nash equilibrium without the need for averaging or sampling from history, inspired by Munchausen RL and Online Mirror Descent. Through the design of an additional inner-loop replay buffer, the agents can effectively learn to achieve Nash equilibrium from any distribution, mitigating catastrophic forgetting. The resulting policy can be applied to various initial distributions. Numerical experiments on four canonical examples demonstrate our algorithm has better convergence properties than SOTA algorithms, in particular a DRL version of Fictitious Play for population-dependent policies.
Submitted 6 March, 2024;
originally announced March 2024.
-
A Deep Learning Method for Optimal Investment Under Relative Performance Criteria Among Heterogeneous Agents
Authors:
Mathieu Laurière,
Ludovic Tangpi,
Xuchen Zhou
Abstract:
Graphon games have been introduced to study games with many players who interact through a weighted graph of interaction. By passing to the limit, a game with a continuum of players is obtained, in which the interactions are through a graphon. In this paper, we focus on a graphon game for optimal investment under relative performance criteria, and we propose a deep learning method. The method builds upon two key ingredients: first, a characterization of Nash equilibria by forward-backward stochastic differential equations and, second, recent advances of machine learning algorithms for stochastic differential games. We provide numerical experiments on two different financial models. In each model, we compare the effect of several graphons, which correspond to different structures of interactions.
Submitted 30 March, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Learning Discrete-Time Major-Minor Mean Field Games
Authors:
Kai Cui,
Gökçe Dayanıklı,
Mathieu Laurière,
Matthieu Geist,
Olivier Pietquin,
Heinz Koeppl
Abstract:
Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and cannot model major players that strongly influence other players, severely limiting the class of problems that can be handled. We propose a novel discrete time version of major-minor MFGs (M3FGs), along with a learning algorithm based on fictitious play and partitioning the probability simplex. Importantly, M3FGs generalize MFGs with common noise and can handle not only random exogenous environment states but also major players. A key challenge is that the mean field is stochastic and not deterministic as in standard MFGs. Our theoretical investigation verifies both the M3FG model and its algorithmic solution, showing firstly the well-posedness of the M3FG model starting from a finite game of interest, and secondly convergence and approximation guarantees of the fictitious play algorithm. Then, we empirically verify the obtained theoretical results, ablating some of the theoretical assumptions made, and show successful equilibrium learning in three example problems. Overall, we establish a learning framework for a novel and broad class of tractable games.
Submitted 17 December, 2023;
originally announced December 2023.
-
From Nash Equilibrium to Social Optimum and vice versa: a Mean Field Perspective
Authors:
Rene Carmona,
Gokce Dayanikli,
Francois Delarue,
Mathieu Lauriere
Abstract:
Mean field games (MFG) and mean field control (MFC) problems have been introduced to study large populations of strategic players. They correspond respectively to non-cooperative or cooperative scenarios, where the aim is to find the Nash equilibrium and social optimum. These frameworks provide approximate solutions to situations with a finite number of players and have found a wide range of applications, from economics to biology and machine learning. In this paper, we study how the players can pass from a non-cooperative to a cooperative regime, and vice versa. The first direction is reminiscent of mechanism design, in which the game's definition is modified so that non-cooperative players reach an outcome similar to a cooperative scenario. The second direction studies how players that are initially cooperative gradually deviate from a social optimum to reach a Nash equilibrium when they decide to optimize their individual cost similar to the free rider phenomenon. To formalize these connections, we introduce two new classes of games which lie between MFG and MFC: $λ$-interpolated mean field games, in which the cost of an individual player is a $λ$-interpolation of the MFG and the MFC costs, and $p$-partial mean field games, in which a proportion $p$ of the population deviates from the social optimum by playing the game non-cooperatively. We conclude the paper by providing an algorithm for myopic players to learn a $p$-partial mean field equilibrium, and we illustrate it on a stylized model.
Submitted 16 December, 2023;
originally announced December 2023.
-
On Imitation in Mean-field Games
Authors:
Giorgia Ramponi,
Pavel Kolev,
Olivier Pietquin,
Niao He,
Mathieu Laurière,
Matthieu Geist
Abstract:
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs presents new challenges compared to single-agent IL, particularly when both the reward function and the transition kernel depend on the population distribution. In this paper, departing from the existing literature on IL for MFGs, we introduce a new solution concept called the Nash imitation gap. Then we show that when only the reward depends on the population distribution, IL in MFGs can be reduced to single-agent IL with similar guarantees. However, when the dynamics is population-dependent, we provide a novel upper-bound that suggests IL is harder in this setting. To address this issue, we propose a new adversarial formulation where the reinforcement learning problem is replaced by a mean-field control (MFC) problem, suggesting progress in IL within MFGs may have to build upon MFC.
Submitted 26 June, 2023;
originally announced June 2023.
-
The communication complexity of functions with large outputs
Authors:
Lila Fontes,
Sophie Laplante,
Mathieu Lauriere,
Alexandre Nolin
Abstract:
We study the two-party communication complexity of functions with large outputs, and show that the communication complexity can greatly vary depending on what output model is considered. We study a variety of output models, ranging from the open model, in which an external observer can compute the outcome, to the XOR model, in which the outcome of the protocol should be the bitwise XOR of the players' local outputs. This model is inspired by XOR games, which are widely studied two-player quantum games. We focus on the question of error-reduction in these new output models. For functions of output size k, applying standard error reduction techniques in the XOR model would introduce an additional cost linear in k. We show that no dependency on k is necessary. Similarly, standard randomness removal techniques incur a multiplicative cost of $2^k$ in the XOR model. We show how to reduce this factor to O(k). In addition, we prove analogous error reduction and randomness removal results in the other models, separate all models from each other, and show that some natural problems, including Set Intersection and Find the First Difference, separate the models when the Hamming weights of their inputs are bounded. Finally, we show how to use the rank lower bound technique for our weak output models.
Submitted 1 April, 2023;
originally announced April 2023.
-
Recent Developments in Machine Learning Methods for Stochastic Control and Games
Authors:
Ruimeng Hu,
Mathieu Laurière
Abstract:
Stochastic optimal control and games have a wide range of applications, from finance and economics to social sciences, robotics, and energy management. Many real-world applications involve complex models that have driven the development of sophisticated numerical methods. Recently, computational methods based on machine learning have been developed for solving stochastic control problems and games. In this review, we focus on deep learning methods that have unlocked the possibility of solving such problems, even in high dimensions or when the structure is very complex, beyond what traditional numerical methods can achieve. We consider mostly the continuous time and continuous space setting. Many of the new approaches build on recent neural-network-based methods for solving high-dimensional partial differential equations or backward stochastic differential equations, or on model-free reinforcement learning for Markov decision processes that have led to breakthrough results. This paper provides an introduction to these methods and summarizes the state-of-the-art works at the crossroads of machine learning and stochastic control and games.
Submitted 11 March, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Deep Learning for Mean Field Optimal Transport
Authors:
Sebastian Baudelet,
Brieuc Frénais,
Mathieu Laurière,
Amal Machtalay,
Yuchen Zhu
Abstract:
Mean field control (MFC) problems have been introduced to study social optima in very large populations of strategic agents. The main idea is to consider an infinite population and to simplify the analysis by using a mean field approximation. These problems can also be viewed as optimal control problems for McKean-Vlasov dynamics. They have found applications in a wide range of fields, from economics and finance to social sciences and engineering. Usually, the goal for the agents is to minimize a total cost which consists in the integral of a running cost plus a terminal cost. In this work, we consider MFC problems in which there is no terminal cost but, instead, the terminal distribution is prescribed. We call such problems mean field optimal transport problems since they can be viewed as a generalization of classical optimal transport problems when mean field interactions occur in the dynamics or the running cost function. We propose three numerical methods based on neural networks. The first one is based on directly learning an optimal control. The second one amounts to solving a forward-backward PDE system characterizing the solution. The third one relies on a primal-dual approach. We illustrate these methods with numerical experiments conducted on two families of examples.
Submitted 28 February, 2023;
originally announced February 2023.
-
Learning Correlated Equilibria in Mean-Field Games
Authors:
Paul Muller,
Romuald Elie,
Mark Rowland,
Mathieu Lauriere,
Julien Perolat,
Sarah Perrin,
Matthieu Geist,
Georgios Piliouras,
Olivier Pietquin,
Karl Tuyls
Abstract:
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts. However, as the size of an $N$-player game typically grows exponentially with $N$, standard game theoretic analysis becomes effectively infeasible beyond a low number of players. Recent approaches have gone around this limitation by instead considering Mean-Field games, an approximation of anonymous $N$-player games, where the number of players is infinite and the population's state distribution, instead of every individual player's state, is the object of interest. The practical computability of Mean-Field Nash equilibria, the most studied Mean-Field equilibrium to date, however, typically depends on beneficial non-generic structural properties such as monotonicity or contraction properties, which are required for known algorithms to converge. In this work, we provide an alternative route for studying Mean-Field games, by developing the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in \emph{all games}, without requiring any additional assumption on the structure of the game, using three classical algorithms. Furthermore, we establish correspondences between our notions and those already present in the literature, derive optimality bounds for the Mean-Field - $N$-player transition, and empirically demonstrate the convergence of these algorithms on simple games.
Submitted 22 August, 2022;
originally announced August 2022.
-
Learning in Mean Field Games: A Survey
Authors:
Mathieu Laurière,
Sarah Perrin,
Julien Pérolat,
Sertan Girgin,
Paul Muller,
Romuald Élie,
Matthieu Geist,
Olivier Pietquin
Abstract:
Non-cooperative and cooperative games with a very large number of players have many applications but remain generally intractable when the number of players increases. Introduced by Lasry and Lions, and Huang, Caines and Malhamé, Mean Field Games (MFGs) rely on a mean-field approximation to allow the number of players to grow to infinity. Traditional methods for solving these games generally rely on solving partial or stochastic differential equations with a full knowledge of the model. Recently, Reinforcement Learning (RL) has appeared promising to solve complex problems at scale. The combination of RL and MFGs is promising to solve games at a very large scale both in terms of population size and environment complexity. In this survey, we review the quickly growing recent literature on RL methods to learn equilibria and social optima in MFGs. We first identify the most common settings (static, stationary, and evolutive) of MFGs. We then present a general framework for classical iterative methods (based on best-response computation or policy evaluation) to solve MFGs in an exact way. Building on these algorithms and the connection with Markov Decision Processes, we explain how RL can be used to learn MFG solutions in a model-free way. Last, we present numerical illustrations on a benchmark problem, and conclude with some perspectives.
Submitted 26 July, 2024; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
Authors:
Mathieu Laurière,
Sarah Perrin,
Sertan Girgin,
Paul Muller,
Ayush Jain,
Theophile Cabannes,
Georgios Piliouras,
Julien Pérolat,
Romuald Élie,
Olivier Pietquin,
Matthieu Geist
Abstract:
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents. Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods. One limiting factor to further scale up using RL is that existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values. This is far from trivial for non-linear function approximators that enjoy good generalization properties, e.g. neural networks. We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm. The second one is an online mixing method based on regularization that does not require memorizing historical data or previous estimates. It is used to extend Online Mirror Descent. We demonstrate numerically that these methods efficiently enable the use of Deep RL algorithms to solve various MFGs. In addition, we show that these methods outperform SotA baselines from the literature.
Submitted 17 June, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO
Authors:
Paul Muller,
Mark Rowland,
Romuald Elie,
Georgios Piliouras,
Julien Perolat,
Mathieu Lauriere,
Raphael Marinier,
Olivier Pietquin,
Karl Tuyls
Abstract:
Recent advances in multiagent learning have seen the introduction of a family of algorithms that revolve around the population-based training method PSRO, showing convergence to Nash, correlated and coarse correlated equilibria. Notably, when the number of agents increases, learning best-responses becomes exponentially more difficult, and as such hampers PSRO training methods. The paradigm of mean-field games provides an asymptotic solution to this problem when the considered games are anonymous-symmetric. Unfortunately, the mean-field approximation introduces non-linearities which prevent a straightforward adaptation of PSRO. Building upon optimization and adversarial regret minimization, this paper sidesteps this issue and introduces mean-field PSRO, an adaptation of PSRO which learns Nash, coarse correlated and correlated equilibria in mean-field games. The key is to replace the exact distribution computation step by newly-defined mean-field no-adversarial-regret learners, or by black-box optimization. We compare the asymptotic complexity of the approach to standard PSRO, greatly improve empirical bandit convergence speed by compressing temporal mixture weights, and ensure it is theoretically robust to payoff noise. Finally, we illustrate the speed and accuracy of mean-field PSRO on several mean-field games, demonstrating convergence to strong and weak equilibria.
Submitted 29 August, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Solving N-player dynamic routing games with congestion: a mean field approach
Authors:
Theophile Cabannes,
Mathieu Lauriere,
Julien Perolat,
Raphael Marinier,
Sertan Girgin,
Sarah Perrin,
Olivier Pietquin,
Alexandre M. Bayen,
Eric Goubault,
Romuald Elie
Abstract:
The recent emergence of navigational tools has changed traffic patterns and has now enabled new types of congestion-aware routing control like dynamic road pricing. Using the fundamental diagram of traffic flows - applied in macroscopic and mesoscopic traffic modeling - the article introduces a new N-player dynamic routing game with explicit congestion dynamics. The model is well-posed and can reproduce heterogeneous departure times and congestion spill back phenomena. However, as Nash equilibrium computations are PPAD-complete, solving the game becomes intractable for large but realistic numbers of vehicles N. Therefore, the corresponding mean field game is also introduced. Experiments were performed on several classical benchmark networks of the traffic community: the Pigou, Braess, and Sioux Falls networks with heterogeneous origin, destination and departure time tuples. The Pigou and the Braess examples reveal that the mean field approximation is generally very accurate and computationally efficient as soon as the number of vehicles exceeds a few dozen. On the Sioux Falls network (76 links, 100 time steps), this approach enables learning traffic dynamics with more than 14,000 vehicles.
Submitted 27 October, 2021; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Generalization in Mean Field Games by Learning Master Policies
Authors:
Sarah Perrin,
Mathieu Laurière,
Julien Pérolat,
Romuald Élie,
Matthieu Geist,
Olivier Pietquin
Abstract:
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to generalization capacities. We study how to leverage these generalization properties to learn policies enabling a typical agent to behave optimally against any population distribution. In reference to the Master equation in MFGs, we coin the term "Master policies" to describe them and we prove that a single Master policy provides a Nash equilibrium, whatever the initial distribution. We propose a method to learn such Master policies. Our approach relies on three ingredients: adding the current population distribution as part of the observation, approximating Master policies with neural networks, and training via Reinforcement Learning and Fictitious Play. We illustrate on numerical examples not only the efficiency of the learned Master policy but also its generalization capabilities beyond the distributions used for training.
Submitted 20 September, 2021;
originally announced September 2021.
-
Performance of a Markovian neural network versus dynamic programming on a fishing control problem
Authors:
Mathieu Laurière,
Gilles Pagès,
Olivier Pironneau
Abstract:
Fishing quotas are unpleasant but efficient to control the productivity of a fishing site. A popular model has a stochastic differential equation for the biomass on which a stochastic dynamic programming or a Hamilton-Jacobi-Bellman algorithm can be used to find the stochastic control -- the fishing quota. We compare the solutions obtained by dynamic programming against those obtained with a neural network which preserves the Markov property of the solution. The method is extended to a similar multi species model to check its robustness in high dimension.
Submitted 14 September, 2021;
originally announced September 2021.
-
Deep Learning for Mean Field Games and Mean Field Control with Applications to Finance
Authors:
René Carmona,
Mathieu Laurière
Abstract:
Financial markets and more generally macro-economic models involve a large number of individuals interacting through variables such as prices resulting from the aggregate behavior of all the agents. Mean field games have been introduced to study Nash equilibria for such problems in the limit when the number of players is infinite. The theory has been extensively developed in the past decade, using both analytical and probabilistic tools, and a wide range of applications have been discovered, from economics to crowd motion. More recently the interaction with machine learning has attracted a growing interest. This aspect is particularly relevant to solve very large games with complex structures, in high dimension or with common sources of randomness. In this chapter, we review the literature on the interplay between mean field games and deep learning, with a focus on three families of methods. A special emphasis is given to financial applications.
Submitted 9 July, 2021;
originally announced July 2021.
-
Reinforcement Learning for Mean Field Games, with Applications to Economics
Authors:
Andrea Angiuli,
Jean-Pierre Fouque,
Mathieu Lauriere
Abstract:
Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents. These problems can be used to approximate competitive or cooperative games with a large finite number of agents and have found a broad range of applications, in particular in economics. In recent years, the question of learning in MFG and MFC has garnered interest, both as a way to compute solutions and as a way to model how large populations of learners converge to an equilibrium. Of particular interest is the setting where the agents do not know the model, which leads to the development of reinforcement learning (RL) methods. After reviewing the literature on this topic, we present a two timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm. The main novelty of this method is to simultaneously update an action-value function and a distribution but with different rates, in a model-free fashion. Depending on the ratio of the two learning rates, the algorithm learns either the MFG or the MFC solution. To illustrate this method, we apply it to a mean field problem of accumulated consumption in finite horizon with HARA utility function, and to a trader's optimal liquidation problem.
Submitted 25 June, 2021;
originally announced June 2021.
-
Concave Utility Reinforcement Learning: the Mean-Field Game Viewpoint
Authors:
Matthieu Geist,
Julien Pérolat,
Mathieu Laurière,
Romuald Elie,
Sarah Perrin,
Olivier Bachem,
Rémi Munos,
Olivier Pietquin
Abstract:
Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.
Submitted 16 February, 2022; v1 submitted 7 June, 2021;
originally announced June 2021.
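The abstract above states that Fictitious Play for this class of MFGs is simply Frank-Wolfe. As a rough illustration only, the sketch below runs Frank-Wolfe on the probability simplex with the averaging step 1/(k+1), which is the Fictitious Play update. The concave utility `F(mu) = -0.5 * ||mu - target||^2`, the `target` vector, and all function names are assumptions made for this toy example; they are not taken from the paper.

```python
# Toy sketch (not from the paper): Frank-Wolfe on the probability simplex
# with the averaging step 1/(k+1), i.e. the Fictitious Play update.
# The utility F(mu) = -0.5 * ||mu - target||^2 is an illustrative assumption.

def gradient(mu, target):
    """Gradient of F(mu) = -0.5 * sum((mu_i - target_i)^2)."""
    return [t - m for m, t in zip(mu, target)]

def frank_wolfe(mu0, target, num_iters):
    mu = list(mu0)
    for k in range(1, num_iters + 1):
        grad = gradient(mu, target)
        # Linear maximization over the simplex: the maximizing vertex plays
        # the role of a "best response" to the reward vector grad.
        best = max(range(len(mu)), key=lambda i: grad[i])
        step = 1.0 / (k + 1)
        # Averaging step: mu is the running mean of the vertices played so far.
        mu = [(1 - step) * m + step * (1.0 if i == best else 0.0)
              for i, m in enumerate(mu)]
    return mu

target = [0.5, 0.3, 0.2]
mu = frank_wolfe([1.0, 0.0, 0.0], target, num_iters=5000)
gap = max(abs(m - t) for m, t in zip(mu, target))  # distance to the maximizer
```

Here the gradient of the utility acts as the reward the representative agent responds to, which is one way to read the CURL-to-MFG correspondence in this simple static setting.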
-
Mean Field Games Flock! The Reinforcement Learning Way
Authors:
Sarah Perrin,
Mathieu Laurière,
Julien Pérolat,
Matthieu Geist,
Romuald Élie,
Olivier Pietquin
Abstract:
We present a method enabling a large number of agents to learn how to flock, which is a natural behavior observed in large populations of animals. This problem has drawn a lot of interest but requires many structural assumptions and is tractable only in small dimensions. We phrase this problem as a Mean Field Game (MFG), where each individual chooses its acceleration depending on the population behavior. Combining Deep Reinforcement Learning (RL) and Normalizing Flows (NF), we obtain a tractable solution requiring only very weak assumptions. Our algorithm finds a Nash Equilibrium and the agents adapt their velocity to match the neighboring flock's average one. We use Fictitious Play and alternate: (1) computing an approximate best response with Deep RL, and (2) estimating the next population distribution with NF. We show numerically that our algorithm learns multi-group or high-dimensional flocking with obstacles.
Submitted 17 May, 2021;
originally announced May 2021.
-
Scaling up Mean Field Games with Online Mirror Descent
Authors:
Julien Perolat,
Sarah Perrin,
Romuald Elie,
Mathieu Laurière,
Georgios Piliouras,
Matthieu Geist,
Karl Tuyls,
Olivier Pietquin
Abstract:
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD). We show that continuous-time OMD provably converges to a Nash equilibrium under a natural and well-motivated set of monotonicity assumptions. This theoretical result nicely extends to multi-population games and to settings involving common noise. A thorough experimental investigation on various single and multi-population MFGs shows that OMD outperforms traditional algorithms such as Fictitious Play (FP). We empirically show that OMD scales up and converges significantly faster than FP by solving, for the first time to our knowledge, examples of MFGs with hundreds of billions of states. This study establishes the state-of-the-art for learning in large-scale multi-agent and multi-population games.
Submitted 28 February, 2021;
originally announced March 2021.
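For readers unfamiliar with Online Mirror Descent, the following is a minimal sketch on a static, one-shot game, not the dynamic setting of the paper. It accumulates action payoffs in a score vector `y` and plays the softmax of `y`. The crowd-averse reward `-mu[a]` (a monotone game), the step size, and all names are assumptions chosen for illustration.

```python
import math

# Toy sketch (not from the paper): discrete-time Online Mirror Descent on a
# static crowd-averse game. The reward and step size are assumptions.

def softmax(y):
    m = max(y)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def reward(mu):
    """Assumed crowd-averse payoff: action a pays -mu[a]."""
    return [-m for m in mu]

def online_mirror_descent(y0, learning_rate, num_iters):
    y = list(y0)
    for _ in range(num_iters):
        mu = softmax(y)   # the population plays the current policy
        r = reward(mu)    # payoff of each action against that population
        # Accumulate payoffs in the dual variable y.
        y = [v + learning_rate * ri for v, ri in zip(y, r)]
    return softmax(y)

mu = online_mirror_descent([2.0, 0.0, -1.0], learning_rate=0.5, num_iters=500)
```

In this toy game the unique equilibrium is the uniform distribution, so the returned `mu` can be checked against it directly.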
-
Policy Optimization for Linear-Quadratic Zero-Sum Mean-Field Type Games
Authors:
René Carmona,
Kenza Hamidouche,
Mathieu Laurière,
Zongjun Tan
Abstract:
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic utility are studied under infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers whose utilities sum to zero, compete to influence a large population of agents. In particular, the case in which the transition and utility functions depend on the state, the action of the controllers, and the mean of the state and the actions, is investigated. The game is analyzed and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed for both model-based and sample-based frameworks. In the first case, the gradients are computed exactly using the model whereas they are estimated using Monte-Carlo simulations in the second case. Numerical experiments show the convergence of the two players' controls as well as the utility function when the two algorithms are used in different scenarios.
Submitted 2 September, 2020;
originally announced September 2020.
-
Linear-Quadratic Zero-Sum Mean-Field Type Games: Optimality Conditions and Policy Optimization
Authors:
René Carmona,
Kenza Hamidouche,
Mathieu Laurière,
Zongjun Tan
Abstract:
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the action of the controllers, and the mean of the state and the actions, is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed for both model-based and sample-based frameworks. In the model-based case, the gradients are computed exactly using the model, whereas they are estimated using Monte-Carlo simulations in the sample-based case. Numerical experiments are conducted to show the convergence of the utility function as well as the two players' controls.
Submitted 1 September, 2020;
originally announced September 2020.
-
Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Authors:
Sarah Perrin,
Julien Perolat,
Mathieu Laurière,
Matthieu Geist,
Romuald Elie,
Olivier Pietquin
Abstract:
In this paper, we deepen the analysis of continuous time Fictitious Play learning algorithm to the consideration of various finite state Mean Field Game settings (finite horizon, $γ$-discounted), allowing in particular for the introduction of an additional common noise.
We first present a theoretical convergence analysis of the continuous time Fictitious Play process and prove that the induced exploitability decreases at a rate $O(\frac{1}{t})$. Such analysis emphasizes the use of exploitability as a relevant metric for evaluating the convergence towards a Nash equilibrium in the context of Mean Field Games. These theoretical contributions are supported by numerical experiments provided in either model-based or model-free settings. We provide hereby for the first time converging learning dynamics for Mean Field Games in the presence of common noise.
Submitted 26 October, 2020; v1 submitted 5 July, 2020;
originally announced July 2020.
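The abstract above uses exploitability as its convergence metric. The sketch below shows what that quantity looks like in a deliberately simple setting: a static game with a crowd-averse reward, solved by discrete-time Fictitious Play. The paper analyzes the continuous-time process in dynamic games, so this is an illustration of the metric only; the reward and all names are assumptions.

```python
# Toy sketch (not from the paper): exploitability along discrete-time
# Fictitious Play in a static crowd-averse game (assumed reward -mu[a]).

def reward(mu):
    return [-m for m in mu]

def exploitability(mu):
    """Best-response payoff minus the payoff of playing mu against mu."""
    r = reward(mu)
    return max(r) - sum(m * ri for m, ri in zip(mu, r))

def fictitious_play(mu0, num_iters):
    mu = list(mu0)
    history = [exploitability(mu)]
    for k in range(1, num_iters + 1):
        r = reward(mu)
        best = max(range(len(mu)), key=lambda a: r[a])  # pure best response
        step = 1.0 / (k + 1)
        # Average the best response into the population distribution.
        mu = [(1 - step) * m + step * (1.0 if a == best else 0.0)
              for a, m in enumerate(mu)]
        history.append(exploitability(mu))
    return mu, history

mu, history = fictitious_play([1.0, 0.0, 0.0], num_iters=2000)
```

Exploitability is zero exactly at a Nash equilibrium, so `history` gives a model-based way to monitor progress without knowing the equilibrium in advance.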
-
Unified Reinforcement Q-Learning for Mean Field Game and Control Problems
Authors:
Andrea Angiuli,
Jean-Pierre Fouque,
Mathieu Laurière
Abstract:
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: The same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters. The algorithm is in discrete time and space where the agent not only provides an action to the environment but also a distribution of the state in order to take into account the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space, and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case that are used as benchmarks for the results of our algorithm.
Submitted 31 May, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
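To make the two-timescale idea concrete, here is a rough sketch of one possible update loop: a Q-table updated at rate `rho_q` and a state-distribution estimate updated at rate `rho_mu`. The two-state environment, the crowd-averse reward, the epsilon-greedy exploration, and the order of the two updates are all assumptions for illustration; the paper's exact algorithm and the mapping from the rate ratio to the MFG or MFC solution are not reproduced here.

```python
import random

# Toy sketch (not the paper's algorithm): a Q-table and a distribution
# estimate updated simultaneously with two different learning rates.

def two_timescale_q_learning(rho_q, rho_mu, num_steps,
                             gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    n_states, n_actions = 2, 2           # action 0 = stay, action 1 = switch
    q = [[0.0] * n_actions for _ in range(n_states)]
    mu = [1.0 / n_states] * n_states     # running estimate of the distribution
    x = 0
    for _ in range(num_steps):
        # Epsilon-greedy action from the current Q-table.
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: q[x][b])
        x_next = x if a == 0 else 1 - x  # deterministic toy dynamics
        # Distribution update at rate rho_mu, toward the visited state.
        mu = [(1 - rho_mu) * m + rho_mu * (1.0 if s == x_next else 0.0)
              for s, m in enumerate(mu)]
        r = -mu[x_next]                  # assumed crowd-averse reward
        # Q-learning update at its own rate rho_q.
        td_target = r + gamma * max(q[x_next])
        q[x][a] += rho_q * (td_target - q[x][a])
        x = x_next
    return q, mu

q, mu = two_timescale_q_learning(rho_q=0.1, rho_mu=0.01, num_steps=5000)
```

The ratio `rho_q / rho_mu` is the tuning knob the abstract refers to; swapping the two values changes which quantity is treated as quasi-static by the other.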
-
Learning a functional control for high-frequency finance
Authors:
Laura Leal,
Mathieu Laurière,
Charles-Albert Lehalle
Abstract:
We use a deep neural network to generate controllers for optimal trading on high frequency data. For the first time, a neural network learns the mapping between the preferences of the trader, i.e. risk aversion parameters, and the optimal controls. An important challenge in learning this mapping is that in intraday trading, trader's actions influence price dynamics in closed loop via the market impact. The exploration--exploitation tradeoff generated by the efficient execution is addressed by tuning the trader's preferences to ensure long enough trajectories are produced during the learning phase. The issue of scarcity of financial data is solved by transfer learning: the neural network is first trained on trajectories generated thanks to a Monte-Carlo scheme, leading to a good initialization before training on historical trajectories. Moreover, to answer to genuine requests of financial regulators on the explainability of machine learning generated controls, we project the obtained "blackbox controls" on the space usually spanned by the closed-form solution of the stylized optimal trading problem, leading to a transparent structure. For more realistic loss functions that have no closed-form solution, we show that the average distance between the generated controls and their explainable version remains small. This opens the door to the acceptance of ML-generated controls by financial regulators.
Submitted 11 February, 2021; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Connecting GANs, MFGs, and OT
Authors:
Haoyang Cao,
Xin Guo,
Mathieu Laurière
Abstract:
Generative adversarial networks (GANs) have enjoyed tremendous success in image generation and processing, and have recently attracted growing interest in financial modeling. This paper analyzes GANs from the perspectives of mean-field games (MFGs) and optimal transport. More specifically, from the game theoretical perspective, GANs are interpreted as MFGs under Pareto Optimality criterion or mean-field controls; from the optimal transport perspective, GANs are to minimize the optimal transport cost indexed by the generator from the known latent distribution to the unknown true distribution of data. The MFGs perspective of GANs leads to a GAN-based computational method (MFGANs) to solve MFGs: one neural network for the backward Hamilton-Jacobi-Bellman equation and one neural network for the forward Fokker-Planck equation, with the two neural networks trained in an adversarial way. Numerical experiments demonstrate superior performance of this proposed algorithm, especially in the higher dimensional case, when compared with existing neural network approaches.
Submitted 4 September, 2021; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning
Authors:
René Carmona,
Mathieu Laurière,
Zongjun Tan
Abstract:
We study infinite horizon discounted Mean Field Control (MFC) problems with common noise through the lens of Mean Field Markov Decision Processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization allows us to establish connections between both closed-loop and open-loop policies for MFC and Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC. Building on this framework and the notion of state-action value function, we then propose reinforcement learning (RL) methods for such problems, by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input of the policy and the value function. We provide convergence guarantees for tabular algorithms based on discretizations of the simplex. Neural network based algorithms are more suitable for continuous spaces and allow us to avoid discretizing the mean field state space. Numerical examples are provided.
Submitted 13 October, 2021; v1 submitted 28 October, 2019;
originally announced October 2019.
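The abstract above mentions tabular algorithms based on discretizations of the simplex. One simple way to build such a discretization, offered here purely as an assumption since the paper's scheme is not described in the abstract, is a uniform grid of distributions whose entries are multiples of `1 / resolution`. The helper names below are hypothetical.

```python
from math import comb

# Toy sketch (assumed scheme, not necessarily the paper's): a uniform grid
# on the probability simplex, usable as a finite "population state" space.

def simplex_grid(num_states, resolution):
    """All distributions on num_states points with entries k / resolution."""
    def compositions(total, parts):
        # Enumerate non-negative integer tuples of length `parts` summing to `total`.
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest
    return [tuple(c / resolution for c in comp)
            for comp in compositions(resolution, num_states)]

def nearest_grid_point(mu, grid):
    """Project a distribution onto the grid (squared Euclidean distance)."""
    return min(grid, key=lambda g: sum((a - b) ** 2 for a, b in zip(mu, g)))

grid = simplex_grid(num_states=3, resolution=4)
expected_size = comb(4 + 3 - 1, 3 - 1)  # stars-and-bars count of grid points
projected = nearest_grid_point((0.3, 0.3, 0.4), grid)
```

The grid size grows as `comb(resolution + num_states - 1, num_states - 1)`, which is one way to see why the abstract turns to neural networks to avoid discretizing the mean field state space.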
-
Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods
Authors:
René Carmona,
Mathieu Laurière,
Zongjun Tan
Abstract:
We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Applications include, for example, the control of a large number of robots communicating through a central unit dispatching the optimal policy computed by maximizing an aggregate reward. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states and actions of the other agents. We first provide a full analysis of this discrete-time mean field control problem. We then rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting and establish bounds on the rates of convergence. We also provide graphical evidence of the convergence based on implementations of our algorithms.
Submitted 28 April, 2025; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Convergence Analysis of Machine Learning Algorithms for the Numerical Solution of Mean Field Control and Games: II -- The Finite Horizon Case
Authors:
René Carmona,
Mathieu Laurière
Abstract:
We propose two numerical methods for the optimal control of McKean-Vlasov dynamics in finite time horizon. Both methods are based on the introduction of a suitable loss function defined over the parameters of a neural network. This allows the use of machine learning tools, and efficient implementations of stochastic gradient descent in order to perform the optimization. In the first method, the loss function stems directly from the optimal control problem. The second method tackles a generic forward-backward stochastic differential equation system (FBSDE) of McKean-Vlasov type, and relies on a suitable reformulation as a mean field control problem. To provide a guarantee on how our numerical schemes approximate the solution of the original mean field control problem, we introduce a new optimization problem, directly amenable to numerical computation, and for which we rigorously provide an error rate. Several numerical examples are provided. Both methods can easily be applied to certain problems with common noise, which is not the case with the existing technology. Furthermore, although the first approach is designed for mean field control problems, the second is more general and can also be applied to the FBSDE arising in the theory of mean field games.
Submitted 29 March, 2021; v1 submitted 5 August, 2019;
originally announced August 2019.
-
Convergence Analysis of Machine Learning Algorithms for the Numerical Solution of Mean Field Control and Games: I -- The Ergodic Case
Authors:
René Carmona,
Mathieu Laurière
Abstract:
We propose two algorithms for the solution of the optimal control of ergodic McKean-Vlasov dynamics. Both algorithms are based on approximations of the theoretical solutions by neural networks, the latter being characterized by their architecture and a set of parameters. This allows the use of modern machine learning tools, and efficient implementations of stochastic gradient descent. The first algorithm is based on the idiosyncrasies of the ergodic optimal control problem. We provide a mathematical proof of the convergence of the approximation scheme, and we analyze rigorously the approximation by controlling the different sources of error. The second method is an adaptation of the deep Galerkin method to the system of partial differential equations issued from the optimality condition. We demonstrate the efficiency of these algorithms on several numerical examples, some of them being chosen to show that our algorithms succeed where existing ones failed. We also argue that both methods can easily be applied to problems in dimensions larger than what can be found in the existing literature. Finally, we illustrate the fact that, although the first algorithm is specifically designed for mean field control problems, the second one is more general and can also be applied to the partial differential equation systems arising in the theory of mean field games.
Submitted 29 March, 2021; v1 submitted 12 July, 2019;
originally announced July 2019.
-
On the Convergence of Model Free Learning in Mean Field Games
Authors:
Romuald Elie,
Julien Pérolat,
Mathieu Laurière,
Matthieu Geist,
Olivier Pietquin
Abstract:
Learning by experience in Multi-Agent Systems (MAS) is a difficult and exciting task, due to the lack of stationarity of the environment, whose dynamics evolves as the population learns. In order to design scalable algorithms for systems with a large population of interacting agents (e.g. swarms), this paper focuses on Mean Field MAS, where the number of agents is asymptotically infinite. Recently, a very active burgeoning field has studied the effects of diverse reinforcement learning algorithms for agents that have no prior information on a stationary Mean Field Game (MFG) and learn their policy through repeated experience. We adopt a high-level perspective on this problem and analyze in full generality the convergence of a fictitious iterative scheme using any single agent learning algorithm at each step. We quantify the quality of the computed approximate Nash equilibrium, in terms of the accumulated errors arising at each learning iteration step. Notably, we show for the first time convergence of model free learning algorithms towards non-stationary MFG equilibria, relying only on classical assumptions on the MFG dynamics. We illustrate our theoretical results with a numerical experiment in a continuous action-space environment, where the approximate best response of the iterative fictitious play scheme is computed with a deep RL algorithm.
Submitted 20 February, 2020; v1 submitted 4 July, 2019;
originally announced July 2019.
-
The Flow of Information in Interactive Quantum Protocols: the Cost of Forgetting
Authors:
Mathieu Lauriere,
Dave Touchette
Abstract:
In the context of two-party interactive quantum communication protocols, we study a recently defined notion of quantum information cost (QIC), which possesses most of the important properties of its classical analogue. Although this definition has the advantage of being valid for fully quantum inputs and tasks, its interpretation for classical tasks remained rather obscure. Also, the link between this new notion and other notions of information cost for quantum protocols that had previously appeared in the literature was not clear, if existent at all.
We settle both these issues: for quantum communication with classical inputs, we provide an alternate characterization of QIC in terms of information about the input registers, avoiding any reference to the notion of a purification of the classical input state. We provide an exact operational interpretation of this alternative characterization as the sum of the cost of transmitting information about the classical inputs and the cost of forgetting information about these inputs. To obtain this characterization, we prove a general lemma, the Information Flow Lemma, assessing exactly the transfer of information in general interactive quantum processes. Furthermore, we clarify the link between QIC and IC of classical protocols by simulating quantumly classical protocols.
Finally, we apply these concepts to argue that any quantum protocol that does not forget information solves Disjointness on n bits in Omega(n) communication, completely losing the quadratic quantum speedup. This provides a specific sense in which forgetting information is a necessary feature of interactive quantum protocols. We also apply these concepts to prove that QIC at zero-error is exactly n for the Inner Product function, and n (1 - o(1)) for a random Boolean function on n+n bits.
Submitted 8 January, 2017;
originally announced January 2017.
-
Extended Learning Graphs for Triangle Finding
Authors:
Titouan Carette,
Mathieu Laurière,
Frédéric Magniez
Abstract:
We present new quantum algorithms for Triangle Finding improving its best previously known quantum query complexities for both dense and sparse instances. For dense graphs on $n$ vertices, we get a query complexity of $O(n^{5/4})$ without any of the extra logarithmic factors present in the previous algorithm of Le Gall [FOCS'14]. For sparse graphs with $m\geq n^{5/4}$ edges, we get a query complexity of $O(n^{11/12}m^{1/6}\sqrt{\log n})$, which is better than the one obtained by Le Gall and Nakajima [ISAAC'15] when $m \geq n^{3/2}$. We also obtain an algorithm with query complexity $O(n^{5/6}(m\log n)^{1/6}+d_2\sqrt{n})$ where $d_2$ is the variance of the degree distribution. Our algorithms are designed and analyzed in a new model of learning graphs that we call extended learning graphs. In addition, we present a framework in order to easily combine and analyze them. As a consequence we get much simpler algorithms and analyses than previous algorithms of Le Gall et al. based on the MNRS quantum walk framework [SICOMP'11].
Submitted 11 October, 2016; v1 submitted 25 September, 2016;
originally announced September 2016.
-
Robust Bell inequalities from communication complexity
Authors:
Sophie Laplante,
Mathieu Laurière,
Alexandre Nolin,
Jérémie Roland,
Gabriel Senno
Abstract:
The question of how large Bell inequality violations can be, for quantum distributions, has been the object of much work in the past several years. We say that a Bell inequality is normalized if its absolute value does not exceed 1 for any classical (i.e. local) distribution. Upper and (almost) tight lower bounds have been given for the quantum violation of these Bell inequalities in terms of the number of outputs of the distribution, the number of inputs, and the dimension of the shared quantum states. In this work, we revisit normalized Bell inequalities together with another family: inefficiency-resistant Bell inequalities. To be inefficiency-resistant, the Bell value must not exceed 1 for any local distribution, including those that can abort. This makes the Bell inequality resistant to the detection loophole, while a normalized Bell inequality is resistant to general local noise. Both these families of Bell inequalities are closely related to communication complexity lower bounds. We show how to derive large violations from any gap between classical and quantum communication complexity, provided the lower bound on classical communication is proven using these lower bound techniques. This leads to inefficiency-resistant violations that can be exponential in the size of the inputs. Finally, we study resistance to noise and inefficiency for these Bell inequalities.
Submitted 4 June, 2018; v1 submitted 30 June, 2016;
originally announced June 2016.
-
Privacy in Quantum Communication Complexity
Authors:
Iordanis Kerenidis,
Mathieu Laurière,
François Le Gall,
Mathys Rennela
Abstract:
In two-party quantum communication complexity, Alice and Bob receive some classical inputs and wish to compute some function that depends on both these inputs, while minimizing the communication. This model has found numerous applications in many areas of computer science. One question that has received a lot of attention recently is whether it is possible to perform such protocols in a private way. We show that defining privacy for quantum protocols is not so straightforward and it depends on whether we assume that the registers where Alice and Bob receive their classical inputs are in fact classical registers (and hence unentangled with the rest of the protocol) or quantum registers (and hence can be entangled with the rest of the protocol or the environment). We provide new quantum protocols for the Inner Product function and for Private Information Retrieval, and show that the privacy assuming classical input registers can be exponentially smaller than the privacy assuming quantum input registers. We also argue that the right notion of privacy of a communication protocol is the one assuming classical input registers, since otherwise the players can deviate considerably from the protocol.
Submitted 30 September, 2014;
originally announced September 2014.