Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning

Carmona, René; Laurière, Mathieu; Tan, Zongjun

Mathematics > Optimization and Control

arXiv:1910.12802 (math)

[Submitted on 28 Oct 2019 (v1), last revised 13 Oct 2021 (this version, v2)]

Abstract: We study infinite-horizon discounted Mean Field Control (MFC) problems with common noise through the lens of Mean Field Markov Decision Processes (MFMDP). We allow the agents to use actions that are randomized not only at the individual level but also at the level of the population. This common randomization allows us to establish connections between both closed-loop and open-loop policies for MFC and Markov policies for the MFMDP. In particular, we show that there exists an optimal closed-loop policy for the original MFC. Building on this framework and the notion of state-action value function, we then propose reinforcement learning (RL) methods for such problems, by adapting existing tabular and deep RL methods to the mean-field setting. The main difficulty is the treatment of the population state, which is an input of the policy and the value function. We provide convergence guarantees for tabular algorithms based on discretizations of the simplex. Neural-network-based algorithms are more suitable for continuous spaces and allow us to avoid discretizing the mean field state space. Numerical examples are provided.
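To make the phrase "tabular algorithms based on discretizations of the simplex" concrete, here is a minimal sketch, not the authors' implementation. It assumes a finite individual state space, so that the population state is a probability vector, and it assumes the population-level actions have already been reduced to a finite index set. The names `simplex_grid`, `nearest_grid_index`, and `q_update` are hypothetical helpers introduced only for illustration; the update rule is the standard tabular Q-learning step applied to grid indices.

```python
from itertools import product


def simplex_grid(n_states, resolution):
    """Distributions on n_states points whose entries are multiples of 1/resolution."""
    return [
        tuple(k / resolution for k in counts)
        for counts in product(range(resolution + 1), repeat=n_states)
        if sum(counts) == resolution
    ]


def nearest_grid_index(mu, grid):
    """Index of the grid point closest to the distribution mu in L1 distance."""
    def l1(p):
        return sum(abs(x - y) for x, y in zip(p, mu))
    return min(range(len(grid)), key=lambda i: l1(grid[i]))


def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One standard tabular Q-learning step; s and s_next are grid indices.

    Assumption: actions are indexed 0..len(Q[s])-1, standing in for
    population-level actions that have been made finite beforehand.
    """
    target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


if __name__ == "__main__":
    grid = simplex_grid(n_states=3, resolution=4)
    Q = [[0.0, 0.0] for _ in grid]  # one row per discretized population state
    s = nearest_grid_index((0.5, 0.3, 0.2), grid)
    s_next = nearest_grid_index((0.4, 0.4, 0.2), grid)
    q_update(Q, s, a=1, reward=1.0, s_next=s_next)
    print(len(grid), Q[s])
```

The grid size grows combinatorially with the number of individual states and the resolution, which is consistent with the abstract's remark that neural-network-based methods avoid discretizing the mean field state space.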

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG)
Cite as:	arXiv:1910.12802 [math.OC]
	(or arXiv:1910.12802v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.1910.12802

Submission history

From: Mathieu Laurière
[v1] Mon, 28 Oct 2019 16:56:46 UTC (103 KB)
[v2] Wed, 13 Oct 2021 17:57:21 UTC (3,149 KB)
