Self-Play Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma

Bertrand, Quentin; Duque, Juan; Calvano, Emilio; Gidel, Gauthier

Computer Science > Computer Science and Game Theory

arXiv:2312.08484 (cs)

[Submitted on 13 Dec 2023 (v1), last revised 18 Jun 2025 (this version, v2)]

Abstract: A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner's dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated "always defect" policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
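The two policies named in the abstract, and the kind of learner it studies, can be made concrete with a short Python sketch. It assumes one-step memory (each player's state is the previous joint action, seen from its own side), illustrative payoffs T=5, R=3, P=1, S=0, and, for the learning part, that "self-play" means both players read and update a single shared tabular Q-function under epsilon-greedy exploration. These choices, and every name below (`pavlov`, `play`, `self_play_q_learning`), are hypothetical illustrations rather than the authors' code, and the learner is not claimed to satisfy the conditions under which the paper proves convergence to Pavlov.

```python
import random

C, D = 0, 1  # action indices: cooperate, defect

# Illustrative payoffs with T > R > P > S and 2R > T + S (assumed values, not from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {(C, C): R, (C, D): S, (D, C): T, (D, D): P}  # payoff to the first-listed player


def pavlov(state):
    """Win-stay, lose-shift with one-step memory.

    state = (my_last_action, opponent_last_action). After CC (R) or DC (T) the player
    "wins" and repeats its action; after CD (S) or DD (P) it shifts. This is equivalent
    to cooperating exactly when the two previous actions matched.
    """
    my_last, opp_last = state
    return C if my_last == opp_last else D


def always_defect(state):
    return D


def play(policy_1, policy_2, start, rounds):
    """Play the iterated game from a given previous joint action.

    Returns a list of (joint_action, payoff_1, payoff_2) tuples, one per round.
    """
    a1, a2 = start
    history = []
    for _ in range(rounds):
        # Each player observes the previous joint action from its own perspective.
        a1, a2 = policy_1((a1, a2)), policy_2((a2, a1))
        history.append(((a1, a2), PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]))
    return history


def self_play_q_learning(steps=50_000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning in which both players share one Q-table (assumption).

    Returns the greedy policy as a dict mapping state -> action (ties go to C).
    """
    rng = random.Random(seed)
    states = [(x, y) for x in (C, D) for y in (C, D)]
    q = {s: [0.0, 0.0] for s in states}

    def act(s):
        if rng.random() < epsilon:
            return rng.choice((C, D))
        return max((C, D), key=lambda a: q[s][a])

    a1, a2 = C, C
    for _ in range(steps):
        s1, s2 = (a1, a2), (a2, a1)  # states built from the previous joint action
        a1, a2 = act(s1), act(s2)
        transitions = (
            (s1, a1, (a1, a2), PAYOFF[(a1, a2)]),
            (s2, a2, (a2, a1), PAYOFF[(a2, a1)]),
        )
        for s, a, s_next, r in transitions:
            # Standard one-step Q-learning update toward r + gamma * max_a' Q(s', a').
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return {s: max((C, D), key=lambda a: q[s][a]) for s in states}
```

For example, `play(pavlov, pavlov, start=(D, C), rounds=4)` yields the joint actions (D, D), (C, C), (C, C), (C, C): after a unilateral defection, two Pavlov players both defect once and then return to mutual cooperation, earning R per round, whereas two "always defect" players earn only P per round. That gap is the sense in which "always defect" is Pareto-dominated.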

Subjects:	Computer Science and Game Theory (cs.GT)
Cite as:	arXiv:2312.08484 [cs.GT]
	(or arXiv:2312.08484v2 [cs.GT] for this version)
	https://doi.org/10.48550/arXiv.2312.08484

Submission history

From: Quentin Bertrand
[v1] Wed, 13 Dec 2023 19:55:24 UTC (4,023 KB)
[v2] Wed, 18 Jun 2025 15:22:03 UTC (1,410 KB)