Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

Ramanauskas, Karolis; Şimşek, Özgür


arXiv:2312.03762 (cs)

[Submitted on 5 Dec 2023]


Abstract: We explore colour versus shape goal misgeneralization originally demonstrated by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an ambiguous choice, the agents seem to prefer generalization based on colour rather than shape. After training over 1,000 agents in a simplified version of the environment and evaluating them on over 10 million episodes, we conclude that the behaviour can be attributed to the agents learning to detect the goal object through a specific colour channel. This choice is arbitrary. Additionally, we show how, due to underspecification, the preferences can change when retraining the agents using exactly the same procedure except for using a different random seed for the training run. Finally, we demonstrate the existence of outliers in out-of-distribution behaviour based on training random seed alone.

Comments:	ATTRIB: Workshop on Attributing Model Behavior at Scale at NeurIPS 2023
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2312.03762 [cs.LG]
	(or arXiv:2312.03762v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.03762

Submission history

From: Karolis Ramanauskas
[v1] Tue, 5 Dec 2023 19:00:46 UTC (1,881 KB)

