SR-Reward: Taking The Path More Traveled

Azad, Seyed Mahdi B.; Padar, Zahra; Kalweit, Gabriel; Boedecker, Joschka

arXiv:2501.02330v2 (cs)

[Submitted on 4 Jan 2025 (v1), revised 29 Apr 2025 (this version, v2), latest version 12 Jun 2025 (v3)]

Abstract: In this paper, we propose a novel method for learning reward functions directly from offline demonstrations. Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two. This results in a more stable and efficient training process. Our reward function, called SR-Reward, leverages successor representation (SR) to encode a state based on expected future states' visitation under the demonstration policy and transition dynamics. By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline. We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness. This strategy inherently introduces a conservative bias into RL algorithms that employ the learned reward. We evaluate our method on the D4RL benchmark, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Moreover, our ablation studies on data size and quality reveal the advantages and limitations of SR-Reward as a proxy for true rewards.
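
The abstract does not spell out the update rule, so the following is only a minimal tabular sketch of the general idea of estimating a successor representation from demonstration transitions with a Bellman-style TD(0) update. The function name `learn_tabular_sr`, the toy trajectory, and the row-sum "visitation score" are assumptions made for illustration; they are not the paper's SR-Reward definition, and the negative sampling strategy is not modeled here.

```python
def learn_tabular_sr(transitions, n_states, gamma=0.9, lr=0.1, sweeps=500):
    """Estimate a tabular successor representation M[s][j] by TD(0).

    transitions: list of (s, s_next, done) tuples taken from demonstrations.
    M[s][j] approximates the expected discounted number of visits to state j
    when starting in s and following the demonstrated behavior.
    """
    M = [[0.0] * n_states for _ in range(n_states)]
    for _ in range(sweeps):
        for s, s_next, done in transitions:
            for j in range(n_states):
                indicator = 1.0 if j == s else 0.0
                # No bootstrapping past the end of a demonstration.
                bootstrap = 0.0 if done else gamma * M[s_next][j]
                # Bellman target: 1[s == j] + gamma * M[s_next][j]
                M[s][j] += lr * (indicator + bootstrap - M[s][j])
    return M


# Hypothetical demonstration: one trajectory 0 -> 1 -> 2 -> 3 in a
# five-state space. State 4 never appears in the data.
demo = [(0, 1, False), (1, 2, False), (2, 3, True)]
M = learn_tabular_sr(demo, n_states=5)

# Illustrative scalar summary (an assumption, not the paper's reward):
# the row sum is the expected discounted count of future visits.
scores = [sum(row) for row in M]
print([round(x, 3) for x in scores])
```

In this toy setting the scores for states 0, 1 and 2 converge to about 2.71, 1.9 and 1.0, while state 4, which the demonstrations never visit, stays at 0. State 3 also stays at 0 because the data contain no transition out of it. The sketch only shows how visitation under the demonstration policy can be propagated backward with a Bellman update.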

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2501.02330 [cs.LG] (or arXiv:2501.02330v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2501.02330

Submission history

From: Seyed Mahdi Bairi Azad
[v1] Sat, 4 Jan 2025 16:21:10 UTC (1,730 KB)
[v2] Tue, 29 Apr 2025 13:02:32 UTC (4,982 KB)
[v3] Thu, 12 Jun 2025 13:18:51 UTC (2,227 KB)