SR-Reward: Taking The Path More Traveled
Authors:
Seyed Mahdi B. Azad,
Zahra Padar,
Gabriel Kalweit,
Joschka Boedecker
Abstract:
In this paper, we propose a novel method for learning reward functions directly from offline demonstrations. Unlike traditional inverse reinforcement learning (IRL), our approach decouples the reward function from the learner's policy, eliminating the adversarial interaction typically required between the two. This results in a more stable and efficient training process. Our reward function, called SR-Reward, leverages successor representation (SR) to encode a state based on expected future states' visitation under the demonstration policy and transition dynamics. By utilizing the Bellman equation, SR-Reward can be learned concurrently with most reinforcement learning (RL) algorithms without altering the existing training pipeline. We also introduce a negative sampling strategy to mitigate overestimation errors by reducing rewards for out-of-distribution data, thereby enhancing robustness. This strategy inherently introduces a conservative bias into RL algorithms that employ the learned reward. We evaluate our method on the D4RL benchmark, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Moreover, our ablation studies on data size and quality reveal the advantages and limitations of SR-Reward as a proxy for true rewards.
Submitted 12 June, 2025; v1 submitted 4 January, 2025;
originally announced January 2025.
Opinion Manipulation on Farsi Twitter
Authors:
Amirhossein Farzam,
Parham Moradi,
Saeedeh Mohammadi,
Zahra Padar,
Alexandra A. Siegel
Abstract:
For Iranians and the Iranian diaspora, the Farsi Twittersphere provides an important alternative to state media and an outlet for political discourse. But this understudied online space has become an opinion manipulation battleground, with diverse actors using inauthentic accounts to advance their goals and shape online narratives. Examining trending discussions crossing social cleavages in Iran, we explore how the dynamics of opinion manipulation differ across diverse issue areas. Our analysis suggests that opinion manipulation by inauthentic accounts is more prevalent in divisive political discussions than non-divisive or apolitical discussions. We show how Twitter's network structures help to reinforce the content propagated by clusters of inauthentic accounts in divisive political discussions. Analyzing both the content and structure of online discussions in the Iranian Twittersphere, this work contributes to a growing body of literature exploring the dynamics of online opinion manipulation, while improving our understanding of how information is controlled in the digital age.
Submitted 8 March, 2023; v1 submitted 18 May, 2022;
originally announced May 2022.