Reward Shaping for User Satisfaction in a REINFORCE Recommender

Christakopoulou, Konstantina; Xu, Can; Zhang, Sai; Badam, Sriraj; Potter, Trevor; Li, Daniel; Wan, Hao; Yi, Xinyang; Le, Ya; Berg, Chris; Dixon, Eric Bencomo; Chi, Ed H.; Chen, Minmin

arXiv:2209.15166 (cs)

[Submitted on 30 Sep 2022]

Abstract: How might we design Reinforcement Learning (RL)-based recommenders that encourage user trajectories to align with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combating the sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable information orthogonal to the engagement/interaction data, acting as a proxy for the underlying user satisfaction. For sparsity, i.e., being able to observe how satisfied users are with only a tiny fraction of user-item interactions, imputation models can be useful for predicting the satisfaction level of all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is a powerful way to drive satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the imputation network learns which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, using a reward that incorporates the imputed satisfaction. We use both offline analysis and live experiments on an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
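The abstract describes the joint training loop only at a high level. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Assumptions: the imputation network is swapped for a logistic regression (the hypothetical `SatisfactionImputer`), the policy is a state-free softmax over a tiny catalog (the hypothetical `ReinforcePolicy`), the shaped reward is assumed to be additive (`engagement + lam * imputed_satisfaction`), and a moving-average baseline is added to the REINFORCE update to reduce variance. All names, hyperparameters, and the simulated data are illustrative.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class SatisfactionImputer:
    """Hypothetical imputation model: logistic regression predicting
    P(satisfied | item features), trained only on sparse survey labels."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def predict(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, x, label):
        # One SGD step on the log loss; the gradient is (prediction - label) * x.
        err = self.predict(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]


class ReinforcePolicy:
    """Hypothetical softmax policy over a fixed item catalog. State-free for
    brevity; a real recommender would condition the logits on user history."""

    def __init__(self, num_items, lr=0.05):
        self.theta = [0.0] * num_items
        self.lr = lr

    def probs(self):
        return softmax(self.theta)

    def sample(self, rng):
        return rng.choices(range(len(self.theta)), weights=self.probs())[0]

    def update(self, action, advantage):
        # REINFORCE: d/d theta_j of log pi(action) = 1[j == action] - pi(j).
        p = self.probs()
        for j in range(len(self.theta)):
            indicator = 1.0 if j == action else 0.0
            self.theta[j] += self.lr * advantage * (indicator - p[j])


def shaped_reward(engagement, imputed_satisfaction, lam=1.0):
    # Assumption: additive shaping with weight lam; the abstract only says the
    # reward utilizes the imputed satisfaction.
    return engagement + lam * imputed_satisfaction


def train(features, engagement, true_sat, steps=2000, survey_rate=0.05,
          lam=1.0, seed=0):
    """Jointly update the imputer (on surveyed items only) and the policy
    (on every consumed item, via imputed satisfaction). Data is simulated."""
    rng = random.Random(seed)
    policy = ReinforcePolicy(num_items=len(features))
    imputer = SatisfactionImputer(dim=len(features[0]))
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        a = policy.sample(rng)
        # Sparsity: a satisfaction survey is answered only occasionally.
        if rng.random() < survey_rate:
            label = 1.0 if rng.random() < true_sat[a] else 0.0
            imputer.update(features[a], label)
        # Imputation: every consumed item gets a predicted satisfaction score.
        r = shaped_reward(engagement[a], imputer.predict(features[a]), lam)
        policy.update(a, r - baseline)
        baseline = 0.9 * baseline + 0.1 * r
    return policy, imputer
```

As a usage example, `train([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]], [1.0, 0.6], [0.1, 0.9])` simulates a hypothetical two-item catalog (bias plus one-hot item features) in which item 0 draws more engagement but is rarely satisfying; comparing `policy.probs()` for `lam=1.0` against `lam=0.0` is one way to probe how the satisfaction term shifts the policy. The imputer is updated only when a survey label arrives, mirroring the sparsity point above, while the policy receives a shaped reward on every step.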

Comments:	Accepted at the Reinforcement Learning for Real Life (RL4RealLife) Workshop at the 38th International Conference on Machine Learning (ICML), 2021
Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2209.15166 [cs.IR]
	(or arXiv:2209.15166v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2209.15166

Submission history

From: Konstantina Christakopoulou
[v1] Fri, 30 Sep 2022 01:29:12 UTC (1,843 KB)