Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling

Takahashi, Tatsuki; Maru, Chihiro; Shoji, Hiroko

Statistics > Machine Learning

arXiv:2506.00446 (stat)

[Submitted on 31 May 2025]

Abstract: Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which arise as both the number of unique actions and the length of the ranking increase, is essential for assessing new recommender policies using only logged bandit data from previous versions. To address the high variance of existing estimators, we introduce two new assumptions: no direct effect on rankings, and a user behavior model on ranking embedding spaces. We then propose the generalized marginalized inverse propensity score (GMIPS) estimator, which has statistically desirable properties compared with existing estimators. Finally, we demonstrate that GMIPS achieves the lowest MSE. Notably, among the GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. Our experiments show that MRIPS effectively balances the bias-variance trade-off, even as the ranking action space grows and the above assumptions may not hold.
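
To make the idea of a marginalized importance weight concrete, the following is a minimal sketch and not the paper's GMIPS or MRIPS estimator, whose exact definitions are not given in the abstract. It contrasts a vanilla IPS estimate, which weights each logged reward by the ratio of action probabilities, with a version that weights by the ratio of the induced probabilities of an action embedding. The context-free setup, the toy policies, the embedding map `embed`, and all function names are assumptions made purely for illustration.

```python
# Illustrative sketch only: vanilla IPS vs. IPS with weights marginalized
# over a discrete action embedding. All names and numbers are hypothetical.

def ips_estimate(logged, pi_e, pi_0):
    """Vanilla IPS: mean of (pi_e(a) / pi_0(a)) * r over logged (a, r) pairs."""
    return sum(pi_e[a] / pi_0[a] * r for a, r in logged) / len(logged)

def marginal(pi, embed):
    """Distribution over embeddings induced by policy pi through embed."""
    p = {}
    for a, prob in pi.items():
        p[embed[a]] = p.get(embed[a], 0.0) + prob
    return p

def marginalized_ips_estimate(logged, pi_e, pi_0, embed):
    """IPS with weights p_e(e) / p_0(e), where e = embed[a]."""
    p_e, p_0 = marginal(pi_e, embed), marginal(pi_0, embed)
    return sum(p_e[embed[a]] / p_0[embed[a]] * r for a, r in logged) / len(logged)

# Toy problem: four actions that collapse onto two embeddings.
embed = {0: "a", 1: "a", 2: "b", 3: "b"}
pi_0 = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # logging policy
pi_e = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}       # evaluation policy

# Assumed here: the reward depends on the action only through its embedding.
# One logged sample per action mimics the uniform logging policy exactly.
logged = [(0, 1.0), (1, 1.0), (2, 0.0), (3, 0.0)]

print(ips_estimate(logged, pi_e, pi_0))
print(marginalized_ips_estimate(logged, pi_e, pi_0, embed))
```

On this toy log both estimates recover the true value of 0.8 (up to floating-point rounding), but the largest vanilla weight is 2.8 while the largest marginalized weight is 1.6. Smaller weights are the usual reason marginalization reduces variance, and that reduction relies on the reward depending on the action only through the embedding.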

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2506.00446 [stat.ML]
	(or arXiv:2506.00446v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2506.00446

Submission history

From: Tatsuki Takahashi
[v1] Sat, 31 May 2025 07:58:53 UTC (2,988 KB)
