An Analysis of Switchback Designs in Reinforcement Learning

Wen, Qianglin; Shi, Chengchun; Yang, Ying; Tang, Niansheng; Zhu, Hongtu

arXiv:2403.17285v2 (stat)

This paper has been withdrawn by Chengchun Shi

[Submitted on 26 Mar 2024 (v1), revised 5 Oct 2024 (this version, v2), latest version 28 Aug 2025 (v7)]

Abstract: This paper offers a detailed investigation of switchback designs in A/B testing, which alternate between baseline and new policies over time. Our aim is to thoroughly evaluate the effects of these designs on the accuracy of the resulting average treatment effect (ATE) estimators. We propose a novel "weak signal analysis" framework, which substantially simplifies the calculation of the mean squared errors (MSEs) of these ATE estimators in Markov decision process environments. Our findings suggest that (i) when the majority of reward errors are positively correlated, the switchback design is more efficient than the alternating-day design, which switches policies on a daily basis, and increasing the frequency of policy switches tends to reduce the MSE of the ATE estimator; (ii) when the errors are uncorrelated, all these designs become asymptotically equivalent; and (iii) when the majority of errors are negatively correlated, the alternating-day design becomes the optimal choice. These insights offer guidelines for practitioners designing A/B testing experiments. Our analysis accommodates a variety of policy value estimators, including model-based estimators, least squares temporal difference learning estimators, and double reinforcement learning estimators, thereby offering a comprehensive understanding of optimal design strategies for policy evaluation in reinforcement learning.
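
The qualitative pattern in findings (i) to (iii) can be illustrated with a toy calculation. The sketch below is not the paper's weak signal analysis and ignores carryover effects and state dynamics entirely. It assumes a balanced design, a simple difference-in-means ATE estimator, and AR(1) reward errors with autocorrelation `rho` that are correlated within a day and independent across days. Under those assumptions the estimator is unbiased, so its MSE equals its variance, which has the closed form `4 * s' Sigma s / n^2` for the vector `s` of +1/-1 policy signs. The function names `policy_signs` and `ate_mse` and all parameter values are hypothetical.

```python
import numpy as np


def policy_signs(num_days, steps_per_day, switch_every):
    """Return +1 (new policy) / -1 (baseline) signs, shape (num_days, steps_per_day).

    The policy flips every `switch_every` time steps along the flattened
    timeline, so switch_every == steps_per_day gives the alternating-day design.
    """
    t = np.arange(num_days * steps_per_day)
    signs = np.where((t // switch_every) % 2 == 0, 1.0, -1.0)
    return signs.reshape(num_days, steps_per_day)


def ate_mse(signs, rho, sigma=1.0):
    """Exact MSE of the difference-in-means ATE estimator in the toy model.

    Assumes a balanced design and AR(1) errors within each day,
    Cov(e_i, e_j) = sigma^2 * rho^|i - j|, independent across days.
    """
    _, steps_per_day = signs.shape
    idx = np.arange(steps_per_day)
    cov = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])
    # Sum over days of the quadratic form s_d' * cov * s_d.
    quad = np.einsum("di,ij,dj->", signs, cov, signs)
    return 4.0 * quad / signs.size**2


if __name__ == "__main__":
    num_days, steps_per_day = 20, 24
    designs = {
        "switch every step": 1,
        "switch every 4 steps": 4,
        "alternating-day": steps_per_day,
    }
    for rho in (0.5, 0.0, -0.5):
        print(f"rho = {rho}")
        for name, switch_every in designs.items():
            s = policy_signs(num_days, steps_per_day, switch_every)
            print(f"  {name:22s} MSE = {ate_mse(s, rho):.5f}")
```

In this toy model, more frequent switching lowers the MSE when `rho` is positive, all three designs tie when `rho` is zero, and the alternating-day design has the lowest MSE when `rho` is negative. The paper's setting is richer, since policies there also affect future states, so this sketch should be read only as intuition for why the sign of the error correlation matters.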

Comments:	We recently spotted some errors in our proof
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2403.17285 [stat.ML]
	(or arXiv:2403.17285v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2403.17285

Submission history

From: Chengchun Shi
[v1] Tue, 26 Mar 2024 00:25:32 UTC (8,253 KB)
[v2] Sat, 5 Oct 2024 04:24:18 UTC (1 KB) (withdrawn)
[v3] Thu, 29 May 2025 12:21:30 UTC (16,822 KB)
[v4] Thu, 19 Jun 2025 01:39:49 UTC (24,395 KB)
[v5] Fri, 11 Jul 2025 05:19:33 UTC (26,593 KB)
[v6] Sun, 20 Jul 2025 06:44:45 UTC (26,592 KB)
[v7] Thu, 28 Aug 2025 04:55:44 UTC (26,592 KB)
