-
Stabilizing Temporal Difference Learning via Implicit Stochastic Recursion
Authors:
Hwanwoo Kim, Panos Toulis, Eric Laber
Abstract:
Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, TD procedures are generally sensitive to step size specification. A poor choice of step size can dramatically increase variance and slow convergence in both on-policy and off-policy evaluation tasks. In practice, researchers use trial and error to identify stable step sizes, but these approaches tend to be ad hoc and inefficient. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed point equations. Such updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, we derive asymptotic convergence guarantees and finite-time error bounds for our proposed implicit TD algorithms, which include implicit TD(0), TD($λ$), and TD with gradient correction (TDC). Our results show that implicit TD algorithms are applicable to a much broader range of step sizes, and thus provide a robust and versatile framework for policy evaluation and value approximation in modern RL tasks. We demonstrate these benefits empirically through extensive numerical examples spanning both on-policy and off-policy tasks.
Submitted 22 June, 2025; v1 submitted 2 May, 2025;
originally announced May 2025.
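Note on the implicit TD entry above: the abstract describes reformulating TD updates into fixed point equations. A minimal sketch of what that can look like for TD(0), assuming linear function approximation $v(s) = φ(s)^{\top}θ$ (an assumption; the abstract does not specify the approximation class). The implicit update evaluates the current-state value at the new parameter, $θ' = θ + α\,(r + γ\,φ'^{\top}θ - φ^{\top}θ')\,φ$, and solving for $θ'$ gives an ordinary TD step with the step size shrunk by $1 + α\lVert φ\rVert^2$. The function name and argument names are hypothetical, and the paper's exact algorithm may differ in detail.

```python
def implicit_td0_step(theta, phi, phi_next, reward, gamma, alpha):
    """One implicit TD(0) step for a linear value function (illustrative sketch).

    Solves theta_new = theta + alpha * (reward + gamma * phi_next^T theta
                                        - phi^T theta_new) * phi
    in closed form. Not taken from the paper's code.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Standard TD error, evaluated at the current parameter.
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    # Solving the fixed-point equation rescales the step size.
    effective_alpha = alpha / (1.0 + alpha * dot(phi, phi))
    return [t + effective_alpha * delta * p for t, p in zip(theta, phi)]
```

Because the effective step size is bounded above by $1/\lVert φ\rVert^2$ however large $α$ is, a single step cannot overshoot in the direction of $φ$, which is one way to read the abstract's claim of reduced sensitivity to step size.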
-
Reinforcement Learning for Respondent-Driven Sampling
Authors:
Justin Weltz, Angela Yoon, Yichi Zhang, Alexander Volfovsky, Eric Laber
Abstract:
Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, call to action, etc. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the study. Thus, it does not make use of accumulating information on which incentives are effective and for whom. We propose a reinforcement learning (RL) based adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient and cost-effective and can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference, which is non-trivial due to the adaptive sampling induced by RL as well as the complex dependencies among subjects due to latent (unobserved) social network structure. We provide asymptotic regret bounds and illustrate the finite-sample behavior of the proposed design through a suite of simulation experiments.
Submitted 2 January, 2025;
originally announced January 2025.
-
Experimental Designs for Heteroskedastic Variance
Authors:
Justin Weltz, Tanner Fiez, Alexander Volfovsky, Eric Laber, Blake Mason, Houssam Nassif, Lalit Jain
Abstract:
Most linear experimental design problems assume homogeneous variance, although heteroskedastic noise is present in many realistic settings. Let a learner have access to a finite set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$ that can be probed to receive noisy linear responses of the form $y=x^{\top}θ^{\ast}+η$. Here $θ^{\ast}\in \mathbb{R}^d$ is an unknown parameter vector, and $η$ is independent mean-zero $σ_x^2$-sub-Gaussian noise defined by a flexible heteroskedastic variance model, $σ_x^2 = x^{\top}Σ^{\ast}x$. Assuming that $Σ^{\ast}\in \mathbb{R}^{d\times d}$ is an unknown matrix, we propose, analyze and empirically evaluate a novel design for uniformly bounding estimation error of the variance parameters, $σ_x^2$. We demonstrate the benefits of this method with two adaptive experimental design problems under heteroskedastic noise, fixed-confidence transductive best-arm identification and level-set identification, and prove the first instance-dependent lower bounds in these settings. Lastly, we construct near-optimal algorithms and empirically demonstrate the large improvements in sample complexity gained from accounting for heteroskedastic variance in these designs.
Submitted 6 October, 2023;
originally announced October 2023.
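Note on the heteroskedastic design entry above: the response model $y=x^{\top}θ^{\ast}+η$ with $σ_x^2 = x^{\top}Σ^{\ast}x$ can be simulated in a few lines. The sketch below assumes Gaussian noise, which is one instance of $σ_x^2$-sub-Gaussian noise; the abstract does not fix a distribution. The function names `noise_variance` and `probe` are hypothetical and are not from the paper.

```python
import random


def noise_variance(x, sigma_star):
    """Return sigma_x^2 = x^T Sigma* x for a measurement vector x."""
    n = len(x)
    return sum(x[i] * sigma_star[i][j] * x[j] for i in range(n) for j in range(n))


def probe(x, theta_star, sigma_star, rng=random):
    """Draw one noisy linear response y = x^T theta* + eta.

    Assumption: eta ~ Normal(0, sigma_x^2). Any mean-zero
    sigma_x^2-sub-Gaussian noise would fit the abstract's model.
    """
    mean = sum(xi * ti for xi, ti in zip(x, theta_star))
    std = noise_variance(x, sigma_star) ** 0.5
    return mean + rng.gauss(0.0, std)
```

Under this model the noise level differs from one measurement vector to the next, so probing two vectors equally often does not yield equally precise information about $θ^{\ast}$.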
-
Inference for change-plane regression
Authors:
Chaeryon Kang, Hunyong Cho, Rui Song, Moulinath Banerjee, Eric B. Laber, Michael R. Kosorok
Abstract:
A key challenge in analyzing the behavior of change-plane estimators is that the objective function has multiple minimizers. Two estimators are proposed to deal with this non-uniqueness. For each estimator, an n-rate of convergence is established, and the limiting distribution is derived. Based on these results, we provide a parametric bootstrap procedure for inference. The validity of our theoretical results and the finite sample performance of the bootstrap are demonstrated through simulation experiments. We illustrate the proposed methods by applying them to latent subgroup identification in precision medicine using the ACTG175 AIDS study data.
Submitted 13 January, 2024; v1 submitted 13 June, 2022;
originally announced June 2022.
-
Convergence Rates of Posterior Distributions in Markov Decision Process
Authors:
Zhen Li, Eric Laber
Abstract:
In this paper, we show the convergence rates of posterior distributions of the model dynamics in an MDP for both episodic and continuous tasks. The theoretical results hold for general state and action spaces, and the parameter space of the dynamics can be infinite dimensional. Moreover, we show the convergence rates of posterior distributions of the mean accumulative reward under a fixed or the optimal policy and of the regret bound. A variant of the Thompson sampling algorithm is proposed which provides both posterior convergence rates for the dynamics and the regret-type bound. Then the previous results are extended to Markov games. Finally, we show numerical results with three simulation scenarios and conclude with a discussion.
Submitted 21 July, 2019;
originally announced July 2019.
-
Sufficient Markov Decision Processes with Alternating Deep Neural Networks
Authors:
Longshaokan Wang, Eric B. Laber, Katie Witkiewitz
Abstract:
Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is non-trivial. We propose a method for constructing a low-dimensional representation of the original decision process for which (1) the MDP model holds and (2) a decision strategy that maximizes mean utility when applied to the low-dimensional representation also maximizes mean utility when applied to the original process. We use a deep neural network to define a class of potential process representations and estimate the process of lowest dimension within this class. The method is illustrated using data from a mobile study on heavy drinking and smoking among college students.
Submitted 17 March, 2018; v1 submitted 25 April, 2017;
originally announced April 2017.