-
Scalable Policy Maximization Under Network Interference
Authors:
Aidan Gleich,
Eric Laber,
Alexander Volfovsky
Abstract:
Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings. However, standard independence assumptions fail when the treatment status of one individual impacts the outcomes of others, a phenomenon known as interference. We stud…
▽ More
Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings. However, standard independence assumptions fail when the treatment status of one individual impacts the outcomes of others, a phenomenon known as interference. We study optimal-policy learning under interference on a dynamic network. Existing approaches to this problem require repeated observations of the same fixed network and struggle to scale in sample size beyond as few as fifteen connected units -- both limit applications. We show that under common assumptions on the structure of interference, rewards become linear. This enables us to develop a scalable Thompson sampling algorithm that maximizes policy impact when a new $n$-node network is observed each round. We prove a Bayesian regret bound that is sublinear in $n$ and the number of rounds. Simulation experiments show that our algorithm learns quickly and outperforms existing methods. The results close a key scalability gap between causal inference methods for interference and practical bandit algorithms, enabling policy optimization in large-scale networked systems.
△ Less
Submitted 23 May, 2025;
originally announced May 2025.
-
Stabilizing Temporal Difference Learning via Implicit Stochastic Approximation
Authors:
Hwanwoo Kim,
Panos Toulis,
Eric Laber
Abstract:
Temporal Difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, it is not without drawbacks, the most prominent being its sensitivity to step size. A poor choice of step size can dram…
▽ More
Temporal Difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized algorithms. However, despite its widespread use, it is not without drawbacks, the most prominent being its sensitivity to step size. A poor choice of step size can dramatically inflate the error of value estimates and slow convergence. Consequently, in practice, researchers must use trial and error in order to identify a suitable step size -- a process that can be tedious and time consuming. As an alternative, we propose implicit TD algorithms that reformulate TD updates into fixed-point equations. These updates are more stable and less sensitive to step size without sacrificing computational efficiency. Moreover, our theoretical analysis establishes asymptotic convergence guarantees and finite-time error bounds. Our results demonstrate their robustness and practicality for modern RL tasks, establishing implicit TD as a versatile tool for policy evaluation and value approximation.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
Exploiting Concavity Information in Gaussian Process Contextual Bandit Optimization
Authors:
Kevin Li,
Eric Laber
Abstract:
The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the generative model that can potentially be used to improve the efficiency of bandit algorithms. We conside…
▽ More
The contextual bandit framework is widely used to solve sequential optimization problems where the reward of each decision depends on auxiliary context variables. In settings such as medicine, business, and engineering, the decision maker often possesses additional structural information on the generative model that can potentially be used to improve the efficiency of bandit algorithms. We consider settings in which the mean reward is known to be a concave function of the action for each fixed context. Examples include patient-specific dose-response curves in medicine and expected profit in online advertising auctions. We propose a contextual bandit algorithm that accelerates optimization by conditioning the posterior of a Bayesian Gaussian Process model on this concavity information. We design a novel shape-constrained reward function estimator using a specially chosen regression spline basis and constrained Gaussian Process posterior. Using this model, we propose a UCB algorithm and derive corresponding regret bounds. We evaluate our algorithm on numerical examples and test functions used to study optimal dosing of Anti-Clotting medication.
△ Less
Submitted 13 March, 2025;
originally announced March 2025.
-
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits
Authors:
Piotr M. Suder,
Eric Laber
Abstract:
Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one have prior knowledge of a (relatively) tight upper bound on the norm of the true parameter vector governing the reward model in order to achieve good performan…
▽ More
Information-directed sampling (IDS) is a powerful framework for solving bandit problems which has shown strong results in both Bayesian and frequentist settings. However, frequentist IDS, like many other bandit algorithms, requires that one have prior knowledge of a (relatively) tight upper bound on the norm of the true parameter vector governing the reward model in order to achieve good performance. Unfortunately, this requirement is rarely satisfied in practice. As we demonstrate, using a poorly calibrated bound can lead to significant regret accumulation. To address this issue, we introduce a novel frequentist IDS algorithm that iteratively refines a high-probability upper bound on the true parameter norm using accumulating data. We focus on the linear bandit setting with heteroskedastic subgaussian noise. Our method leverages a mixture of relevant information gain criteria to balance exploration aimed at tightening the estimated parameter norm bound and directly searching for the optimal action. We establish regret bounds for our algorithm that do not depend on an initially assumed parameter norm bound and demonstrate that our method outperforms state-of-the-art IDS and UCB algorithms.
△ Less
Submitted 6 March, 2025;
originally announced March 2025.
-
Reinforcement Learning for Respondent-Driven Sampling
Authors:
Justin Weltz,
Angela Yoon,
Yichi Zhang,
Alexander Volfovsky,
Eric Laber
Abstract:
Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, call to action, etc. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the s…
▽ More
Respondent-driven sampling (RDS) is widely used to study hidden or hard-to-reach populations by incentivizing study participants to recruit their social connections. The success and efficiency of RDS can depend critically on the nature of the incentives, including their number, value, call to action, etc. Standard RDS uses an incentive structure that is set a priori and held fixed throughout the study. Thus, it does not make use of accumulating information on which incentives are effective and for whom. We propose a reinforcement learning (RL) based adaptive RDS study design in which the incentives are tailored over time to maximize cumulative utility during the study. We show that these designs are more efficient, cost-effective, and can generate new insights into the social structure of hidden populations. In addition, we develop methods for valid post-study inference which are non-trivial due to the adaptive sampling induced by RL as well as the complex dependencies among subjects due to latent (unobserved) social network structure. We provide asymptotic regret bounds and illustrate its finite sample behavior through a suite of simulation experiments.
△ Less
Submitted 2 January, 2025;
originally announced January 2025.
-
Functional Principal Component Analysis for Truncated Data
Authors:
Caitrin Murphy,
Eric Laber,
Rhonda Merwin,
Brian Reich
Abstract:
Functional principal component analysis (FPCA) is a key tool in the study of functional data, driving both exploratory analyses and feature construction for use in formal modeling and testing procedures. However, existing methods for FPCA do not apply when functional observations are truncated, e.g., the measurement instrument only supports recordings within a pre-specified interval, thereby trunc…
▽ More
Functional principal component analysis (FPCA) is a key tool in the study of functional data, driving both exploratory analyses and feature construction for use in formal modeling and testing procedures. However, existing methods for FPCA do not apply when functional observations are truncated, e.g., the measurement instrument only supports recordings within a pre-specified interval, thereby truncating values outside of the range to the nearest boundary. A naive application of existing methods without correction for truncation induces bias. We extend the FPCA framework to accommodate truncated noisy functional data by first recovering smooth mean and covariance surface estimates that are representative of the latent process's mean and covariance functions. Unlike traditional sample covariance smoothing techniques, our procedure yields a positive semi-definite covariance surface, computed without the need to retroactively remove negative eigenvalues in the covariance operator decomposition. Additionally, we construct a FPC score predictor and demonstrate its use in the generalized functional linear model. Convergence rates for the proposed estimators are provided. In simulation experiments, the proposed method yields better predictive performance and lower bias than existing alternatives. We illustrate its practical value through an application to a study with truncated blood glucose measurements.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
Optimal treatment strategies for prioritized outcomes
Authors:
Kyle Duke,
Eric B. Laber,
Marie Davidian,
Michael Newcomb,
Brian Mustanksi
Abstract:
Dynamic treatment regimes formalize precision medicine as a sequence of decision rules, one for each stage of clinical intervention, that map current patient information to a recommended intervention. Optimal regimes are typically defined as maximizing some functional of a scalar outcome's distribution, e.g., the distribution's mean or median. However, in many clinical applications, there are mult…
▽ More
Dynamic treatment regimes formalize precision medicine as a sequence of decision rules, one for each stage of clinical intervention, that map current patient information to a recommended intervention. Optimal regimes are typically defined as maximizing some functional of a scalar outcome's distribution, e.g., the distribution's mean or median. However, in many clinical applications, there are multiple outcomes of interest. We consider the problem of estimating an optimal regime when there are multiple outcomes that are ordered by priority but which cannot be readily combined by domain experts into a meaningful single scalar outcome. We propose a definition of optimality in this setting and show that an optimal regime with respect to this definition leads to maximal mean utility under a large class of utility functions. Furthermore, we use inverse reinforcement learning to identify a composite outcome that most closely aligns with our definition within a pre-specified class. Simulation experiments and an application to data from a sequential multiple assignment randomized trial (SMART) on HIV/STI prevention illustrate the usefulness of the proposed approach.
△ Less
Submitted 7 July, 2024;
originally announced July 2024.
-
New bounds on the cohesion of complete-link and other linkage methods for agglomeration clustering
Authors:
Sanjoy Dasgupta,
Eduardo Laber
Abstract:
Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces.
One of our new bounds, in contrast to the existing ones,…
▽ More
Linkage methods are among the most popular algorithms for hierarchical clustering. Despite their relevance the current knowledge regarding the quality of the clustering produced by these methods is limited. Here, we improve the currently available bounds on the maximum diameter of the clustering obtained by complete-link for metric spaces.
One of our new bounds, in contrast to the existing ones, allows us to separate complete-link from single-link in terms of approximation for the diameter, which corroborates the common perception that the former is more suitable than the latter when the goal is producing compact clusters.
We also show that our techniques can be employed to derive upper bounds on the cohesion of a class of linkage methods that includes the quite popular average-link.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Adaptive Randomization Methods for Sequential Multiple Assignment Randomized Trials (SMARTs) via Thompson Sampling
Authors:
Peter Norwood,
Marie Davidian,
Eric Laber
Abstract:
Response-adaptive randomization (RAR) has been studied extensively in conventional, single-stage clinical trials, where it has been shown to yield ethical and statistical benefits, especially in trials with many treatment arms. However, RAR and its potential benefits are understudied in sequential multiple assignment randomized trials (SMARTs), which are the gold-standard trial design for evaluati…
▽ More
Response-adaptive randomization (RAR) has been studied extensively in conventional, single-stage clinical trials, where it has been shown to yield ethical and statistical benefits, especially in trials with many treatment arms. However, RAR and its potential benefits are understudied in sequential multiple assignment randomized trials (SMARTs), which are the gold-standard trial design for evaluation of multi-stage treatment regimes. We propose a suite of RAR algorithms for SMARTs based on Thompson Sampling (TS), a widely used RAR method in single-stage trials in which treatment randomization probabilities are aligned with the estimated probability that the treatment is optimal. We focus on two common objectives in SMARTs: (i) comparison of the regimes embedded in the trial, and (ii) estimation of an optimal embedded regime. We develop valid post-study inferential procedures for treatment regimes under the proposed algorithms. This is nontrivial, as (even in single-stage settings) RAR can lead to nonnormal limiting distributions of estimators. Our algorithms are the first for RAR in multi-stage trials that account for nonregularity in the estimand. Empirical studies based on real-world SMARTs show that TS can improve in-trial subject outcomes without sacrificing efficiency for post-trial comparisons.
△ Less
Submitted 6 January, 2024;
originally announced January 2024.
-
Sequential Knockoffs for Variable Selection in Reinforcement Learning
Authors:
Tao Ma,
Jin Zhu,
Hengrui Cai,
Zhengling Qi,
Yunxiao Chen,
Chengchun Shi,
Eric B. Laber
Abstract:
In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state may sl…
▽ More
In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state may slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the subvector of the original state under which the process remains an MDP and shares the same reward function as the original process. We propose a novel SEquEntial Knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method achieves selection consistency. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy learning. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods regarding variable selection accuracy and regret.
△ Less
Submitted 30 July, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Interim Monitoring of Sequential Multiple Assignment Randomized Trials Using Partial Information
Authors:
Cole Manschot,
Eric Laber,
Marie Davidian
Abstract:
The sequential multiple assignment randomized trial (SMART) is the gold standard trial design to generate data for the evaluation of multi-stage treatment regimes. As with conventional (single-stage) randomized clinical trials, interim monitoring allows early stopping; however, there are few methods for principled interim analysis in SMARTs. Because SMARTs involve multiple stages of treatment, a k…
▽ More
The sequential multiple assignment randomized trial (SMART) is the gold standard trial design to generate data for the evaluation of multi-stage treatment regimes. As with conventional (single-stage) randomized clinical trials, interim monitoring allows early stopping; however, there are few methods for principled interim analysis in SMARTs. Because SMARTs involve multiple stages of treatment, a key challenge is that not all enrolled participants will have progressed through all treatment stages at the time of an interim analysis. Wu et al. (2021) propose basing interim analyses on an estimator for the mean outcome under a given regime that uses data only from participants who have completed all treatment stages. We propose an estimator for the mean outcome under a given regime that gains efficiency by using partial information from enrolled participants regardless of their progression through treatment stages. Using the asymptotic distribution of this estimator, we derive associated Pocock and O'Brien-Fleming testing procedures for early stopping. In simulation experiments, the estimator controls type I error and achieves nominal power while reducing expected sample size relative to the method of Wu et al. (2021). We present an illustrative application of the proposed estimator based on a recent SMART evaluating behavioral pain interventions for breast cancer patients.
△ Less
Submitted 15 February, 2023; v1 submitted 13 September, 2022;
originally announced September 2022.
-
Variance decompositions for extensive-form games
Authors:
Alex Cloud,
Eric Laber
Abstract:
Quantitative measures of randomness in games are useful for game design and have implications for gambling law. We treat the outcome of a game as a random variable and derive a closed-form expression and estimator for the variance in the outcome attributable to a player of the game. We analyze poker hands to show that randomness in the cards dealt has little influence on the outcomes of each hand.…
▽ More
Quantitative measures of randomness in games are useful for game design and have implications for gambling law. We treat the outcome of a game as a random variable and derive a closed-form expression and estimator for the variance in the outcome attributable to a player of the game. We analyze poker hands to show that randomness in the cards dealt has little influence on the outcomes of each hand. A simple example is given to demonstrate how variance decompositions can be used to measure other interesting properties of games.
△ Less
Submitted 8 September, 2020;
originally announced September 2020.
-
Latent-state models for precision medicine
Authors:
Zekun Xu,
Eric Laber,
Ana-Maria Staicu,
Emanuel Severus
Abstract:
Observational longitudinal studies are a common means to study treatment efficacy and safety in chronic mental illness. In many such studies, treatment changes may be initiated by either the patient or by their clinician and can thus vary widely across patients in their timing, number, and type. Indeed, in the observational longitudinal pathway of the STEP-BD study of bipolar depression, one of th…
▽ More
Observational longitudinal studies are a common means to study treatment efficacy and safety in chronic mental illness. In many such studies, treatment changes may be initiated by either the patient or by their clinician and can thus vary widely across patients in their timing, number, and type. Indeed, in the observational longitudinal pathway of the STEP-BD study of bipolar depression, one of the motivations for this work, no two patients have the same treatment history even after coarsening clinic visits to a weekly time-scale. Estimation of an optimal treatment regime using such data is challenging as one cannot naively pool together patients with the same treatment history, as is required by methods based on inverse probability weighting, nor is it possible to apply backwards induction over the decision points, as is done in Q-learning and its variants. Thus, additional structure is needed to effectively pool information across patients and within a patient over time. Current scientific theory for many chronic mental illnesses maintains that a patient's disease status can be conceptualized as transitioning among a small number of discrete states. We use this theory to inform the construction of a partially observable Markov decision process model of patient health trajectories wherein observed health outcomes are dictated by a patient's latent health state. Using this model, we derive and evaluate estimators of an optimal treatment regime under two common paradigms for quantifying long-term patient health. The finite sample performance of the proposed estimator is demonstrated through a series of simulation experiments and application to the observational pathway of the STEP-BD study. We find that the proposed method provides high-quality estimates of an optimal treatment strategy in settings where existing approaches cannot be applied without ad hoc modifications.
△ Less
Submitted 11 June, 2020; v1 submitted 26 May, 2020;
originally announced May 2020.
-
A spatiotemporal recommendation engine for malaria control
Authors:
Qian Guan,
Brian J. Reich,
Eric B. Laber
Abstract:
Malaria is an infectious disease affecting a large population across the world, and interventions need to be efficiently applied to reduce the burden of malaria. We develop a framework to help policy-makers decide how to allocate limited resources in realtime for malaria control. We formalize a policy for the resource allocation as a sequence of decisions, one per intervention decision, that map u…
▽ More
Malaria is an infectious disease affecting a large population across the world, and interventions need to be efficiently applied to reduce the burden of malaria. We develop a framework to help policy-makers decide how to allocate limited resources in realtime for malaria control. We formalize a policy for the resource allocation as a sequence of decisions, one per intervention decision, that map up-to-date disease related information to a resource allocation. An optimal policy must control the spread of the disease while being interpretable and viewed as equitable to stakeholders. We construct an interpretable class of resource allocation policies that can accommodate allocation of resources residing in a continuous domain, and combine a hierarchical Bayesian spatiotemporal model for disease transmission with a policy-search algorithm to estimate an optimal policy for resource allocation within the pre-specified class. The estimated optimal policy under the proposed framework improves the cumulative long-term outcome compared with naive approaches in both simulation experiments and application to malaria interventions in the Democratic Republic of the Congo.
△ Less
Submitted 10 March, 2020;
originally announced March 2020.
-
High dimensional precision medicine from patient-derived xenografts
Authors:
Naim U. Rashid,
Daniel J. Luckett,
Jingxiang Chen,
Michael T. Lawson,
Longshaokan Wang,
Yunshu Zhang,
Eric B. Laber,
Yufeng Liu,
Jen Jen Yeh,
Donglin Zeng,
Michael R. Kosorok
Abstract:
The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a…
▽ More
The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a clinical outcome in a population of interest. Patient-derived xenograft (PDX) studies permit the evaluation of multiple treatments within a single tumor and thus are ideally suited for estimating optimal ITRs. PDX data are characterized by correlated outcomes, a high-dimensional feature space, and a large number of treatments. Existing methods for estimating optimal ITRs do not take advantage of the unique structure of PDX data or handle the associated challenges well. In this paper, we explore machine learning methods for estimating optimal ITRs from PDX data. We analyze data from a large PDX study to identify biomarkers that are informative for developing personalized treatment recommendations in multiple cancers. We estimate optimal ITRs using regression-based approaches such as Q-learning and direct search methods such as outcome weighted learning. Finally, we implement a superlearner approach to combine a set of estimated ITRs and show that the resulting ITR performs better than any of the input ITRs, mitigating uncertainty regarding user choice of any particular ITR estimation methodology. Our results indicate that PDX data are a valuable resource for developing individualized treatment strategies in oncology.
△ Less
Submitted 13 December, 2019;
originally announced December 2019.
-
Sex Trafficking Detection with Ordinal Regression Neural Networks
Authors:
Longshaokan Wang,
Eric Laber,
Yeng Saanchi,
Sherrie Caltagirone
Abstract:
Sex trafficking is a global epidemic. Escort websites are a primary vehicle for selling the services of such trafficking victims and thus a major driver of trafficker revenue. Many law enforcement agencies do not have the resources to manually identify leads from the millions of escort ads posted across dozens of public websites. We propose an ordinal regression neural network to identify escort a…
▽ More
Sex trafficking is a global epidemic. Escort websites are a primary vehicle for selling the services of such trafficking victims and thus a major driver of trafficker revenue. Many law enforcement agencies do not have the resources to manually identify leads from the millions of escort ads posted across dozens of public websites. We propose an ordinal regression neural network to identify escort ads that are likely linked to sex trafficking. Our model uses a modified cost function to mitigate inconsistencies in predictions often associated with nonparametric ordinal regression and leverages recent advancements in deep learning to improve prediction accuracy. The proposed method significantly improves on the previous state-of-the-art on Trafficking-10K, an expert-annotated dataset of escort ads. Additionally, because traffickers use acronyms, deliberate typographical errors, and emojis to replace explicit keywords, we demonstrate how to expand the lexicon of trafficking flags through word embeddings and t-SNE.
△ Less
Submitted 11 January, 2020; v1 submitted 15 August, 2019;
originally announced August 2019.
-
Parameterized Exploration
Authors:
Jesse Clifton,
Lili Wu,
Eric Laber
Abstract:
We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. Unlike common heuristics for exploration, our method accounts for the time horizon of the decision problem as well as the agent's current state of knowledge of the dynamics of the decision problem. We show our method as applied to several commo…
▽ More
We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. Unlike common heuristics for exploration, our method accounts for the time horizon of the decision problem as well as the agent's current state of knowledge of the dynamics of the decision problem. We show our method as applied to several common exploration techniques has superior performance relative to un-tuned counterparts in Bernoulli and Gaussian multi-armed bandits, contextual bandits, and a Markov decision process based on a mobile health (mHealth) study. We also examine the effects of the accuracy of the estimated dynamics model on the performance of PE.
△ Less
Submitted 13 July, 2019;
originally announced July 2019.
-
Sample Size Calculations for SMARTs
Authors:
Eric J. Rose,
Eric B. Laber,
Marie Davidian,
Anastasios A. Tsiatis,
Ying-Qi Zhao,
Michael R. Kosorok
Abstract:
Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis with forma…
▽ More
Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis with formal evaluation of the estimated optimal regime deferred to a follow-up trial. However, running a follow-up trial to evaluate an estimated optimal treatment regime is costly and time-consuming; furthermore, the estimated optimal regime that is to be evaluated in such a follow-up trial may be far from optimal if the original trial was underpowered for estimation of an optimal regime. We derive sample size procedures for a SMART that ensure: (i) sufficient power for comparing the optimal treatment regime with standard of care; and (ii) the estimated optimal regime is within a given tolerance of the true optimal regime with high-probability. We establish asymptotic validity of the proposed procedures and demonstrate their finite sample performance in a series of simulation experiments.
△ Less
Submitted 16 June, 2019;
originally announced June 2019.
-
Global forensic geolocation with deep neural networks
Authors:
Neal S. Grantham,
Brian J. Reich,
Eric B. Laber,
Krishna Pacifici,
Robert R. Dunn,
Noah Fierer,
Matthew Gebert,
Julia S. Allwood,
Seth A. Faith
Abstract:
An important problem in forensic analyses is identifying the provenance of materials at a crime scene, such as biological material on a piece of clothing. This procedure, known as geolocation, is conventionally guided by expert knowledge of the biological evidence and therefore tends to be application-specific, labor-intensive, and subjective. Purely data-driven methods have yet to be fully realiz…
▽ More
An important problem in forensic analyses is identifying the provenance of materials at a crime scene, such as biological material on a piece of clothing. This procedure, known as geolocation, is conventionally guided by expert knowledge of the biological evidence and therefore tends to be application-specific, labor-intensive, and subjective. Purely data-driven methods have yet to be fully realized due in part to the lack of a sufficiently rich data source. However, high-throughput sequencing technologies are able to identify tens of thousands of microbial taxa using DNA recovered from a single swab collected from nearly any object or surface. We present a new algorithm for geolocation that aggregates over an ensemble of deep neural network classifiers trained on randomly-generated Voronoi partitions of a spatial domain. We apply the algorithm to fungi present in each of 1300 dust samples collected across the continental United States and then to a global dataset of dust samples from 28 countries. Our algorithm makes remarkably good point predictions with more than half of the geolocation errors under 100 kilometers for the continental analysis and nearly 90% classification accuracy of a sample's country of origin for the global analysis. We suggest that the effectiveness of this model sets the stage for a new, quantitative approach to forensic geolocation.
△ Less
Submitted 28 May, 2019;
originally announced May 2019.
-
Note on Thompson sampling for large decision problems
Authors:
Tao Hu,
Eric B. Laber,
Zhen Li,
Nick J. Meyer,
Krishna Pacifici
Abstract:
There is increasing interest in using streaming data to inform decision making across a wide range of application domains including mobile health, food safety, security, and resource management. A decision support system formalizes online decision making as a map from up-to-date information to a recommended decision. Online estimation of an optimal decision strategy from streaming data requires si…
▽ More
There is increasing interest in using streaming data to inform decision making across a wide range of application domains including mobile health, food safety, security, and resource management. A decision support system formalizes online decision making as a map from up-to-date information to a recommended decision. Online estimation of an optimal decision strategy from streaming data requires simultaneous estimation of components of the underlying system dynamics as well as the optimal decision strategy given these dynamics; thus, there is an inherent trade-off between choosing decisions that lead to improved estimates and choosing decisions that appear to be optimal based on current estimates. Thompson (1933) was among the first to formalize this trade-off in the context of choosing between two treatments for a stream of patients; he proposed a simple heuristic wherein a treatment is selected randomly at each time point with selection probability proportional to the posterior probability that it is optimal. We consider a variant of Thompson sampling that is simple to implement and can be applied to large and complex decision problems. We show that the proposed Thompson sampling estimator is consistent for the optimal decision support system and provide rates of convergence and finite sample error bounds. The proposed algorithm is illustrated using an agent-based model of the spread of influenza on a network and management of mallard populations in the United States.
△ Less
Submitted 12 May, 2019;
originally announced May 2019.
-
Efficient augmentation and relaxation learning for individualized treatment rules using observational data
Authors:
Ying-Qi Zhao,
Eric B. Laber,
Yang Ning,
Sumona Saha,
Bruce Sands
Abstract:
Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individual…
▽ More
Individualized treatment rules aim to identify if, when, which, and to whom treatment should be applied. A globally aging population, rising healthcare costs, and increased access to patient-level data have created an urgent need for high-quality estimators of individualized treatment rules that can be applied to observational data. A recent and promising line of research for estimating individualized treatment rules recasts the problem of estimating an optimal treatment rule as a weighted classification problem. We consider a class of estimators for optimal treatment rules that are analogous to convex large-margin classifiers. The proposed class applies to observational data and is doubly-robust in the sense that correct specification of either a propensity or outcome model leads to consistent estimation of the optimal individualized treatment rule. Using techniques from semiparametric efficiency theory, we derive rates of convergence for the proposed estimators and use these rates to characterize the bias-variance trade-off for estimating individualized treatment rules with classification-based methods. Simulation experiments informed by these results demonstrate that it is possible to construct new estimators within the proposed framework that significantly outperform existing ones. We illustrate the proposed methods using data from a labor training program and a study of inflammatory bowel syndrome.
△ Less
Submitted 3 January, 2019;
originally announced January 2019.
-
Generalization error for decision problems
Authors:
Eric B. Laber,
Min Qian
Abstract:
In this entry we review the generalization error for classification and single-stage decision problems. We distinguish three alternative definitions of the generalization error which have, at times, been conflated in the statistics literature and show that these definitions need not be equivalent even asymptotically. Because the generalization error is a non-smooth functional of the underlying gen…
▽ More
In this entry we review the generalization error for classification and single-stage decision problems. We distinguish three alternative definitions of the generalization error which have, at times, been conflated in the statistics literature and show that these definitions need not be equivalent even asymptotically. Because the generalization error is a non-smooth functional of the underlying generative model, standard asymptotic approximations, e.g., the bootstrap or normal approximations, cannot guarantee correct frequentist operating characteristics without modification. We provide simple data-adaptive procedures that can be used to construct asymptotically valid confidence sets for the generalization error. We conclude the entry with a discussion of extensions and related problems.
△ Less
Submitted 20 December, 2018;
originally announced December 2018.
-
Hierarchical Continuous Time Hidden Markov Model, with Application in Zero-Inflated Accelerometer Data
Authors:
Zekun Xu,
Eric B. Laber,
Ana-Maria Staicu
Abstract:
Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ. There is tremendous potential to use such data to inform medical decision making and public health policies. However, modeling such data is challenging as they are high-dimensional, heterogeneous, and subject to informative missingness, e.g., zero readings when the device is…
▽ More
Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ. There is tremendous potential to use such data to inform medical decision making and public health policies. However, modeling such data is challenging as they are high-dimensional, heterogeneous, and subject to informative missingness, e.g., zero readings when the device is removed by the participant. We propose a flexible and extensible continuous-time hidden Markov model to extract meaningful activity patterns from human accelerometer data. To facilitate estimation with massive data we derive an efficient learning algorithm that exploits the hierarchical structure of the parameters indexing the proposed model. We also propose a bootstrap procedure for interval estimation. The proposed methods are illustrated using data from the 2003 - 2004 and 2005 - 2006 National Health and Nutrition Examination Survey.
△ Less
Submitted 25 December, 2018; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Thompson Sampling for Pursuit-Evasion Problems
Authors:
Zhen Li,
Nicholas J. Meyer,
Eric B. Laber,
Robert Brigantic
Abstract:
Pursuit-evasion is a multi-agent sequential decision problem wherein a group of agents known as pursuers coordinate their traversal of a spatial domain to locate an agent trying to evade them. Pursuit evasion problems arise in a number of import application domains including defense and route planning. Learning to optimally coordinate pursuer behaviors so as to minimize time to capture of the evad…
▽ More
Pursuit-evasion is a multi-agent sequential decision problem wherein a group of agents known as pursuers coordinate their traversal of a spatial domain to locate an agent trying to evade them. Pursuit evasion problems arise in a number of import application domains including defense and route planning. Learning to optimally coordinate pursuer behaviors so as to minimize time to capture of the evader is challenging because of a large action space and sparse noisy state information; consequently, previous approaches have relied primarily on heuristics. We propose a variant of Thompson Sampling for pursuit-evasion that allows for the application of existing model-based planning algorithms. This approach is general in that it allows for an arbitrary number of pursuers, a general spatial domain, and the integration of auxiliary information provided by informants. In a suite of simulation experiments, Thompson Sampling for pursuit evasion significantly reduces time-to-capture relative to competing algorithms.
△ Less
Submitted 11 November, 2018;
originally announced November 2018.
-
Bayesian Nonparametric Policy Search with Application to Periodontal Recall Intervals
Authors:
Qian Guan,
Brian J. Reich,
Eric B. Laber,
Dipankar Bandyopadhyay
Abstract:
Tooth loss from periodontal disease is a major public health burden in the United States. Standard clinical practice is to recommend a dental visit every six months; however, this practice is not evidence-based, and poor dental outcomes and increasing dental insurance premiums indicate room for improvement. We consider a tailored approach that recommends recall time based on patient characteristic…
▽ More
Tooth loss from periodontal disease is a major public health burden in the United States. Standard clinical practice is to recommend a dental visit every six months; however, this practice is not evidence-based, and poor dental outcomes and increasing dental insurance premiums indicate room for improvement. We consider a tailored approach that recommends recall time based on patient characteristics and medical history to minimize disease progression without increasing resource expenditures. We formalize this method as a dynamic treatment regime which comprises a sequence of decisions, one per stage of intervention, that follow a decision rule which maps current patient information to a recommendation for their next visit time. The dynamics of periodontal health, visit frequency, and patient compliance are complex, yet the estimated optimal regime must be interpretable to domain experts if it is to be integrated into clinical practice. We combine non-parametric Bayesian dynamics modeling with policy-search algorithms to estimate the optimal dynamic treatment regime within an interpretable class of regimes. Both simulation experiments and application to a rich database of electronic dental records from the HealthPartners HMO shows that our proposed method leads to better dental health without increasing the average recommended recall time relative to competing methods.
△ Less
Submitted 9 October, 2018;
originally announced October 2018.
-
Receiver Operating Characteristic Curves and Confidence Bands for Support Vector Machines
Authors:
Daniel J. Luckett,
Eric B. Laber,
Samer S. El-Kamary,
Cheng Fan,
Ravi Jhaveri,
Charles M. Perou,
Fatma M. Shebl,
Michael R. Kosorok
Abstract:
Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve,…
▽ More
Many problems that appear in biomedical decision making, such as diagnosing disease and predicting response to treatment, can be expressed as binary classification problems. The costs of false positives and false negatives vary across application domains and receiver operating characteristic (ROC) curves provide a visual representation of this trade-off. Nonparametric estimators for the ROC curve, such as a weighted support vector machine (SVM), are desirable because they are robust to model misspecification. While weighted SVMs have great potential for estimating ROC curves, their theoretical properties were heretofore underdeveloped. We propose a method for constructing confidence bands for the SVM ROC curve and provide the theoretical justification for the SVM ROC curve by showing that the risk function of the estimated decision rule is uniformly consistent across the weight parameter. We demonstrate the proposed confidence band method and the superior sensitivity and specificity of the weighted SVM compared to commonly used methods in diagnostic medicine using simulation studies. We present two illustrative examples: diagnosis of hepatitis C and a predictive model for treatment response in breast cancer.
△ Less
Submitted 17 July, 2018;
originally announced July 2018.
-
Variable selection using pseudo-variables
Authors:
Wenhao Hu,
Eric Laber,
Leonard Stefanski
Abstract:
Penalized regression has become a standard tool for model building across a wide range of application domains. Common practice is to tune the amount of penalization to tradeoff bias and variance or to optimize some other measure of performance of the estimated model. An advantage of such automated model-building procedures is that their operating characteristics are well-defined, i.e., completely…
▽ More
Penalized regression has become a standard tool for model building across a wide range of application domains. Common practice is to tune the amount of penalization to tradeoff bias and variance or to optimize some other measure of performance of the estimated model. An advantage of such automated model-building procedures is that their operating characteristics are well-defined, i.e., completely data-driven, and thereby they can be systematically studied. However, in many applications it is desirable to incorporate domain knowledge into the model building process; one way to do this is to characterize each model along the solution path of a penalized regression estimator in terms of an operating characteristic that is meaningful within a domain context and then to allow domain experts to choose from among these models using these operating characteristics as well as other factors not available to the estimation algorithm. We derive an estimator of the false selection rate for each model along the solution path using a novel variable addition method. The proposed estimator applies to both fixed and random designs and allows for $p \gg n$. The proposed estimator can be used to estimate a model with a pre-specified false selection rate or can be overlaid on the solution path to facilitate interactive model exploration. We characterize the asymptotic behavior of the proposed estimator in the case of a linear model under a fixed design; however, simulation experiments show that the proposed estimator provides consistently more accurate estimates of the false selection rate than competing methods across a wide range of models.
△ Less
Submitted 3 April, 2018;
originally announced April 2018.
-
Estimation and Optimization of Composite Outcomes
Authors:
Daniel J. Luckett,
Eric B. Laber,
Michael R. Kosorok
Abstract:
There is tremendous interest in precision medicine as a means to improve patient outcomes by tailoring treatment to individual characteristics. An individualized treatment rule formalizes precision medicine as a map from patient information to a recommended treatment. A treatment rule is defined to be optimal if it maximizes the mean of a scalar outcome in a population of interest, e.g., symptom r…
▽ More
There is tremendous interest in precision medicine as a means to improve patient outcomes by tailoring treatment to individual characteristics. An individualized treatment rule formalizes precision medicine as a map from patient information to a recommended treatment. A treatment rule is defined to be optimal if it maximizes the mean of a scalar outcome in a population of interest, e.g., symptom reduction. However, clinical and intervention scientists often must balance multiple and possibly competing outcomes, e.g., symptom reduction and the risk of an adverse event. One approach to precision medicine in this setting is to elicit a composite outcome which balances all competing outcomes; unfortunately, eliciting a composite outcome directly from patients is difficult without a high-quality instrument, and an expert-derived composite outcome may not account for heterogeneity in patient preferences. We propose a new paradigm for the study of precision medicine using observational data that relies solely on the assumption that clinicians are approximately (i.e., imperfectly) making decisions to maximize individual patient utility. Estimated composite outcomes are subsequently used to construct an estimator of an individualized treatment rule which maximizes the mean of patient-specific composite outcomes. The estimated composite outcomes and estimated optimal individualized treatment rule provide new insights into patient preference heterogeneity, clinician behavior, and the value of precision medicine in a given domain. We derive inference procedures for the proposed estimators under mild conditions and demonstrate their finite sample performance through a suite of simulation experiments and an illustrative application to data from a study of bipolar depression.
△ Less
Submitted 26 May, 2020; v1 submitted 28 November, 2017;
originally announced November 2017.
-
Sufficient Markov Decision Processes with Alternating Deep Neural Networks
Authors:
Longshaokan Wang,
Eric B. Laber,
Katie Witkiewitz
Abstract:
Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is non-trivial. We pro…
▽ More
Advances in mobile computing technologies have made it possible to monitor and apply data-driven interventions across complex systems in real time. Markov decision processes (MDPs) are the primary model for sequential decision problems with a large or indefinite time horizon. Choosing a representation of the underlying decision process that is both Markov and low-dimensional is non-trivial. We propose a method for constructing a low-dimensional representation of the original decision process for which: 1. the MDP model holds; 2. a decision strategy that maximizes mean utility when applied to the low-dimensional representation also maximizes mean utility when applied to the original process. We use a deep neural network to define a class of potential process representations and estimate the process of lowest dimension within this class. The method is illustrated using data from a mobile study on heavy drinking and smoking among college students.
△ Less
Submitted 17 March, 2018; v1 submitted 25 April, 2017;
originally announced April 2017.
-
Estimating Dynamic Treatment Regimes in Mobile Health Using V-learning
Authors:
Daniel J. Luckett,
Eric B. Laber,
Anna R. Kahkoska,
David M. Maahs,
Elizabeth Mayer-Davis,
Michael R. Kosorok
Abstract:
The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best healthcare possible for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are…
▽ More
The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best healthcare possible for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time-scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an outpatient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications. We show the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed methods are applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.
△ Less
Submitted 14 October, 2017; v1 submitted 10 November, 2016;
originally announced November 2016.
-
A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward
Authors:
S. A. Murphy,
Y. Deng,
E. B. Laber,
H. R. Maei,
R. S. Sutton,
K. Witkiewitz
Abstract:
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.
We develop an off-policy actor-critic algorithm for learning an optimal policy from a training set composed of data from multiple individuals. This algorithm is developed with a view towards its use in mobile health.
△ Less
Submitted 18 July, 2016;
originally announced July 2016.
-
Interpretable Dynamic Treatment Regimes
Authors:
Yichi Zhang,
Eric B. Laber,
Anastasios Tsiatis,
Marie Davidian
Abstract:
Precision medicine is currently a topic of great interest in clinical and intervention science. One way to formalize precision medicine is through a treatment regime, which is a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended treatment. An optimal treatment regime is defined as maximizing the mean of some cumulative clini…
▽ More
Precision medicine is currently a topic of great interest in clinical and intervention science. One way to formalize precision medicine is through a treatment regime, which is a sequence of decision rules, one per stage of clinical intervention, that map up-to-date patient information to a recommended treatment. An optimal treatment regime is defined as maximizing the mean of some cumulative clinical outcome if applied to a population of interest. It is well-known that even under simple generative models an optimal treatment regime can be a highly nonlinear function of patient information. Consequently, a focal point of recent methodological research has been the development of flexible models for estimating optimal treatment regimes. However, in many settings, estimation of an optimal treatment regime is an exploratory analysis intended to generate new hypotheses for subsequent research and not to directly dictate treatment to new patients. In such settings, an estimated treatment regime that is interpretable in a domain context may be of greater value than an unintelligible treatment regime built using "black-box" estimation methods. We propose an estimator of an optimal treatment regime composed of a sequence of decision rules, each expressible as a list of "if-then" statements that can be presented as either a paragraph or as a simple flowchart that is immediately interpretable to domain experts. The discreteness of these lists precludes smooth, i.e., gradient-based, methods of estimation and leads to non-standard asymptotics. Nevertheless, we provide a computationally efficient estimation algorithm, prove consistency of the proposed estimator, and derive rates of convergence. We illustrate the proposed methods using a series of simulation examples and application to data from a sequential clinical trial on bipolar disorder.
△ Less
Submitted 5 June, 2016;
originally announced June 2016.
-
Using Decision Lists to Construct Interpretable and Parsimonious Treatment Regimes
Authors:
Yichi Zhang,
Eric B. Laber,
Anastasios Tsiatis,
Marie Davidian
Abstract:
A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of tr…
▽ More
A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple, yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.
△ Less
Submitted 28 April, 2015;
originally announced April 2015.
-
Interactive Q-learning for Probabilities and Quantiles
Authors:
Kristin A. Linn,
Eric B. Laber,
Leonard A. Stefanski
Abstract:
A dynamic treatment regime is a sequence of decision rules in which each decision rule recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive…
▽ More
A dynamic treatment regime is a sequence of decision rules in which each decision rule recommends treatment based on features of patient medical history such as past treatments and outcomes. Existing methods for estimating optimal dynamic treatment regimes from data optimize the mean of a response variable. However, the mean may not always be the most appropriate summary of performance. We derive estimators of decision rules for optimizing probabilities and quantiles computed with respect to the response distribution for two-stage, binary treatment settings. This enables estimation of dynamic treatment regimes that optimize the cumulative distribution function of the response at a prespecified point or a prespecified quantile of the response distribution such as the median. The proposed methods perform favorably in simulation experiments. We illustrate our approach with data from a sequentially randomized trial where the primary outcome is remission of depression symptoms.
△ Less
Submitted 21 May, 2015; v1 submitted 12 July, 2014;
originally announced July 2014.
-
Set-valued dynamic treatment regimes for competing outcomes
Authors:
Eric B. Laber,
Daniel J. Lizotte,
Bradley Ferguson
Abstract:
Dynamic treatment regimes operationalize the clinical decision process as a sequence of functions, one for each clinical decision, where each function takes as input up-to-date patient information and gives as output a single recommended treatment. Current methods for estimating optimal dynamic treatment regimes, for example Q-learning, require the specification of a single outcome by which the `g…
▽ More
Dynamic treatment regimes operationalize the clinical decision process as a sequence of functions, one for each clinical decision, where each function takes as input up-to-date patient information and gives as output a single recommended treatment. Current methods for estimating optimal dynamic treatment regimes, for example Q-learning, require the specification of a single outcome by which the `goodness' of competing dynamic treatment regimes are measured. However, this is an over-simplification of the goal of clinical decision making, which aims to balance several potentially competing outcomes. For example, often a balance must be struck between treatment effectiveness and side-effect burden. We propose a method for constructing dynamic treatment regimes that accommodates competing outcomes by recommending sets of treatments at each decision point. Formally, we construct a sequence of set-valued functions that take as input up-to-date patient information and give as output a recommended subset of the possible treatments. For a given patient history, the recommended set of treatments contains all treatments that are not inferior according to any of the competing outcomes. When there is more than one decision point, constructing these set-valued functions requires solving a non-trivial enumeration problem. We offer an exact enumeration algorithm by recasting the problem as a linear mixed integer program. The proposed methods are illustrated using data from a depression study and the CATIE schizophrenia study.
△ Less
Submitted 7 August, 2012; v1 submitted 12 July, 2012;
originally announced July 2012.
-
Small Sample Inference for Generalization Error in Classification Using the CUD Bound
Authors:
Eric B. Laber,
Susan A. Murphy
Abstract:
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996,Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the genera…
▽ More
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996,Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.
△ Less
Submitted 13 June, 2012;
originally announced June 2012.
-
An imputation method for estimating the learning curve in classification problems
Authors:
Eric B. Laber,
Kerby Shedden,
Yang Yang
Abstract:
The learning curve expresses the error rate of a predictive modeling procedure as a function of the sample size of the training dataset. It typically is a decreasing, convex function with a positive limiting value. An estimate of the learning curve can be used to assess whether a modeling procedure should be expected to become substantially more accurate if additional training data become availabl…
▽ More
The learning curve expresses the error rate of a predictive modeling procedure as a function of the sample size of the training dataset. It typically is a decreasing, convex function with a positive limiting value. An estimate of the learning curve can be used to assess whether a modeling procedure should be expected to become substantially more accurate if additional training data become available. This article proposes a new procedure for estimating learning curves using imputation. We focus on classification, although the idea is applicable to other predictive modeling settings. Simulation studies indicate that the learning curve can be estimated with useful accuracy for a roughly four-fold increase in the size of the training set relative to the available data, and that the proposed imputation approach outperforms an alternative estimation approach based on parameterizing the learning curve. We illustrate the method with an application that predicts the risk of disease progression for people with chronic lymphocytic leukemia.
△ Less
Submitted 13 March, 2012;
originally announced March 2012.
-
$Q$- and $A$-Learning Methods for Estimating Optimal Dynamic Treatment Regimes
Authors:
Phillip J. Schulte,
Anastasios A. Tsiatis,
Eric B. Laber,
Marie Davidian
Abstract:
In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data,…
▽ More
In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
△ Less
Submitted 3 February, 2015; v1 submitted 19 February, 2012;
originally announced February 2012.
-
Statistical Inference in Dynamic Treatment Regimes
Authors:
Eric B. Laber,
Min Qian,
Dan J. Lizotte,
William E. Pelham,
Susan A. Murphy
Abstract:
Dynamic treatment regimes are of growing interest across the clinical sciences as these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. A dynamic treatment regime is a sequence of decision rules, with a decision rule per stage of clinical intervention; each decision rule maps up-to-date patient information to a recommended treatment. We b…
▽ More
Dynamic treatment regimes are of growing interest across the clinical sciences as these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. A dynamic treatment regime is a sequence of decision rules, with a decision rule per stage of clinical intervention; each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review an interesting challenge, that of nonregularity that often arises in this area. By nonregularity, we mean the parameters indexing the optimal dynamic treatment regime are nonsmooth functionals of the underlying generative distribution.
A consequence is that no regular or asymptotically unbiased estimator of these parameters exists. Nonregularity arises in inference for parameters in the optimal dynamic treatment regime; we illustrate the effect of nonregularity on asymptotic bias and via sensitivity of asymptotic, limiting, distributions to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Interventions for Children with ADHD study as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
△ Less
Submitted 26 November, 2013; v1 submitted 30 June, 2010;
originally announced June 2010.