Search | arXiv e-print repository

"It felt more real": Investigating the User Experience of the MiWaves Personalizing JITAI Pilot Study

Authors: Susobhan Ghosh, Pei-Yao Hung, Lara N. Coughlin, Erin E. Bonar, Yongyi Guo, Inbal Nahum-Shani, Maureen Walton, Mark W. Newman, Susan A. Murphy

Abstract: Cannabis use among emerging adults is increasing globally, posing significant health risks and creating a need for effective interventions. We present an exploratory analysis of the MiWaves pilot study, a digital intervention aimed at supporting cannabis use reduction among emerging adults (ages 18-25). Our findings indicate the potential of self-monitoring check-ins and trend visualizations in fo… ▽ More Cannabis use among emerging adults is increasing globally, posing significant health risks and creating a need for effective interventions. We present an exploratory analysis of the MiWaves pilot study, a digital intervention aimed at supporting cannabis use reduction among emerging adults (ages 18-25). Our findings indicate the potential of self-monitoring check-ins and trend visualizations in fostering self-awareness and promoting behavioral reflection in participants. MiWaves intervention message timing and frequency were also generally well-received by the participants. The participants' perception of effort were queried on intervention messages with different tasks, and our findings suggest that messages with tasks like exploring links and typing in responses are perceived as requiring more effort as compared to messages with tasks involving reading and acknowledging. Finally, we discuss the findings and limitations from this study and analysis, and their impact on informing future iterations on MiWaves. △ Less

Submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.06835 [pdf, other]

Reinforcement Learning on Dyads to Enhance Medication Adherence

Authors: Ziping Xu, Hinal Jajal, Sung Won Choi, Inbal Nahum-Shani, Guy Shani, Alexandra M. Psihogios, Pei-Yao Hung, Susan Murphy

Abstract: Medication adherence is critical for the recovery of adolescents and young adults (AYAs) who have undergone hematopoietic cell transplantation (HCT). However, maintaining adherence is challenging for AYAs after hospital discharge, who experience both individual (e.g. physical and emotional symptoms) and interpersonal barriers (e.g., relational difficulties with their care partner, who is often inv… ▽ More Medication adherence is critical for the recovery of adolescents and young adults (AYAs) who have undergone hematopoietic cell transplantation (HCT). However, maintaining adherence is challenging for AYAs after hospital discharge, who experience both individual (e.g. physical and emotional symptoms) and interpersonal barriers (e.g., relational difficulties with their care partner, who is often involved in medication management). To optimize the effectiveness of a three-component digital intervention targeting both members of the dyad as well as their relationship, we propose a novel Multi-Agent Reinforcement Learning (MARL) approach to personalize the delivery of interventions. By incorporating the domain knowledge, the MARL framework, where each agent is responsible for the delivery of one intervention component, allows for faster learning compared with a flattened agent. Evaluation using a dyadic simulator environment, based on real clinical data, shows a significant improvement in medication adherence (approximately 3%) compared to purely random intervention delivery. The effectiveness of this approach will be further evaluated in an upcoming trial. △ Less

Submitted 21 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2502.01789 [pdf]

An Agentic AI Workflow for Detecting Cognitive Concerns in Real-world Data

Authors: Jiazi Tian, Liqin Wang, Pedram Fard, Valdery Moura Junior, Deborah Blacker, Jennifer S. Haas, Chirag Patel, Shawn N. Murphy, Lidia M. V. R. Moura, Hossein Estiri

Abstract: Early identification of cognitive concerns is critical but often hindered by subtle symptom presentation. This study developed and validated a fully automated, multi-agent AI workflow using LLaMA 3 8B to identify cognitive concerns in 3,338 clinical notes from Mass General Brigham. The agentic workflow, leveraging task-specific agents that dynamically collaborate to extract meaningful insights fro… ▽ More Early identification of cognitive concerns is critical but often hindered by subtle symptom presentation. This study developed and validated a fully automated, multi-agent AI workflow using LLaMA 3 8B to identify cognitive concerns in 3,338 clinical notes from Mass General Brigham. The agentic workflow, leveraging task-specific agents that dynamically collaborate to extract meaningful insights from clinical notes, was compared to an expert-driven benchmark. Both workflows achieved high classification performance, with F1-scores of 0.90 and 0.91, respectively. The agentic workflow demonstrated improved specificity (1.00) and achieved prompt refinement in fewer iterations. Although both workflows showed reduced performance on validation data, the agentic workflow maintained perfect specificity. These findings highlight the potential of fully automated multi-agent AI workflows to achieve expert-level accuracy with greater efficiency, offering a scalable and cost-effective solution for detecting cognitive concerns in clinical settings. △ Less

Submitted 3 February, 2025; originally announced February 2025.

arXiv:2412.18799 [pdf]

Quantifying the Risk of Pastoral Conflict in 4 Central African Countries

Authors: Lirika Solaa, Youdinghuan Chen, Samantha K. Murphy, V. S. Subrahmanian

Abstract: Climate change is becoming a widely recognized risk factor of farmer-herder conflict in Africa. Using an 8 year dataset (Jan 2015 to Sep 2022) of detailed weather and terrain data across four African nations, we apply statistical and machine learning methods to analyze pastoral conflict. We test hypotheses linking these variables with pastoral conflict within each country using geospatial and stat… ▽ More Climate change is becoming a widely recognized risk factor of farmer-herder conflict in Africa. Using an 8 year dataset (Jan 2015 to Sep 2022) of detailed weather and terrain data across four African nations, we apply statistical and machine learning methods to analyze pastoral conflict. We test hypotheses linking these variables with pastoral conflict within each country using geospatial and statistical analysis. Complementing this analysis are risk maps automatically updated for decision-makers. Our models estimate which cells have a high likelihood of experiencing pastoral conflict with high predictive accuracy and study the variation of this accuracy with the granularity of the cells. △ Less

Submitted 25 December, 2024; originally announced December 2024.

Comments: v1. 78 pages, 13 figures, 23 tables

arXiv:2412.00308 [pdf, other]

BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings

Authors: Karine Karine, Susan A. Murphy, Benjamin M. Marlin

Abstract: In settings where the application of reinforcement learning (RL) requires running real-world trials, including the optimization of adaptive health interventions, the number of episodes available for learning can be severely limited due to cost or time constraints. In this setting, the bias-variance trade-off of contextual bandit methods can be significantly better than that of more complex full RL… ▽ More In settings where the application of reinforcement learning (RL) requires running real-world trials, including the optimization of adaptive health interventions, the number of episodes available for learning can be severely limited due to cost or time constraints. In this setting, the bias-variance trade-off of contextual bandit methods can be significantly better than that of more complex full RL methods. However, Thompson sampling bandits are limited to selecting actions based on distributions of immediate rewards. In this paper, we extend the linear Thompson sampling bandit to select actions based on a state-action utility function consisting of the Thompson sampler's estimate of the expected immediate reward combined with an action bias term. We use batch Bayesian optimization over episodes to learn the action bias terms with the goal of maximizing the expected return of the extended Thompson sampler. The proposed approach is able to learn optimal policies for a strictly broader class of Markov decision processes (MDPs) than standard Thompson sampling. Using an adaptive intervention simulation environment that captures key aspects of behavioral dynamics, we show that the proposed method can significantly out-perform standard Thompson sampling in terms of total return, while requiring significantly fewer episodes than standard value function and policy gradient methods. △ Less

Submitted 29 November, 2024; originally announced December 2024.

Comments: Accepted at NeurIPS 2024 Workshop on Bayesian Decision-making and Uncertainty

arXiv:2410.24131 [pdf, ps, other]

Transit drivers' reflections on the benefits and harms of eye tracking technology

Authors: Shaina Murphy, Bryce Grame, Ethan Smith, Siva Srinivasan, Eakta Jain

Abstract: Eye tracking technology offers great potential for improving road safety. It is already being built into vehicles, namely cars and trucks. When this technology is integrated into transit service vehicles, employees, i.e., bus drivers, will be subject to being eye tracked on their job. Although there is much research effort advancing algorithms for eye tracking in transportation, less is known abou… ▽ More Eye tracking technology offers great potential for improving road safety. It is already being built into vehicles, namely cars and trucks. When this technology is integrated into transit service vehicles, employees, i.e., bus drivers, will be subject to being eye tracked on their job. Although there is much research effort advancing algorithms for eye tracking in transportation, less is known about how end users perceive this technology, especially when interacting with it in an employer-mandated context. In this first study of its kind, we investigated transit bus operators' perceptions of eye tracking technology. From a methodological perspective, we introduce a mixed methods approach where participants experience the technology first-hand and then reflect on their experience while viewing a playback of the recorded data. Thematic analysis of the interview transcripts reveals interesting potential uses of eye tracking in this work context and surfaces transit operators' fears and concerns about this technology. △ Less

Submitted 6 December, 2024; v1 submitted 31 October, 2024; originally announced October 2024.

arXiv:2410.14659 [pdf, other]

Harnessing Causality in Reinforcement Learning With Bagged Decision Times

Authors: Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja, Susan A. Murphy

Abstract: We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collective… ▽ More We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collectively affect a user's daily commitment to being active. Our goal is to develop an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markov state transitions within and across bags. We then formulate this problem as a periodic Markov decision process (MDP) that allows non-stationarity within a period. An online RL algorithm based on Bellman equations for stationary MDPs is generalized to handle periodic MDPs. We show that our constructed state achieves the maximal optimal value function among all state constructions for a periodic MDP. Finally, we evaluate the proposed method on testbed variants built from real data in a mobile health clinical trial. △ Less

Submitted 6 May, 2025; v1 submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.12793 [pdf, other]

doi 10.1038/s44401-024-00009-w

Environment Scan of Generative AI Infrastructure for Clinical and Translational Science

Authors: Betina Idnay, Zihan Xu, William G. Adams, Mohammad Adibuzzaman, Nicholas R. Anderson, Neil Bahroos, Douglas S. Bell, Cody Bumgardner, Thomas Campion, Mario Castro, James J. Cimino, I. Glenn Cohen, David Dorr, Peter L Elkin, Jungwei W. Fan, Todd Ferris, David J. Foran, David Hanauer, Mike Hogarth, Kun Huang, Jayashree Kalpathy-Cramer, Manoj Kandpal, Niranjan S. Karnik, Avnish Katoch, Albert M. Lai , et al. (32 additional authors not shown)

Abstract: This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) at the United States. With t… ▽ More This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) at the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess the institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency. △ Less

Submitted 27 September, 2024; originally announced October 2024.

arXiv:2410.03380 [pdf, ps, other]

Identifying biological perturbation targets through causal differential networks

Authors: Menghua Wu, Umesh Padia, Sean H. Murphy, Regina Barzilay, Tommi Jaakkola

Abstract: Identifying variables responsible for changes to a biological system enables applications in drug target discovery and cell engineering. Given a pair of observational and interventional datasets, the goal is to isolate the subset of observed variables that were the targets of the intervention. Directly applying causal discovery algorithms is challenging: the data may contain thousands of variables… ▽ More Identifying variables responsible for changes to a biological system enables applications in drug target discovery and cell engineering. Given a pair of observational and interventional datasets, the goal is to isolate the subset of observed variables that were the targets of the intervention. Directly applying causal discovery algorithms is challenging: the data may contain thousands of variables with as few as tens of samples per intervention, and biological systems do not adhere to classical causality assumptions. We propose a causality-inspired approach to address this practical setting. First, we infer noisy causal graphs from the observational and interventional data. Then, we learn to map the differences between these graphs, along with additional statistical features, to sets of variables that were intervened upon. Both modules are jointly trained in a supervised framework, on simulated and real data that reflect the nature of biological interventions. This approach consistently outperforms baselines for perturbation modeling on seven single-cell transcriptomics datasets. We also demonstrate significant improvements over current causal discovery methods for predicting soft and hard intervention targets across a variety of synthetic data. △ Less

Submitted 30 May, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

Journal ref: Proceedings of the 42nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025

arXiv:2409.10526 [pdf, other]

Effective Monitoring of Online Decision-Making Algorithms in Digital Intervention Implementation

Authors: Anna L. Trella, Susobhan Ghosh, Erin E. Bonar, Lara Coughlin, Finale Doshi-Velez, Yongyi Guo, Pei-Yao Hung, Inbal Nahum-Shani, Vivek Shetty, Maureen Walton, Iris Yan, Kelly W. Zhang, Susan A. Murphy

Abstract: Online AI decision-making algorithms are increasingly used by digital interventions to dynamically personalize treatment to individuals. These algorithms determine, in real-time, the delivery of treatment based on accruing data. The objective of this paper is to provide guidelines for enabling effective monitoring of online decision-making algorithms with the goal of (1) safeguarding individuals a… ▽ More Online AI decision-making algorithms are increasingly used by digital interventions to dynamically personalize treatment to individuals. These algorithms determine, in real-time, the delivery of treatment based on accruing data. The objective of this paper is to provide guidelines for enabling effective monitoring of online decision-making algorithms with the goal of (1) safeguarding individuals and (2) ensuring data quality. We elucidate guidelines and discuss our experience in monitoring online decision-making algorithms in two digital intervention clinical trials (Oralytics and MiWaves). Our guidelines include (1) developing fallback methods, pre-specified procedures executed when an issue occurs, and (2) identifying potential issues categorizing them by severity (red, yellow, and green). Across both trials, the monitoring systems detected real-time issues such as out-of-memory issues, database timeout, and failed communication with an external source. Fallback methods prevented participants from not receiving any treatment during the trial and also prevented the use of incorrect data in statistical analyses. These trials provide case studies for how health scientists can build monitoring systems for their digital intervention. Without these algorithm monitoring systems, critical issues would have gone undetected and unresolved. Instead, these monitoring systems safeguarded participants and ensured the quality of the resulting data for updating the intervention and facilitating scientific discovery. These monitoring guidelines and findings give digital intervention teams the confidence to include online decision-making algorithms in digital interventions. △ Less

Submitted 30 August, 2024; originally announced September 2024.

arXiv:2409.02069 [pdf, other]

A Deployed Online Reinforcement Learning Algorithm In An Oral Health Clinical Trial

Authors: Anna L. Trella, Kelly W. Zhang, Hinal Jajal, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Abstract: Dental disease is a prevalent chronic condition associated with substantial financial burden, personal suffering, and increased risk of systemic diseases. Despite widespread recommendations for twice-daily tooth brushing, adherence to recommended oral self-care behaviors remains sub-optimal due to factors such as forgetfulness and disengagement. To address this, we developed Oralytics, a mHealth i… ▽ More Dental disease is a prevalent chronic condition associated with substantial financial burden, personal suffering, and increased risk of systemic diseases. Despite widespread recommendations for twice-daily tooth brushing, adherence to recommended oral self-care behaviors remains sub-optimal due to factors such as forgetfulness and disengagement. To address this, we developed Oralytics, a mHealth intervention system designed to complement clinician-delivered preventative care for marginalized individuals at risk for dental disease. Oralytics incorporates an online reinforcement learning algorithm to determine optimal times to deliver intervention prompts that encourage oral self-care behaviors. We have deployed Oralytics in a registered clinical trial. The deployment required careful design to manage challenges specific to the clinical trials setting in the U.S. In this paper, we (1) highlight key design decisions of the RL algorithm that address these challenges and (2) conduct a re-sampling analysis to evaluate algorithm design decisions. A second phase (randomized control trial) of Oralytics is planned to start in spring 2025. △ Less

Submitted 18 December, 2024; v1 submitted 3 September, 2024; originally announced September 2024.

arXiv:2408.15076 [pdf, other]

MiWaves Reinforcement Learning Algorithm

Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

Abstract: The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforc… ▽ More The escalating prevalence of cannabis use poses a significant public health challenge globally. In the U.S., cannabis use is more prevalent among emerging adults (EAs) (ages 18-25) than any other age group, with legalization in the multiple states contributing to a public perception that cannabis is less risky than in prior decades. To address this growing concern, we developed MiWaves, a reinforcement learning (RL) algorithm designed to optimize the delivery of personalized intervention prompts to reduce cannabis use among EAs. MiWaves leverages domain expertise and prior data to tailor the likelihood of delivery of intervention messages. This paper presents a comprehensive overview of the algorithm's design, including key decisions and experimental outcomes. The finalized MiWaves RL algorithm was deployed in a clinical trial from March to May 2024. △ Less

Submitted 27 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.17739

arXiv:2406.19662 [pdf, other]

Finite basis Kolmogorov-Arnold networks: domain decomposition for data-driven and physics-informed problems

Authors: Amanda A. Howard, Bruno Jacob, Sarah H. Murphy, Alexander Heinlein, Panos Stinis

Abstract: Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to… ▽ More Kolmogorov-Arnold networks (KANs) have attracted attention recently as an alternative to multilayer perceptrons (MLPs) for scientific machine learning. However, KANs can be expensive to train, even for relatively small networks. Inspired by finite basis physics-informed neural networks (FBPINNs), in this work, we develop a domain decomposition method for KANs that allows for several small KANs to be trained in parallel to give accurate solutions for multiscale problems. We show that finite basis KANs (FBKANs) can provide accurate results with noisy data and for physics-informed training. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.13127 [pdf, other]

Oralytics Reinforcement Learning Algorithm

Authors: Anna L. Trella, Kelly W. Zhang, Stephanie M. Carpenter, David Elashoff, Zara M. Greer, Inbal Nahum-Shani, Dennis Ruenger, Vivek Shetty, Susan A. Murphy

Abstract: Dental disease is still one of the most common chronic diseases in the United States. While dental disease is preventable through healthy oral self-care behaviors (OSCB), this basic behavior is not consistently practiced. We have developed Oralytics, an online, reinforcement learning (RL) algorithm that optimizes the delivery of personalized intervention prompts to improve OSCB. In this paper, we… ▽ More Dental disease is still one of the most common chronic diseases in the United States. While dental disease is preventable through healthy oral self-care behaviors (OSCB), this basic behavior is not consistently practiced. We have developed Oralytics, an online, reinforcement learning (RL) algorithm that optimizes the delivery of personalized intervention prompts to improve OSCB. In this paper, we offer a full overview of algorithm design decisions made using prior data, domain expertise, and experiments in a simulation test bed. The finalized RL algorithm was deployed in the Oralytics clinical trial, conducted from fall 2023 to summer 2024. △ Less

Submitted 12 September, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.06306 [pdf, other]

Unified Fourier bases for signals on random graphs with group symmetries

Authors: Mahya Ghandehari, Jeannette Janssen, Silo Murphy

Abstract: We consider a recently proposed approach to graph signal processing (GSP) based on graphons. We show how the graphon-based approach to GSP applies to graphs sampled from a stochastic block model derived from a weighted Cayley graph. When SBM block sizes are equal, a nice Fourier basis can be derived from the representation theory of the underlying group. We explore how the SBM Fourier basis is aff… ▽ More We consider a recently proposed approach to graph signal processing (GSP) based on graphons. We show how the graphon-based approach to GSP applies to graphs sampled from a stochastic block model derived from a weighted Cayley graph. When SBM block sizes are equal, a nice Fourier basis can be derived from the representation theory of the underlying group. We explore how the SBM Fourier basis is affected when block sizes are not uniform. When block sizes are nearly uniform, we demonstrate that the group Fourier basis closely approximates the SBM Fourier basis. More specifically, we quantify the approximation error using matrix perturbation theory. When block sizes are highly non-uniform, the group-based Fourier basis can no longer be used. However, we show that partial information regarding the SBM Fourier basis can still be obtained from the underlying group. △ Less

Submitted 14 March, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 25 pages

MSC Class: 94A12

arXiv:2405.19660 [pdf, other]

PATIENT-Ψ: Using Large Language Models to Simulate Patients for Training Mental Health Professionals

Authors: Ruiyi Wang, Stephanie Milani, Jamie C. Chiu, Jiayin Zhi, Shaun M. Eack, Travis Labrum, Samuel M. Murphy, Nev Jones, Kate Hardy, Hong Shen, Fei Fang, Zhiyu Zoey Chen

Abstract: Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cogniti… ▽ More Mental illness remains one of the most critical public health issues. Despite its importance, many mental health professionals highlight a disconnect between their training and actual real-world patient practice. To help bridge this gap, we propose PATIENT-Ψ, a novel patient simulation framework for cognitive behavior therapy (CBT) training. To build PATIENT-Ψ, we construct diverse patient cognitive models based on CBT principles and use large language models (LLMs) programmed with these cognitive models to act as a simulated therapy patient. We propose an interactive training scheme, PATIENT-Ψ-TRAINER, for mental health trainees to practice a key skill in CBT -- formulating the cognitive model of the patient -- through role-playing a therapy session with PATIENT-Ψ. To evaluate PATIENT-Ψ, we conducted a comprehensive user study of 13 mental health trainees and 20 experts. The results demonstrate that practice using PATIENT-Ψ-TRAINER enhances the perceived skill acquisition and confidence of the trainees beyond existing forms of training such as textbooks, videos, and role-play with non-patients. Based on the experts' perceptions, PATIENT-Ψ is perceived to be closer to real patient interactions than GPT-4, and PATIENT-Ψ-TRAINER holds strong promise to improve trainee competencies. Our code and data are released at \url{https://github.com/ruiyiw/patient-psi}. △ Less

Submitted 3 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: EMNLP 2024 Main, 9 pages, 5 figures

arXiv:2403.10946 [pdf, other]

The Fallacy of Minimizing Cumulative Regret in the Sequential Task Setting

Authors: Ziping Xu, Kelly W. Zhang, Susan A. Murphy

Abstract: Online Reinforcement Learning (RL) is typically framed as the process of minimizing cumulative regret (CR) through interactions with an unknown environment. However, real-world RL applications usually involve a sequence of tasks, and the data collected in the first task is used to warm-start the second task. The performance of the warm-start policy is measured by simple regret (SR). While minimizi… ▽ More Online Reinforcement Learning (RL) is typically framed as the process of minimizing cumulative regret (CR) through interactions with an unknown environment. However, real-world RL applications usually involve a sequence of tasks, and the data collected in the first task is used to warm-start the second task. The performance of the warm-start policy is measured by simple regret (SR). While minimizing both CR and SR is generally a conflicting objective, previous research has shown that in stationary environments, both can be optimized in terms of the duration of the task, $T$. In practice, however, in real-world applications, human-in-the-loop decisions between tasks often results in non-stationarity. For instance, in clinical trials, scientists may adjust target health outcomes between implementations. Our results show that task non-stationarity leads to a more restrictive trade-off between CR and SR. To balance these competing goals, the algorithm must explore excessively, leading to a CR bound worse than the typical optimal rate of $T^{1/2}$. These findings are practically significant, indicating that increased exploration is necessary in non-stationary environments to accommodate task changes, impacting the design of RL algorithms in fields such as healthcare and beyond. △ Less

Submitted 24 October, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.05911 [pdf, other]

Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning

Authors: Zana Buçinca, Siddharth Swaroop, Amanda E. Paluch, Susan A. Murphy, Krzysztof Z. Gajos

Abstract: Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning… ▽ More Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning (RL) as a general approach for modeling human-AI decision-making to optimize human-AI interaction for diverse objectives. RL can optimize such objectives by tailoring decision support, providing the right type of assistance to the right person at the right time. We instantiated our approach with two objectives: human-AI accuracy on the decision-making task and human learning about the task and learned decision support policies from previous human-AI interaction data. We compared the optimized policies against several baselines in AI-assisted decision-making. Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieve significantly better accuracy -- and even human-AI complementarity -- compared to those interacting with any other type of AI support. Our results further indicated that human learning was more difficult to optimize than accuracy, with participants who interacted with learning-optimized policies showing significant learning improvement only at times. Our research (1) demonstrates offline RL to be a promising approach to model human-AI decision-making, leading to policies that may optimize human-centric objectives and provide novel insights about the AI-assisted decision-making space, and (2) emphasizes the importance of considering human-centric objectives beyond decision accuracy in AI-assisted decision-making, opening up the novel research challenge of optimizing human-AI interaction for such objectives. △ Less

Submitted 14 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2402.17739 [pdf, other]

reBandit: Random Effects based Online RL algorithm for Reducing Cannabis Use

Authors: Susobhan Ghosh, Yongyi Guo, Pei-Yao Hung, Lara Coughlin, Erin Bonar, Inbal Nahum-Shani, Maureen Walton, Susan Murphy

Abstract: The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an onlin… ▽ More The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs equally well or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, proving its adeptness to adapt to diverse population of study participants. △ Less

Submitted 11 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.17003 [pdf, other]

Monitoring Fidelity of Online Reinforcement Learning Algorithms in Clinical Trials

Authors: Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Iris Yan, Finale Doshi-Velez, Susan A. Murphy

Abstract: Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in cli… ▽ More Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in clinical trials. It emphasizes the responsibility of the algorithm to (1) safeguard participants and (2) preserve the scientific utility of the data for post-trial analyses. We also present a framework for pre-deployment planning and real-time monitoring to help algorithm developers and clinical researchers ensure algorithm fidelity. To illustrate our framework's practical application, we present real-world examples from the Oralytics clinical trial. Since Spring 2023, this trial successfully deployed an autonomous, online RL algorithm to personalize behavioral interventions for participants at risk for dental disease. △ Less

Submitted 12 August, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.03110 [pdf, other]

Non-Stationary Latent Auto-Regressive Bandits

Authors: Anna L. Trella, Walter Dempsey, Asim H. Gazi, Ziping Xu, Finale Doshi-Velez, Susan A. Murphy

Abstract: For the non-stationary multi-armed bandit (MAB) problem, many existing methods allow a general mechanism for the non-stationarity, but rely on a budget for the non-stationarity that is sub-linear to the total number of time steps $T$. In many real-world settings, however, the mechanism for the non-stationarity can be modeled, but there is no budget for the non-stationarity. We instead consider the… ▽ More For the non-stationary multi-armed bandit (MAB) problem, many existing methods allow a general mechanism for the non-stationarity, but rely on a budget for the non-stationarity that is sub-linear to the total number of time steps $T$. In many real-world settings, however, the mechanism for the non-stationarity can be modeled, but there is no budget for the non-stationarity. We instead consider the non-stationary bandit problem where the reward means change due to a latent, auto-regressive (AR) state. We develop Latent AR LinUCB (LARL), an online linear contextual bandit algorithm that does not rely on the non-stationary budget, but instead forms good predictions of reward means by implicitly predicting the latent state. The key idea is to reduce the problem to a linear dynamical system which can be solved as a linear contextual bandit. In fact, LARL approximates a steady-state Kalman filter and efficiently learns system parameters online. We provide an interpretable regret bound for LARL with respect to the level of non-stationarity in the environment. LARL achieves sub-linear regret in this setting if the noise variance of the latent state process is sufficiently small with respect to $T$. Empirically, LARL outperforms various baseline methods in this non-stationary bandit problem. △ Less

Submitted 27 February, 2025; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.01995 [pdf, other]

Online Uniform Sampling: Randomized Learning-Augmented Approximation Algorithms with Application to Digital Health

Authors: Xueqing Liu, Kyra Gan, Esmaeil Keyvanshokooh, Susan Murphy

Abstract: Motivated by applications in digital health, this work studies the novel problem of online uniform sampling (OUS), where the goal is to distribute a sampling budget uniformly across unknown decision times. In the OUS problem, the algorithm is given a budget $b$ and a time horizon $T$, and an adversary then chooses a value $τ^* \in [b,T]$, which is revealed to the algorithm online. At each decision… ▽ More Motivated by applications in digital health, this work studies the novel problem of online uniform sampling (OUS), where the goal is to distribute a sampling budget uniformly across unknown decision times. In the OUS problem, the algorithm is given a budget $b$ and a time horizon $T$, and an adversary then chooses a value $τ^* \in [b,T]$, which is revealed to the algorithm online. At each decision time $i \in [τ^*]$, the algorithm must determine a sampling probability that maximizes the budget spent throughout the horizon, respecting budget constraint $b$, while achieving as uniform a distribution as possible over $τ^*$. We present the first randomized algorithm designed for this problem and subsequently extend it to incorporate learning augmentation. We provide worst-case approximation guarantees for both algorithms, and illustrate the utility of the algorithms through both synthetic experiments and a real-world case study involving the HeartSteps mobile application. Our numerical results show strong empirical average performance of our proposed randomized algorithms against previously proposed heuristic solutions. △ Less

Submitted 19 October, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.14923 [pdf, other]

Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

Authors: Eura Nofshin, Siddharth Swaroop, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

Abstract: Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us unders… ▽ More Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: In AAMAS 2024

arXiv:2311.06483 [pdf, other]

Stacked networks improve physics-informed training: applications to neural networks and deep operator networks

Authors: Amanda A Howard, Sarah H Murphy, Shady E Ahmed, Panos Stinis

Abstract: Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build… ▽ More Physics-informed neural networks and operator networks have shown promise for effectively solving equations modeling physical systems. However, these networks can be difficult or impossible to train accurately for some systems of equations. We present a novel multifidelity framework for stacking physics-informed neural networks and operator networks that facilitates training. We successively build a chain of networks, where the output at one step can act as a low-fidelity input for training the next step, gradually increasing the expressivity of the learned model. The equations imposed at each step of the iterative process can be the same or different (akin to simulated annealing). The iterative (stacking) nature of the proposed method allows us to progressively learn features of a solution that are hard to learn directly. Through benchmark problems including a nonlinear pendulum, the wave equation, and the viscous Burgers equation, we show how stacking can be used to improve the accuracy and reduce the required size of physics-informed neural networks and operator networks. △ Less

Submitted 20 November, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

arXiv:2309.05671 [pdf]

tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data

Authors: Jonas Hügel, Ulrich Sax, Shawn N. Murphy, Hossein Estiri

Abstract: The increasing availability of large clinical datasets collected from patients can enable new avenues for computational characterization of complex diseases using different analytic algorithms. One of the promising new methods for extracting knowledge from large clinical datasets involves temporal pattern mining integrated with machine learning workflows. However, mining these temporal patterns is… ▽ More The increasing availability of large clinical datasets collected from patients can enable new avenues for computational characterization of complex diseases using different analytic algorithms. One of the promising new methods for extracting knowledge from large clinical datasets involves temporal pattern mining integrated with machine learning workflows. However, mining these temporal patterns is a computational intensive task and has memory repercussions. Current algorithms, such as the temporal sequence pattern mining (tSPM) algorithm, are already providing promising outcomes, but still leave room for optimization. In this paper, we present the tSPM+ algorithm, a high-performance implementation of the tSPM algorithm, which adds a new dimension by adding the duration to the temporal patterns. We show that the tSPM+ algorithm provides a speed up to factor 980 and a up to 48 fold improvement in memory consumption. Moreover, we present a docker container with an R-package, We also provide vignettes for an easy integration into already existing machine learning workflows and use the mined temporal sequences to identify Post COVID-19 patients and their symptoms according to the WHO definition. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: Supplementary data: https://doi.org/10.5281/zenodo.8329519

arXiv:2308.07843 [pdf, other]

Dyadic Reinforcement Learning

Authors: Shuangning Li, Lluis Salvat Niell, Sung Won Choi, Inbal Nahum-Shani, Guy Shani, Susan Murphy

Abstract: Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in helping individuals managing burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship betwee… ▽ More Mobile health aims to enhance health outcomes by delivering interventions to individuals as they go about their daily life. The involvement of care partners and social support networks often proves crucial in helping individuals managing burdensome medical conditions. This presents opportunities in mobile health to design interventions that target the dyadic relationship -- the relationship between a target person and their care partner -- with the aim of enhancing social support. In this paper, we develop dyadic RL, an online reinforcement learning algorithm designed to personalize intervention delivery based on contextual factors and past responses of a target person and their care partner. Here, multiple sets of interventions impact the dyad across multiple time intervals. The developed dyadic RL is Bayesian and hierarchical. We formally introduce the problem setup, develop dyadic RL and establish a regret bound. We demonstrate dyadic RL's empirical performance through simulation studies on both toy scenarios and on a realistic test bed constructed from data collected in a mobile health study. △ Less

Submitted 11 August, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.13916 [pdf, other]

Online learning in bandits with predicted context

Authors: Yongyi Guo, Ziping Xu, Susan Murphy

Abstract: We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available.… ▽ More We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-vanishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations. We further demonstrate the benefits of the proposed approach in simulation environments based on synthetic and real digital intervention datasets. △ Less

Submitted 17 March, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2306.11208 [pdf, other]

The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning

Authors: Sarah Rathnam, Sonali Parbhoo, Weiwei Pan, Susan A. Murphy, Finale Doshi-Velez

Abstract: Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of d… ▽ More Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator. △ Less

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.10983 [pdf, other]

Effect-Invariant Mechanisms for Policy Generalization

Authors: Sorawit Saengkyongam, Niklas Pfister, Predrag Klasnja, Susan Murphy, Jonas Peters

Abstract: Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call f… ▽ More Policy learning is an important component of many real-world learning systems. A major challenge in policy learning is how to adapt efficiently to unseen environments or tasks. Recently, it has been suggested to exploit invariant conditional distributions to learn models that generalize better to unseen environments. However, assuming invariance of entire conditional distributions (which we call full invariance) may be too strong of an assumption in practice. In this paper, we introduce a relaxation of full invariance called effect-invariance (e-invariance for short) and prove that it is sufficient, under suitable assumptions, for zero-shot policy generalization. We also discuss an extension that exploits e-invariance when we have a small sample from the test environment, enabling few-shot policy generalization. Our work does not assume an underlying causal graph or that the data are generated by a structural causal model; instead, we develop testing procedures to test e-invariance directly from data. We present empirical results using simulated data and a mobile health intervention dataset to demonstrate the effectiveness of our approach. △ Less

Submitted 27 June, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.18511 [pdf, other]

Contextual Bandits with Budgeted Information Reveal

Authors: Kyra Gan, Esmaeil Keyvanshokooh, Xueqing Liu, Susan Murphy

Abstract: Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to them, which we refer to as pro-treatment actions. In practice, clinicians have a limited budget to encourage patients to take these actions and collect additional… ▽ More Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to them, which we refer to as pro-treatment actions. In practice, clinicians have a limited budget to encourage patients to take these actions and collect additional information. We introduce a novel optimization and learning algorithm to address this problem. This algorithm effectively combines the strengths of two algorithmic approaches in a seamless manner, including 1) an online primal-dual algorithm for deciding the optimal timing to reach out to patients, and 2) a contextual bandit learning algorithm to deliver personalized treatment to the patient. We prove that this algorithm admits a sub-linear regret bound. We illustrate the usefulness of this algorithm on both synthetic and real-world data. △ Less

Submitted 13 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: International Conference on Artificial Intelligence and Statistics, 2024

arXiv:2305.09913 [pdf, other]

Assessing the Impact of Context Inference Error and Partial Observability on RL Methods for Just-In-Time Adaptive Interventions

Authors: Karine Karine, Predrag Klasnja, Susan A. Murphy, Benjamin M. Marlin

Abstract: Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community. JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components in response to each individual's time varying state. In this work, we explore the application of re… ▽ More Just-in-Time Adaptive Interventions (JITAIs) are a class of personalized health interventions developed within the behavioral science community. JITAIs aim to provide the right type and amount of support by iteratively selecting a sequence of intervention options from a pre-defined set of components in response to each individual's time varying state. In this work, we explore the application of reinforcement learning methods to the problem of learning intervention option selection policies. We study the effect of context inference error and partial observability on the ability to learn effective policies. Our results show that the propagation of uncertainty from context inferences is critical to improving intervention efficacy as context uncertainty increases, while policy gradient algorithms can provide remarkable robustness to partially observed behavioral state information. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted at UAI 2023

arXiv:2304.05365 [pdf, other]

Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling

Authors: Susobhan Ghosh, Raphael Kim, Prasidh Chhabria, Raaz Dwivedi, Predrag Klasnja, Peng Liao, Kelly Zhang, Susan Murphy

Abstract: There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this pro… ▽ More There is a growing interest in using reinforcement learning (RL) to personalize sequences of treatments in digital health to support users in adopting healthier behaviors. Such sequential decision-making problems involve decisions about when to treat and how to treat based on the user's context (e.g., prior activity level, location, etc.). Online RL is a promising data-driven approach for this problem as it learns based on each user's historical responses and uses that knowledge to personalize these decisions. However, to decide whether the RL algorithm should be included in an ``optimized'' intervention for real-world deployment, we must assess the data evidence indicating that the RL algorithm is actually personalizing the treatments to its users. Due to the stochasticity in the RL algorithm, one may get a false impression that it is learning in certain states and using this learning to provide specific treatments. We use a working definition of personalization and introduce a resampling-based methodology for investigating whether the personalization exhibited by the RL algorithm is an artifact of the RL algorithm stochasticity. We illustrate our methodology with a case study by analyzing the data from a physical activity clinical trial called HeartSteps, which included the use of an online RL algorithm. We demonstrate how our approach enhances data-driven truth-in-advertising of algorithm personalization both across all users as well as within specific users in the study. △ Less

Submitted 7 August, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: The first two authors contributed equally

arXiv:2212.00863 [pdf, other]

Modeling Mobile Health Users as Reinforcement Learning Agents

Authors: Eura Shin, Siddharth Swaroop, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

Abstract: Mobile health (mHealth) technologies empower patients to adopt/maintain healthy behaviors in their daily lives, by providing interventions (e.g. push notifications) tailored to the user's needs. In these settings, without intervention, human decision making may be impaired (e.g. valuing near term pleasure over own long term goals). In this work, we formalize this relationship with a framework in w… ▽ More Mobile health (mHealth) technologies empower patients to adopt/maintain healthy behaviors in their daily lives, by providing interventions (e.g. push notifications) tailored to the user's needs. In these settings, without intervention, human decision making may be impaired (e.g. valuing near term pleasure over own long term goals). In this work, we formalize this relationship with a framework in which the user optimizes a (potentially impaired) Markov Decision Process (MDP) and the mHealth agent intervenes on the user's MDP parameters. We show that different types of impairments imply different types of optimal intervention. We also provide analytical and empirical explorations of these differences. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.14297 [pdf, ps, other]

Doubly robust nearest neighbors in factor models

Authors: Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

Abstract: We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for esti… ▽ More We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies, like unit-unit NN, for estimating the mean $f(u_i, v_t)$ relies on existence of other rows $j$ with $u_j \approx u_i$. Similarly, time-time NN strategy relies on existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN. △ Less

Submitted 29 January, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

arXiv:2210.10441 [pdf, other]

Efficient delivery of Robotics Programming educational content using Cloud Robotics

Authors: Sean Murphy, Leonardo Militano, Giovanni Toffetti, Remo Maurer

Abstract: In this paper, we report on our use of cloud-robotics solutions to teach a Robotics Applications Programming course at Zurich University of Applied Sciences (ZHAW). The usage of Kubernetes based cloud computing environment combined with real robots -- turtlebots and Niryo arms -- allowed us to: 1) minimize the set up times required to provide a Robotic Operating System (ROS) simulation and develop… ▽ More In this paper, we report on our use of cloud-robotics solutions to teach a Robotics Applications Programming course at Zurich University of Applied Sciences (ZHAW). The usage of Kubernetes based cloud computing environment combined with real robots -- turtlebots and Niryo arms -- allowed us to: 1) minimize the set up times required to provide a Robotic Operating System (ROS) simulation and development environment to all students independently of their laptop architecture and OS; 2) provide a seamless "simulation to real" experience preserving the exciting experience of writing software interacting with the physical world; and 3) sharing GPUs across multiple student groups, thus using resources efficiently. We describe our requirements, solution design, experience working with the solution in the educational context and areas where it can be further improved. This may be of interest to other educators who may want to replicate our experience. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: 7th International Conference on Robotics and Automation Engineering, ICRAE, 2022. arXiv admin note: text overlap with arXiv:2210.03936

arXiv:2210.03936 [pdf, other]

Cloud Native Robotic Applications with GPU Sharing on Kubernetes

Authors: Giovanni Toffetti, Leonardo Militano, Seán Murphy, Remo Maurer, Mark Straub

Abstract: In this paper we discuss our experience in teaching the Robotic Applications Programming course at ZHAW combining the use of a Kubernetes (k8s) cluster and real, heterogeneous, robotic hardware. We discuss the main advantages of our solutions in terms of seamless simulation-to-real experience for students and the main shortcomings we encountered with networking and sharing GPUs to support deep lea… ▽ More In this paper we discuss our experience in teaching the Robotic Applications Programming course at ZHAW combining the use of a Kubernetes (k8s) cluster and real, heterogeneous, robotic hardware. We discuss the main advantages of our solutions in terms of seamless simulation-to-real experience for students and the main shortcomings we encountered with networking and sharing GPUs to support deep learning workloads. We describe the current and foreseen alternatives to avoid these drawbacks in future course editions and propose a more cloud-native approach to deploying multiple robotics applications on a k8s cluster. △ Less

Submitted 31 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Submission accepted at the IROS'22 Cloud Robotics Workshop

arXiv:2208.07406 [pdf, other]

Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care

Authors: Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Abstract: Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in o… ▽ More Dental disease is one of the most common chronic diseases despite being largely preventable. However, professional advice on optimal oral hygiene practices is often forgotten or abandoned by patients. Therefore patients may benefit from timely and personalized encouragement to engage in oral self-care behaviors. In this paper, we develop an online reinforcement learning (RL) algorithm for use in optimizing the delivery of mobile-based prompts to encourage oral hygiene behaviors. One of the main challenges in developing such an algorithm is ensuring that the algorithm considers the impact of the current action on the effectiveness of future actions (i.e., delayed effects), especially when the algorithm has been made simple in order to run stably and autonomously in a constrained, real-world setting (i.e., highly noisy, sparse data). We address this challenge by designing a quality reward which maximizes the desired health outcome (i.e., high-quality brushing) while minimizing user burden. We also highlight a procedure for optimizing the hyperparameters of the reward by building a simulation environment test bed and evaluating candidates using the test bed. The RL algorithm discussed in this paper will be deployed in Oralytics, an oral self-care app that provides behavioral strategies to boost patient engagement in oral hygiene practices. △ Less

Submitted 14 September, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

arXiv:2206.03944 [pdf, other]

doi 10.3390/a15080255

Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-implementation Guidelines

Authors: Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, Susan A. Murphy

Abstract: Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurat… ▽ More Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (Predictability, Computability, Stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning (Yu and Kumbier, 2020), to the design of RL algorithms for the digital interventions setting. Further, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We illustrate the use of the PCS framework for designing an RL algorithm for Oralytics, a mobile health study aiming to improve users' tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022. △ Less

Submitted 18 August, 2022; v1 submitted 8 June, 2022; originally announced June 2022.

arXiv:2203.00097 [pdf]

doi 10.1016/j.ejor.2022.01.046

Estimating causal effects with optimization-based methods: A review and empirical comparison

Authors: Martin Cousineau, Vedat Verter, Susan A. Murphy, Joelle Pineau

Abstract: In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of metho… ▽ More In the absence of randomized controlled and natural experiments, it is necessary to balance the distributions of (observable) covariates of the treated and control groups in order to obtain an unbiased estimate of a causal effect of interest; otherwise, a different effect size may be estimated, and incorrect recommendations may be given. To achieve this balance, there exist a wide variety of methods. In particular, several methods based on optimization models have been recently proposed in the causal inference literature. While these optimization-based methods empirically showed an improvement over a limited number of other causal inference methods in their relative ability to balance the distributions of covariates and to estimate causal effects, they have not been thoroughly compared to each other and to other noteworthy causal inference methods. In addition, we believe that there exist several unaddressed opportunities that operational researchers could contribute with their advanced knowledge of optimization, for the benefits of the applied researchers that use causal inference tools. In this review paper, we present an overview of the causal inference literature and describe in more detail the optimization-based causal inference methods, provide a comparative analysis of the prevailing optimization-based methods, and discuss opportunities for new methods. △ Less

Submitted 28 February, 2022; originally announced March 2022.

Comments: In Press, Corrected Proof

Journal ref: European Journal of Operational Research, 2022, 14 pages

arXiv:2202.07098 [pdf, ps, other]

Statistical Inference After Adaptive Sampling for Longitudinal Data

Authors: Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Abstract: Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "p… ▽ More Online reinforcement learning and other adaptive sampling algorithms are increasingly used in digital intervention experiments to optimize treatment delivery for users over time. In this work, we focus on longitudinal user data collected by a large class of adaptive sampling algorithms that are designed to optimize treatment decisions online using accruing data from multiple users. Combining or "pooling" data across users allows adaptive sampling algorithms to potentially learn faster. However, by pooling, these algorithms induce dependence between the sampled user data trajectories; we show that this can cause standard variance estimators for i.i.d. data to underestimate the true variance of common estimators on this data type. We develop novel methods to perform a variety of statistical analyses on such adaptively sampled data via Z-estimation. Specifically, we introduce the \textit{adaptive} sandwich variance estimator, a corrected sandwich estimator that leads to consistent variance estimates under adaptive sampling. Additionally, to prove our results we develop novel theoretical tools for empirical processes on non-i.i.d., adaptively sampled longitudinal data which may be of independent interest. This work is motivated by our efforts in designing experiments in which online reinforcement learning algorithms optimize treatment decisions, yet statistical inference is essential for conducting analyses after experiments conclude. △ Less

Submitted 19 April, 2023; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: Fixing typos

arXiv:2202.06891 [pdf, ps, other]

Counterfactual inference in sequential experiments

Authors: Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

Abstract: We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumpt… ▽ More We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference guarantees for the counterfactual mean at the smallest possible scale -- mean outcome under different treatments for each unit and each time -- with minimal assumptions on the adaptive treatment policy. Without any structural assumptions on the counterfactual means, this challenging task is infeasible due to more unknowns than observed data points. To make progress, we introduce a latent factor model over the counterfactual means that serves as a non-parametric generalization of the non-linear mixed effects model and the bilinear latent factor model considered in prior works. For estimation, we use a non-parametric method, namely a variant of nearest neighbors, and establish a non-asymptotic high probability error bound for the counterfactual mean for each unit and each time. Under regularity conditions, this bound leads to asymptotically valid confidence intervals for the counterfactual mean as the number of units and time points grows to $\infty$ together at suitable rates. We illustrate our theory via several simulations and a case study involving data from a mobile health clinical trial HeartSteps. △ Less

Submitted 8 June, 2025; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: Accepted at the Annals of Statistics

arXiv:2110.02260 [pdf]

An Overview of the Drone Open-Source Ecosystem

Authors: John Glossner, Samantha Murphy, Daniel Iancu

Abstract: Unmanned aerial systems capable of beyond visual line of sight operation can be organized into a top-down hierarchy of layers including flight supervision, command and control, simulation of systems, operating systems, and physical hardware. Flight supervision includes unmanned air traffic management, flight planning, authorization, and remote identification. Command and control ensure drones can… ▽ More Unmanned aerial systems capable of beyond visual line of sight operation can be organized into a top-down hierarchy of layers including flight supervision, command and control, simulation of systems, operating systems, and physical hardware. Flight supervision includes unmanned air traffic management, flight planning, authorization, and remote identification. Command and control ensure drones can be piloted safely. Simulation of systems concerns how drones may react to different environments and how changing conditions and information provide input to a piloting system. Electronic hardware controlling drone operation is typically accessed using an operating system. Each layer in the hierarchy has an ecosystem of open-source solutions. In this brief survey we describe representative open-source examples for each level of the hierarchy. △ Less

Submitted 5 October, 2021; originally announced October 2021.

arXiv:2109.08134 [pdf, other]

Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Authors: Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez

Abstract: In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs), however they operate in technically and intuitively distinct ways and lack a common form in which to c… ▽ More In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly-complex models in Markov decision processes (MDPs), however they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework -- a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies. △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: ICML Workshop on Reinforcement Learning Theory 2021

arXiv:2107.09949 [pdf, other]

Online structural kernel selection for mobile health

Authors: Eura Shin, Pedja Klasnja, Susan Murphy, Finale Doshi-Velez

Abstract: Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that… ▽ More Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that the kernels themselves are meaningful for an mHealth prediction goal. △ Less

Submitted 21 July, 2021; originally announced July 2021.

Comments: Workshop paper in ICML IMLH 2021

arXiv:2104.14074 [pdf, other]

Statistical Inference with M-Estimators on Adaptively Collected Data

Authors: Kelly W. Zhang, Lucas Janson, Susan A. Murphy

Abstract: Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence interval… ▽ More Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e.g., comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators -- which includes estimators based on empirical risk minimization as well as maximum likelihood -- on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets. △ Less

Submitted 19 November, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Journal ref: Advances in Neural Information Processing Systems, 2021

arXiv:2012.11646 [pdf, other]

Fast Physical Activity Suggestions: Efficient Hyperparameter Learning in Mobile Health

Authors: Marianne Menictas, Sabina Tomkins, Susan Murphy

Abstract: Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth)… ▽ More Users can be supported to adopt healthy behaviors, such as regular physical activity, via relevant and timely suggestions on their mobile devices. Recently, reinforcement learning algorithms have been found to be effective for learning the optimal context under which to provide suggestions. However, these algorithms are not necessarily designed for the constraints posed by mobile health (mHealth) settings, that they be efficient, domain-informed and computationally affordable. We propose an algorithm for providing physical activity suggestions in mHealth settings. Using domain-science, we formulate a contextual bandit algorithm which makes use of a linear mixed effects model. We then introduce a procedure to efficiently perform hyper-parameter updating, using far less computational resources than competing approaches. Not only is our approach computationally efficient, it is also easily implemented with closed form matrix algebraic updates and we show improvements over state of the art approaches both in speed and accuracy of up to 99% and 56% respectively. △ Less

Submitted 21 December, 2020; originally announced December 2020.

Comments: Neurips 2020 workshop: Machine Learning in Mobile Health. arXiv admin note: substantial text overlap with arXiv:2003.12881

arXiv:2008.03869 [pdf]

doi 10.1038/s41598-021-84781-x

Individualized Prediction of COVID-19 Adverse outcomes with MLHO

Authors: Hossein Estiri, Zachary H. Strasser, Shawn N. Murphy

Abstract: We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction… ▽ More We developed MLHO (pronounced as melo), an end-to-end Machine Learning framework that leverages iterative feature and algorithm selection to predict Health Outcomes. MLHO implements iterative sequential representation mining, and feature and model selection, for predicting the patient-level risk of hospitalization, ICU admission, need for mechanical ventilation, and death. It bases this prediction on data from patients' past medical records (before their COVID-19 infection). MLHO's architecture enables a parallel and outcome-oriented model calibration, in which different statistical learning algorithms and vectors of features are simultaneously tested to improve the prediction of health outcomes. Using clinical and demographic data from a large cohort of over 13,000 COVID-19-positive patients, we modeled the four adverse outcomes utilizing about 600 features representing patients' pre-COVID health records and demographics. The mean AUC ROC for mortality prediction was 0.91, while the prediction performance ranged between 0.80 and 0.81 for the ICU, hospitalization, and ventilation. We broadly describe the clusters of features that were utilized in modeling and their relative influence for predicting each outcome. Our results demonstrated that while demographic variables (namely age) are important predictors of adverse outcomes after a COVID-19 infection, the incorporation of the past clinical records are vital for a reliable prediction model. As the COVID-19 pandemic unfolds around the world, adaptable and interpretable machine learning frameworks (like MLHO) are crucial to improve our readiness for confronting the potential future waves of COVID-19, as well as other novel infectious diseases that may emerge. △ Less

Submitted 29 December, 2020; v1 submitted 9 August, 2020; originally announced August 2020.

arXiv:2008.01571 [pdf, other]

IntelligentPooling: Practical Thompson Sampling for mHealth

Authors: Sabina Tomkins, Peng Liao, Predrag Klasnja, Susan Murphy

Abstract: In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobi… ▽ More In mobile health (mHealth) smart devices deliver behavioral treatments repeatedly over time to a user with the goal of helping the user adopt and maintain healthy behaviors. Reinforcement learning appears ideal for learning how to optimally make these sequential treatment decisions. However, significant challenges must be overcome before reinforcement learning can be effectively deployed in a mobile healthcare setting. In this work we are concerned with the following challenges: 1) individuals who are in the same context can exhibit differential response to treatments 2) only a limited amount of data is available for learning on any one individual, and 3) non-stationary responses to treatment. To address these challenges we generalize Thompson-Sampling bandit algorithms to develop IntelligentPooling. IntelligentPooling learns personalized treatment policies thus addressing challenge one. To address the second challenge, IntelligentPooling updates each user's degree of personalization while making use of available data on other users to speed up learning. Lastly, IntelligentPooling allows responsivity to vary as a function of a user's time since beginning treatment, thus addressing challenge three. We show that IntelligentPooling achieves an average of 26% lower regret than state-of-the-art. We demonstrate the promise of this approach and its ability to learn from even a small group of users in a live clinical trial. △ Less

Submitted 12 December, 2020; v1 submitted 31 July, 2020; originally announced August 2020.

Comments: arXiv admin note: text overlap with arXiv:2002.09971

arXiv:2005.05880 [pdf]

The Micro-Randomized Trial for Developing Digital Interventions: Experimental Design Considerations

Authors: Ashley E. Walton, Linda M. Collins, Predrag Klasnja, Inbal Nahum-Shani, Mashfiqui Rabbi, Maureen A. Walton, Susan A. Murphy

Abstract: Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this… ▽ More Just-in-time adaptive interventions (JITAIs) are time-varying adaptive interventions that use frequent opportunities for the intervention to be adapted such as weekly, daily, or even many times a day. This high intensity of adaptation is facilitated by the ability of digital technology to continuously collect information about an individual's current context and deliver treatments adapted to this information. The micro-randomized trial (MRT) has emerged for use in informing the construction of JITAIs. MRTs operate in, and take advantage of, the rapidly time-varying digital intervention environment. MRTs can be used to address research questions about whether and under what circumstances particular components of a JITAI are effective, with the ultimate objective of developing effective and efficient components. The purpose of this article is to clarify why, when, and how to use MRTs; to highlight elements that must be considered when designing and implementing an MRT; and to discuss the possibilities this emerging optimization trial design offers for future research in the behavioral sciences, education, and other fields. We briefly review key elements of JITAIs, and then describe three case studies of MRTs, each of which highlights research questions that can be addressed using the MRT and experimental design considerations that might arise. We also discuss a variety of considerations that go into planning and designing an MRT, using the case studies as examples. △ Less

Submitted 23 April, 2020; originally announced May 2020.

MSC Class: 62P15

arXiv:2004.06230 [pdf, other]

Power Constrained Bandits

Authors: Jiayu Yao, Emma Brunskill, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

Abstract: Contextual bandits often provide simple and effective personalization in decision making problems, making them popular tools to deliver personalized interventions in mobile health as well as other health applications. However, when bandits are deployed in the context of a scientific study -- e.g. a clinical trial to test if a mobile health intervention is effective -- the aim is not only to person… ▽ More Contextual bandits often provide simple and effective personalization in decision making problems, making them popular tools to deliver personalized interventions in mobile health as well as other health applications. However, when bandits are deployed in the context of a scientific study -- e.g. a clinical trial to test if a mobile health intervention is effective -- the aim is not only to personalize for an individual, but also to determine, with sufficient statistical power, whether or not the system's intervention is effective. It is essential to assess the effectiveness of the intervention before broader deployment for better resource allocation. The two objectives are often deployed under different model assumptions, making it hard to determine how achieving the personalization and statistical power affect each other. In this work, we develop general meta-algorithms to modify existing algorithms such that sufficient power is guaranteed while still improving each user's well-being. We also demonstrate that our meta-algorithms are robust to various model mis-specifications possibly appearing in statistical studies, thus providing a valuable tool to study designers. △ Less

Submitted 27 July, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

Comments: Accepted at MLHC 2021

Showing 1–50 of 65 results for author: Murphy, S