Showing 1–50 of 74 results for author: Van Der Laan, M J

  1. arXiv:2505.14675  [pdf, ps, other]

    stat.AP

    Semi-parametric efficient estimation of small genetic effects in large-scale population cohorts

    Authors: Olivier Labayle, Breeshey Roskams-Hieter, Joshua Slaughter, Kelsey Tetley-Campbell, Mark J. van der Laan, Chris P. Ponting, Sjoerd Viktor Beentjes, Ava Khamseh

    Abstract: Population genetics seeks to quantify DNA variant associations with traits or diseases, as well as interactions among variants and with environmental factors. Computing millions of estimates in large cohorts in which small effect sizes are expected necessitates minimising model-misspecification bias to control false discoveries. We present TarGene, a unified statistical workflow for the semi-para…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: 31 pages + appendix, 5 figures

  2. arXiv:2504.11740  [pdf, other]

    stat.ME

    A cautionary note for plasmode simulation studies in the setting of causal inference

    Authors: Pamela A Shaw, Susan Gruber, Brian D. Williamson, Rishi Desai, Susan M. Shortreed, Chloe Krakauer, Jennifer C. Nelson, Mark J. van der Laan

    Abstract: Plasmode simulation has become an important tool for evaluating the operating characteristics of different statistical methods in complex settings, such as pharmacoepidemiological studies of treatment effectiveness using electronic health records (EHR) data. These studies provide insight into how estimator performance is impacted by challenges including rare events, small sample size, etc., that c…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 55 pages, 6 tables, 2 figures, 8 supplemental tables, 4 supplemental figures

  3. arXiv:2503.22284  [pdf, other]

    stat.ME

    Powering RCTs for marginal effects with GLMs using prognostic score adjustment

    Authors: Emilie Højbjerre-Frandsen, Mark J. van der Laan, Alejandro Schuler

    Abstract: In randomized clinical trials (RCTs), the accurate estimation of marginal treatment effects is crucial for determining the efficacy of interventions. Enhancing the statistical power of these analyses is a key objective for statisticians. The increasing availability of historical data from registries, prior trials, and health records presents an opportunity to improve trial efficiency. However, man…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: 41 pages, 7 figures

  4. arXiv:2412.15012  [pdf, other]

    stat.ME

    Assessing treatment effects in observational data with missing confounders: A comparative study of practical doubly-robust and traditional missing data methods

    Authors: Brian D. Williamson, Chloe Krakauer, Eric Johnson, Susan Gruber, Bryan E. Shepherd, Mark J. van der Laan, Thomas Lumley, Hana Lee, Jose J. Hernandez Munoz, Fengyu Zhao, Sarah K. Dutcher, Rishi Desai, Gregory E. Simon, Susan M. Shortreed, Jennifer C. Nelson, Pamela A. Shaw

    Abstract: In pharmacoepidemiology, safety and effectiveness are frequently evaluated using readily available administrative and electronic health records data. In these settings, detailed confounder data are often not available in all data sources and therefore missing on a subset of individuals. Multiple imputation (MI) and inverse-probability weighting (IPW) are go-to analytical methods to handle missing…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 142 pages (27 main, 115 supplemental); 6 figures, 2 tables

  5. arXiv:2408.09060  [pdf]

    stat.AP stat.ME

    [Invited Discussion] Randomization Tests to Address Disruptions in Clinical Trials: A Report from the NISS Ingram Olkin Forum Series on Unplanned Clinical Trial Disruptions

    Authors: Rachael V. Phillips, Mark J. van der Laan

    Abstract: Disruptions in clinical trials may be due to external events like pandemics, warfare, and natural disasters. Resulting complications may lead to unforeseen intercurrent events (events that occur after treatment initiation and affect the interpretation of the clinical question of interest or the existence of the measurements associated with it). In Uschner et al. (2023), several example clinical tr…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: This article is an un-refereed, Authors Original Version

  6. arXiv:2404.09847  [pdf, other]

    stat.ML cs.CY cs.LG stat.ME

    Statistical learning for constrained functional parameters in infinite-dimensional models with applications in fair machine learning

    Authors: Razieh Nabi, Nima S. Hejazi, Mark J. van der Laan, David Benkeser

    Abstract: Constrained learning has become increasingly important, especially in the realm of algorithmic fairness and machine learning. In these settings, predictive models are developed specifically to satisfy pre-defined notions of fairness. Here, we study the general problem of constrained statistical machine learning through a statistical functional lens. We consider learning a function-valued parameter…

    Submitted 15 April, 2024; originally announced April 2024.

  7. arXiv:2404.01736  [pdf, other]

    stat.ME

    Nonparametric efficient causal estimation of the intervention-specific expected number of recurrent events with continuous-time targeted maximum likelihood and highly adaptive lasso estimation

    Authors: Helene C. W. Rytgaard, Mark J. van der Laan

    Abstract: Longitudinal settings involving outcome, competing risks and censoring events occurring and recurring in continuous time are common in medical research, but are often analyzed with methods that do not allow for taking post-baseline information into account. In this work, we define statistical and causal target parameters via the g-computation formula by carrying out interventions directly on the p…

    Submitted 11 April, 2025; v1 submitted 2 April, 2024; originally announced April 2024.

  8. arXiv:2310.19197  [pdf, other]

    stat.CO

    concrete: Targeted Estimation of Survival and Competing Risks in Continuous Time

    Authors: David Chen, Helene C. W. Rytgaard, Edwin C. H. Fong, Jens M. Tarp, Maya L. Petersen, Mark J. van der Laan, Thomas A. Gerds

    Abstract: This article introduces the R package concrete, which implements a recently developed targeted maximum likelihood estimator (TMLE) for the cause-specific absolute risks of time-to-event outcomes measured in continuous time. Cross-validated Super Learner machine learning ensembles are used to estimate propensity scores and conditional cause-specific hazards, which are then targeted to produce robus…

    Submitted 20 March, 2025; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: 18 pages, 4 figures, submitted to the R Journal

  9. arXiv:2309.16099  [pdf, other]

    math.ST stat.ME stat.ML

    Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve

    Authors: Ashkan Ertefaie, Luke Duttweiler, Brent A. Johnson, Mark J. van der Laan

    Abstract: Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates, which we refer to as a stratum-based value function. We focus on a semiparametric class of decision rules and propose a sieve-based nonparametric covariate-adjusted regi…

    Submitted 27 September, 2023; originally announced September 2023.

  10. arXiv:2306.07736  [pdf, other]

    stat.ME

    An Approach to Nonparametric Inference on the Causal Dose Response Function

    Authors: Aaron Hudson, Elvin H. Geng, Thomas A. Odeny, Elizabeth A. Bukusi, Maya L. Petersen, Mark J. van der Laan

    Abstract: The causal dose response curve is commonly selected as the statistical parameter of interest in studies where the goal is to understand the effect of a continuous exposure on an outcome. Most of the available methodology for statistical inference on the dose-response function in the continuous exposure setting requires strong parametric assumptions on the probability distribution. Such parametric a…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: 39 pages, 5 figures

  11. arXiv:2305.01849  [pdf, other]

    stat.ME

    Semiparametric Discovery and Estimation of Interaction in Mixed Exposures using Stochastic Interventions

    Authors: David B. McCoy, Alan E. Hubbard, Alejandro Schuler, Mark J. van der Laan

    Abstract: This study introduces a nonparametric definition of interaction and provides an approach to both interaction discovery and efficient estimation of this parameter. Using stochastic shift interventions and ensemble machine learning, our approach identifies and quantifies interaction effects through a model-independent target parameter, estimated via targeted maximum likelihood and cross-validation.…

    Submitted 28 June, 2024; v1 submitted 2 May, 2023; originally announced May 2023.

  12. arXiv:2301.12029  [pdf, other]

    stat.ML cs.LG stat.ME

    Multi-task Highly Adaptive Lasso

    Authors: Ivana Malenica, Rachael V. Phillips, Daniel Lazzareschi, Jeremy R. Coyle, Romain Pirracchio, Mark J. van der Laan

    Abstract: We propose a novel, fully nonparametric approach for multi-task learning, the Multi-task Highly Adaptive Lasso (MT-HAL). MT-HAL simultaneously learns features, samples and task associations important for the common model, while imposing a shared sparse structure among similar tasks. Given multiple tasks, our approach automatically finds a sparse sharing structure. The proposed MTL algorithm at…

    Submitted 27 January, 2023; originally announced January 2023.

  13. arXiv:2212.02422  [pdf, other]

    stat.ME stat.AP stat.ML

    Adaptive Sequential Surveillance with Network and Temporal Dependence

    Authors: Ivana Malenica, Jeremy R. Coyle, Mark J. van der Laan, Maya L. Petersen

    Abstract: Strategic test allocation plays a major role in the control of both emerging and existing pandemics (e.g., COVID-19, HIV). Widespread testing supports effective epidemic control by (1) reducing transmission via identifying cases, and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the tr…

    Submitted 5 December, 2022; originally announced December 2022.

  14. arXiv:2211.14671  [pdf, other]

    stat.ME stat.AP

    Efficient Targeted Learning of Heterogeneous Treatment Effects for Multiple Subgroups

    Authors: Waverly Wei, Maya Petersen, Mark J van der Laan, Zeyu Zheng, Chong Wu, Jingshen Wang

    Abstract: In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treat…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: Accepted by Biometrics 2022

  15. Revisiting the propensity score's central role: Towards bridging balance and efficiency in the era of causal machine learning

    Authors: Nima S. Hejazi, Mark J. van der Laan

    Abstract: About forty years ago, in a now-seminal contribution, Rosenbaum & Rubin (1983) introduced a critical characterization of the propensity score as a central quantity for drawing causal inferences in observational study settings. In the decades since, much progress has been made across several research fronts in causal inference, notably including the re-weighting and matching paradigms. Focusing on…

    Submitted 30 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: Accepted for publication in a forthcoming special issue of Observational Studies

    Journal ref: Observational Studies, 2023

  16. arXiv:2205.08643  [pdf]

    stat.AP

    Targeted learning: Towards a future informed by real-world evidence

    Authors: Susan Gruber, Rachael V. Phillips, Hana Lee, Martin Ho, John Concato, Mark J. van der Laan

    Abstract: The 21st Century Cures Act of 2016 includes a provision for the U.S. Food and Drug Administration (FDA) to evaluate the potential use of real-world evidence (RWE) to support new indications for use for previously approved drugs, and to satisfy post-approval study requirements. Extracting reliable evidence from real-world data (RWD) is often complicated by a lack of treatment randomization, potenti…

    Submitted 13 June, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 34 pages (25 pages main paper + references, 9-page appendix), 6 figures; version 2 corrected minor typos, numbering errors, etc.

  17. arXiv:2205.05777  [pdf, other]

    stat.ME

    Efficient estimation of modified treatment policy effects based on the generalized propensity score

    Authors: Nima S. Hejazi, David Benkeser, Iván Díaz, Mark J. van der Laan

    Abstract: Continuous treatments have posed a significant challenge for causal inference, both in the formulation and identification of scientifically meaningful effects and in their robust estimation. Traditionally, focus has been placed on techniques applicable to binary or categorical treatments with few levels, allowing for the application of propensity score-based methodology with relative ease. Efforts…

    Submitted 28 June, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  18. A Flexible Approach for Predictive Biomarker Discovery

    Authors: Philippe Boileau, Nina Ting Qi, Mark J. van der Laan, Sandrine Dudoit, Ning Leng

    Abstract: An endeavor central to precision medicine is predictive biomarker discovery; they define patient subpopulations which stand to benefit most, or least, from a given treatment. The identification of these biomarkers is often the byproduct of the related but fundamentally different task of treatment rule estimation. Using treatment rule estimation methods to identify predictive biomarkers in clinical…

    Submitted 1 June, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

  19. arXiv:2204.06139  [pdf]

    stat.ME stat.AP stat.CO

    Practical considerations for specifying a super learner

    Authors: Rachael V. Phillips, Mark J. van der Laan, Hana Lee, Susan Gruber

    Abstract: Common tasks encountered in epidemiology, including disease incidence estimation and causal inference, rely on predictive modeling. Constructing a predictive model can be thought of as learning a prediction function, i.e., a function that takes as input covariate data and outputs a predicted value. Many strategies for learning these functions from data are available, from parametric regressions to…

    Submitted 14 March, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: A revised version of this article, which incorporates several modifications based on referees' suggestions, has been published in the International Journal of Epidemiology by Oxford University Press

    Journal ref: International Journal of Epidemiology, Volume 52, Issue 4, August 2023, Pages 1276-1285

  20. arXiv:2110.12112  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Why Machine Learning Cannot Ignore Maximum Likelihood Estimation

    Authors: Mark J. van der Laan, Sherri Rose

    Abstract: The growth of machine learning as a field has been accelerating with increasing interest and publications across fields, including statistics, but predominantly in computer science. How can we parse this vast literature for developments that exemplify the necessary rigor? How many of these manuscripts incorporate foundational theory to allow for statistical inference? Which advances have the great…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 30 pages. Forthcoming as a chapter in the Handbook of Matching and Weighting in Causal Inference

  21. arXiv:2110.09633  [pdf, other]

    stat.ME

    Defining and Estimating Effects in Cluster Randomized Trials: A Methods Comparison

    Authors: Alejandra Benitez, Maya L. Petersen, Mark J. van der Laan, Nicole Santos, Elizabeth Butrick, Dilys Walker, Rakesh Ghosh, Phelgona Otieno, Peter Waiswa, Laura B. Balzer

    Abstract: Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (e.g., at the individual-level or at the cluster-level). Sec…

    Submitted 3 May, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

  22. arXiv:2109.14048  [pdf, other]

    stat.ME

    Evaluating the Robustness of Targeted Maximum Likelihood Estimators via Realistic Simulations in Nutrition Intervention Trials

    Authors: Haodong Li, Sonali Rosete, Jeremy Coyle, Rachael V. Phillips, Nima S. Hejazi, Ivana Malenica, Benjamin F. Arnold, Jade Benjamin-Chung, Andrew Mertens, John M. Colford Jr, Mark J. van der Laan, Alan E. Hubbard

    Abstract: Several recently developed methods have the potential to harness machine learning in the pursuit of target quantities inspired by causal inference, including inverse weighting, doubly robust estimating equations and substitution estimators like targeted maximum likelihood estimation. There are even more recent augmentations of these procedures that can increase robustness, by adding a layer of cro…

    Submitted 28 September, 2021; originally announced September 2021.

  23. arXiv:2109.10452  [pdf, other]

    stat.ML cs.LG

    Personalized Online Machine Learning

    Authors: Ivana Malenica, Rachael V. Phillips, Romain Pirracchio, Antoine Chambaz, Alan Hubbard, Mark J. van der Laan

    Abstract: In this work, we introduce the Personalized Online Super Learner (POSL) -- an online ensembling algorithm for streaming data whose optimization procedure accommodates varying degrees of personalization. Namely, POSL optimizes predictions with respect to baseline covariates, so personalization can vary from completely individualized (i.e., optimization with respect to baseline covariate subject ID)…

    Submitted 21 September, 2021; originally announced September 2021.

  24. arXiv:2107.01537  [pdf, other]

    stat.ME

    One-step TMLE for targeting cause-specific absolute risks and survival curves

    Authors: Helene C. W. Rytgaard, Mark J. van der Laan

    Abstract: This paper considers a one-step targeted maximum likelihood estimation method for general competing risks and survival analysis settings where event times take place on the positive real line R+ and are subject to right-censoring. Our interest is overall in the effects of baseline treatment decisions, static, dynamic or stochastic, possibly confounded by pre-treatment covariates. We point out two ov…

    Submitted 1 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: 21 pages (including appendix), 1 figure, 5 tables

  25. arXiv:2105.02088  [pdf, other]

    math.ST stat.ME

    Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes

    Authors: Helene C. Rytgaard, Thomas A. Gerds, Mark J. van der Laan

    Abstract: This paper studies the generalization of the targeted minimum loss-based estimation (TMLE) framework to estimation of effects of time-varying interventions in settings where both interventions, covariates, and outcome can happen at subject-specific time-points on an arbitrarily fine time-scale. TMLE is a general template for constructing asymptotically linear substitution estimators for smooth low…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 27 pages (excluding supplementary material), 1 figure

  26. Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions

    Authors: Philippe Boileau, Nima S. Hejazi, Mark J. van der Laan, Sandrine Dudoit

    Abstract: The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimension…

    Submitted 6 May, 2022; v1 submitted 18 February, 2021; originally announced February 2021.

    Comments: 32 pages, 8 figures; updated contents of section 3, fixed typos

  27. arXiv:2102.00102  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Adaptive Sequential Design for a Single Time-Series

    Authors: Ivana Malenica, Aurelien Bibaut, Mark J. van der Laan

    Abstract: The current work is motivated by the need for robust statistical methods for precision medicine; as such, we address the need for statistical methods that provide actionable inference for a single unit at any point in time. We aim to learn an optimal, unknown choice of the controlled components of the design in order to optimize the expected outcome; with that, we adapt the randomization mechanism…

    Submitted 1 July, 2021; v1 submitted 29 January, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:1809.00734

  28. Nonparametric causal mediation analysis for stochastic interventional (in)direct effects

    Authors: Nima S. Hejazi, Kara E. Rudolph, Mark J. van der Laan, Iván Díaz

    Abstract: Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition o…

    Submitted 11 January, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

    Journal ref: Biostatistics, 2022

  29. arXiv:2006.08675  [pdf, ps, other]

    stat.AP

    Targeted Maximum Likelihood Estimation of Community-based Causal Effect of Community-Level Stochastic Interventions

    Authors: Chi Zhang, Jennifer Ahern, Mark J. van der Laan

    Abstract: Unlike the commonly used parametric regression models such as mixed models, that can easily violate the required statistical assumptions and result in invalid statistical inference, target maximum likelihood estimation allows more realistic data-generative models and provides double-robust, semi-parametric and efficient estimators. Target maximum likelihood estimators (TMLEs) for the causal effect…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: 20 pages. arXiv admin note: substantial text overlap with arXiv:2006.08553

  30. arXiv:2006.08553  [pdf, other]

    stat.AP stat.CO

    tmleCommunity: A R Package Implementing Target Maximum Likelihood Estimation for Community-level Data

    Authors: Chi Zhang, Jennifer Ahern, Mark J. van der Laan, Oleg Sofrygin

    Abstract: Over the past years, many applications aim to assess the causal effect of treatments assigned at the community level, while data are still collected at the individual level among individuals of the community. In many cases, one wants to evaluate the effect of a stochastic intervention on the community, where all communities in the target population receive probabilistically assigned treatments bas…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: 42 pages

  31. arXiv:2006.07333  [pdf]

    stat.ME stat.ML

    Targeting Learning: Robust Statistics for Reproducible Research

    Authors: Jeremy R. Coyle, Nima S. Hejazi, Ivana Malenica, Rachael V. Phillips, Benjamin F. Arnold, Andrew Mertens, Jade Benjamin-Chung, Weixin Cai, Sonali Dayal, John M. Colford Jr., Alan E. Hubbard, Mark J. van der Laan

    Abstract: Targeted Learning is a subfield of statistics that unifies advances in causal inference, machine learning and statistical theory to help answer scientifically impactful questions with statistical confidence. Targeted Learning is driven by complex problems in data science and has been implemented in a diversity of real-world scenarios: observational studies with missing treatments and outcomes, per…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 25 pages, 3 figures

    MSC Class: 62A01; ACM Class: G.3

  32. arXiv:2006.03632  [pdf, other]

    cs.LG stat.ML

    Rate-adaptive model selection over a collection of black-box contextual bandit algorithms

    Authors: Aurélien F. Bibaut, Antoine Chambaz, Mark J. van der Laan

    Abstract: We consider the model selection task in the stochastic contextual bandit setting. Suppose we are given a collection of base contextual bandit algorithms. We provide a master algorithm that combines them and achieves the same performance, up to constants, as the best base algorithm would, if it had been run on its own. Our approach only requires that each algorithm satisfy a high probability regret…

    Submitted 5 June, 2020; originally announced June 2020.

  33. arXiv:2005.11303  [pdf, other]

    stat.ME math.ST stat.ML

    Nonparametric inverse probability weighted estimators based on the highly adaptive lasso

    Authors: Ashkan Ertefaie, Nima S. Hejazi, Mark J. van der Laan

    Abstract: Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their ease of use, these estimators require the correct s…

    Submitted 3 July, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

  34. Efficient nonparametric inference on the effects of stochastic interventions under two-phase sampling, with applications to vaccine efficacy trials

    Authors: Nima S. Hejazi, Mark J. van der Laan, Holly E. Janes, Peter B. Gilbert, David C. Benkeser

    Abstract: The advent and subsequent widespread availability of preventive vaccines has altered the course of public health over the past century. Despite this success, effective vaccines to prevent many high-burden diseases, including HIV, have been slow to develop. Vaccine development can be aided by the identification of immune response markers that serve as effective surrogates for clinically significant…

    Submitted 3 April, 2020; v1 submitted 30 March, 2020; originally announced March 2020.

    Journal ref: Biometrics, 2020

  35. arXiv:2003.02873  [pdf, other]

    cs.LG stat.ML

    Generalized Policy Elimination: an efficient algorithm for Nonparametric Contextual Bandits

    Authors: Aurélien F. Bibaut, Antoine Chambaz, Mark J. van der Laan

    Abstract: We propose the Generalized Policy Elimination (GPE) algorithm, an oracle-efficient contextual bandit (CB) algorithm inspired by the Policy Elimination algorithm of Dudík et al. (2011). We prove the first regret optimality guarantee theorem for an oracle-efficient CB algorithm competing against a nonparametric class with infinite VC-dimension. Specifically, we show that GPE is regret-optimal (up to lo…

    Submitted 5 March, 2020; originally announced March 2020.

  36. Non-parametric efficient causal mediation with intermediate confounders

    Authors: Iván Díaz, Nima S. Hejazi, Kara E. Rudolph, Mark J. van der Laan

    Abstract: Interventional effects for mediation analysis were proposed as a solution to the lack of identifiability of natural (in)direct effects in the presence of a mediator-outcome confounder affected by exposure. We present a theoretical and computational study of the properties of the interventional (in)direct effect estimands based on the efficient influence function (EIF) in the non-parametric statist…

    Submitted 29 May, 2020; v1 submitted 20 December, 2019; originally announced December 2019.

    Journal ref: Biometrika, 2020

  37. arXiv:1912.06675  [pdf, other]

    stat.ML cs.LG

    Conditional Super Learner

    Authors: Gilmer Valdes, Yannet Interian, Efstathios D. Gennatas, Mark J. Van der Laan

    Abstract: In this article we consider the Conditional Super Learner (CSL), an algorithm which selects the best model candidate from a library conditional on the covariates. The CSL expands the idea of using cross-validation to select the best model and merges it with meta learning. Here we propose a specific algorithm that finds a local minimum to the problem posed, prove that it converges at a rate faster…

    Submitted 13 December, 2019; originally announced December 2019.

  38. arXiv:1912.06292  [pdf, other]

    cs.LG stat.ME stat.ML

    More Efficient Off-Policy Evaluation through Regularized Targeted Learning

    Authors: Aurélien F. Bibaut, Ivana Malenica, Nikos Vlassis, Mark J. van der Laan

    Abstract: We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistica…

    Submitted 12 December, 2019; originally announced December 2019.

    Comments: We are uploading the full paper with the appendix as of 12/12/2019, as we noticed that, unlike the main text, the appendix has not been made available on PMLR's website. The version of the appendix in this document is the same that we have been sending by email since June 2019 to readers who solicited it

    Journal ref: Proceedings of the 36th International Conference on Machine Learning, PMLR 97:654-663, 2019

  39. arXiv:1908.05607  [pdf, other]

    math.ST stat.ME

    Efficient Estimation of Pathwise Differentiable Target Parameters with the Undersmoothed Highly Adaptive Lasso

    Authors: Mark J. van der Laan, David Benkeser, Weixin Cai

    Abstract: We consider estimation of a functional parameter of a realistically modeled data distribution based on observing independent and identically distributed observations. We define an $m$-th order Spline Highly Adaptive Lasso Minimum Loss Estimator (Spline HAL-MLE) of a functional parameter that is defined by minimizing the empirical risk function over an $m$-th order smoothness class of functions. We…

    Submitted 2 July, 2021; v1 submitted 14 August, 2019; originally announced August 2019.

  40. arXiv:1905.13414  [pdf, other]

    stat.ME

    Targeted Estimation of L2 Distance Between Densities and its Application to Geo-spatial Data

    Authors: George Shan, Mark J. van der Laan

    Abstract: We examine the integrated squared difference, also known as the L2 distance (L2D), between two probability densities. Such a distance metric allows for comparison of differences between pairs of distributions or changes in a distribution over time. We propose a targeted maximum likelihood estimator for this parameter based on samples of independent and identically distributed observations from bot…

    Submitted 31 May, 2019; originally announced May 2019.

    Comments: 17 pages, 3 figures, 2 appendices included

  41. Expert-Augmented Machine Learning

    Authors: E. D. Gennatas, J. H. Friedman, L. H. Ungar, R. Pirracchio, E. Eaton, L. Reichman, Y. Interian, C. B. Simone, A. Auerbach, E. Delgado, M. J. Van der Laan, T. D. Solberg, G. Valdes

    Abstract: Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may…

    Submitted 5 January, 2021; v1 submitted 22 March, 2019; originally announced March 2019.

  42. Transporting stochastic direct and indirect effects to new populations

    Authors: Kara E Rudolph, Jonathan Levy, Mark J van der Laan

    Abstract: Transported mediation effects may contribute to understanding how and why interventions may work differently when applied to new populations. However, we are not aware of any estimators for such effects. Thus, we propose several different estimators of transported stochastic direct and indirect effects: an inverse-probability of treatment stabilized weighted estimator, a doubly robust estimator th…

    Submitted 8 March, 2019; originally announced March 2019.

    Journal ref: Biometrics. 2020

  43. arXiv:1901.05056  [pdf, other]

    stat.ME

    A nonparametric super-efficient estimator of the average treatment effect

    Authors: David Benkeser, Weixin Cai, Mark J van der Laan

    Abstract: Doubly robust estimators of causal effects are a popular means of estimating causal effects. Such estimators combine an estimate of the conditional mean of the outcome given treatment and confounders (the so-called outcome regression) with an estimate of the conditional probability of treatment given confounders (the propensity score) to generate an estimate of the effect of interest. In addition…

    Submitted 15 January, 2019; originally announced January 2019.

  44. Complier stochastic direct effects: identification and robust estimation

    Authors: Kara E Rudolph, Oleg Sofrygin, Mark J van der Laan

    Abstract: Mediation analysis is critical to understanding the mechanisms underlying exposure-outcome relationships. In this paper, we identify the instrumental variable (IV)-direct effect of the exposure on the outcome not through the mediator, using randomization of the instrument. To our knowledge, such an estimand has not previously been considered or estimated. We propose and evaluate several estimators…

    Submitted 29 October, 2018; originally announced October 2018.

    Journal ref: Journal of the American Statistical Association. 2020

  45. arXiv:1810.03030  [pdf, other]

    math.ST stat.ME

    Robust variance estimation and inference for causal effect estimation

    Authors: Linh Tran, Maya Petersen, Joshua Schwab, Mark J van der Laan

    Abstract: We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time-dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting results in anti-conservative estimates with increasing magnitudes…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: 20 pages, 8 figures

  46. arXiv:1809.00734  [pdf, other]

    math.ST cs.LG stat.AP stat.ME stat.ML

    Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series

    Authors: Mark J. van der Laan, Ivana Malenica

    Abstract: Consider the case that one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information for a (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed dimensional summary measure Co(t). We are concer…

    Submitted 3 September, 2018; originally announced September 2018.

  47. arXiv:1808.03231  [pdf, other]

    stat.AP

    Statistical Analysis Plan for SEARCH Phase I: Health Outcomes among Adults

    Authors: Laura B. Balzer, Diane V. Havlir, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen

    Abstract: This document provides the analytic plan for evaluating adult HIV incidence, health, and implementation outcomes for the first phase of the SEARCH Study. Locked: November 27, 2017. Embargoed until July 25, 2018.

    Submitted 25 July, 2018; originally announced August 2018.

    Comments: 40 pgs

  48. arXiv:1806.06784  [pdf, other]

    stat.ME stat.CO stat.ML

    Robust inference on the average treatment effect using the outcome highly adaptive lasso

    Authors: Cheng Ju, David Benkeser, Mark J. van der Laan

    Abstract: Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques such as semiparametric regression or machine learning to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatm…

    Submitted 12 May, 2019; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: The first two authors contributed equally to this work

  49. arXiv:1804.00102  [pdf, other]

    stat.ME math.ST stat.ML

    Collaborative targeted inference from continuously indexed nuisance parameter estimators

    Authors: Cheng Ju, Antoine Chambaz, Mark J. van der Laan

    Abstract: We wish to infer the value of a parameter at a law from which we sample independent observations. The parameter is smooth and we can define two variation-independent features of the law, its $Q$- and $G$-components, such that estimating them consistently at a fast enough product of rates allows one to build a confidence interval (CI) with a given asymptotic level from a plain targeted minimum loss est…

    Submitted 5 April, 2018; v1 submitted 30 March, 2018; originally announced April 2018.

    Comments: 38 pages

  50. arXiv:1802.09642  [pdf]

    stat.ME

    Selecting optimal subgroups for treatment using many covariates

    Authors: Tyler J. VanderWeele, Alex R. Luedtke, Mark J. van der Laan, Ronald C. Kessler

    Abstract: We consider the problem of selecting the optimal subgroup to treat when data on covariates is available from a randomized trial or observational study. We distinguish between four different settings including (i) treatment selection when resources are constrained, (ii) treatment selection when resources are not constrained, (iii) treatment selection in the presence of side effects and costs, and (…

    Submitted 26 February, 2018; originally announced February 2018.