Search | arXiv e-print repository

Debiased machine learning for counterfactual survival functionals based on left-truncated right-censored data

Authors: Eric R. Morenz, Charles J. Wolock, Marco Carone

Abstract: Learning causal effects of a binary exposure on time-to-event endpoints can be challenging because survival times may be partially observed due to censoring and systematically biased due to truncation. In this work, we present debiased machine learning-based nonparametric estimators of the joint distribution of a counterfactual survival time and baseline covariates for use when the observed data a… ▽ More Learning causal effects of a binary exposure on time-to-event endpoints can be challenging because survival times may be partially observed due to censoring and systematically biased due to truncation. In this work, we present debiased machine learning-based nonparametric estimators of the joint distribution of a counterfactual survival time and baseline covariates for use when the observed data are subject to covariate-dependent left truncation and right censoring and when baseline covariates suffice to deconfound the relationship between exposure and survival time. Our inferential procedures explicitly allow the integration of flexible machine learning tools for nuisance estimation, and enjoy certain robustness properties. The approach we propose can be directly used to make pointwise or uniform inference on smooth summaries of the joint counterfactual survival time and covariate distribution, and can be valuable even in the absence of interventions, when summaries of a marginal survival distribution are of interest. We showcase how our procedures can be used to learn a variety of inferential targets and illustrate their performance in simulation studies. △ Less

Submitted 13 November, 2024; originally announced November 2024.

Comments: The first two authors contributed equally to this work. 61 pages (36 main text, 25 supplement). 6 figures (6 main text, 0 supplement)

arXiv:2411.02771 [pdf, ps, other]

Doubly robust inference via calibration

Authors: Lars van der Laan, Alex Luedtke, Marco Carone

Abstract: Doubly robust estimators are widely used for estimating average treatment effects and other linear summaries of regression functions. While consistency requires only one of two nuisance functions to be estimated consistently, asymptotic normality typically require sufficiently fast convergence of both. In this work, we correct this mismatch: we show that calibrating the nuisance estimators within… ▽ More Doubly robust estimators are widely used for estimating average treatment effects and other linear summaries of regression functions. While consistency requires only one of two nuisance functions to be estimated consistently, asymptotic normality typically require sufficiently fast convergence of both. In this work, we correct this mismatch: we show that calibrating the nuisance estimators within a doubly robust procedure yields doubly robust asymptotic normality for linear functionals. We introduce a general framework, calibrated debiased machine learning (calibrated DML), and propose a specific estimator that augments standard DML with a simple isotonic regression adjustment. Our theoretical analysis shows that the calibrated DML estimator remains asymptotically normal if either the regression or the Riesz representer of the functional is estimated sufficiently well, allowing the other to converge arbitrarily slowly or even inconsistently. We further propose a simple bootstrap method for constructing confidence intervals, enabling doubly robust inference without additional nuisance estimation. In a range of semi-synthetic benchmark datasets, calibrated DML reduces bias and improves coverage relative to standard DML. Our method can be integrated into existing DML pipelines by adding just a few lines of code to calibrate cross-fitted estimates via isotonic regression. △ Less

Submitted 27 June, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

arXiv:2409.19230 [pdf, other]

Propensity Score Augmentation in Matching-based Estimation of Causal Effects

Authors: Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke

Abstract: When assessing the causal effect of a binary exposure using observational data, confounder imbalance across exposure arms must be addressed. Matching methods, including propensity score-based matching, can be used to deconfound the causal relationship of interest. They have been particularly popular in practice, at least in part due to their simplicity and interpretability. However, these methods… ▽ More When assessing the causal effect of a binary exposure using observational data, confounder imbalance across exposure arms must be addressed. Matching methods, including propensity score-based matching, can be used to deconfound the causal relationship of interest. They have been particularly popular in practice, at least in part due to their simplicity and interpretability. However, these methods can suffer from low statistical efficiency compared to many competing methods. In this work, we propose a novel matching-based estimator of the average treatment effect based on a suitably-augmented propensity score model. Our procedure can be shown to have greater statistical efficiency than traditional matching estimators whenever prognostic variables are available, and in some cases, can nearly reach the nonparametric efficiency bound. In addition to a theoretical study, we provide numerical results to illustrate our findings. Finally, we use our novel procedure to estimate the effect of circumcision on risk of HIV-1 infection using vaccine efficacy trial data. △ Less

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.09973 [pdf, other]

Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data

Authors: Ellen Graham, Marco Carone, Andrea Rotnitzky

Abstract: We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint tar… ▽ More We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint target distribution. While this theory proves effective in many significant contexts, it falls short in certain common data fusion problems, such as two-sample instrumental variable analysis, settings that integrate data from epidemiological studies with diverse designs (e.g., prospective cohorts and retrospective case-control studies), and studies with variables prone to measurement error that are supplemented by validation studies. In this paper, we extend the aforementioned comprehensive theory to allow for the fusion of individual-level data from sources aligned with conditional distributions that do not correspond to a single factorization of the target distribution. Assuming conditional and marginal distribution alignments, we provide universal results that characterize the class of all influence functions of regular asymptotically linear estimators and the efficient influence function of any pathwise differentiable parameter, irrespective of the number of data sources, the specific parameter of interest, or the statistical model for the target distribution. This theory paves the way for machine-learning debiased, semiparametric efficient estimation. △ Less

Submitted 24 February, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

Comments: 122 pages. Updated to simplify notation and include a supplemental section discussing the relationship between this work and arXiv:2111.14945

arXiv:2307.12544 [pdf, other]

Adaptive debiased machine learning using data-driven model selection techniques

Authors: Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

Abstract: Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecif… ▽ More Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. While they may exhibit irregular behavior for the target parameter in a nonparametric statistical model, we demonstrate that ADML estimators provides regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that there is no penalty, in a local asymptotic sense, for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter. To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for estimating the average treatment effect in adaptive partially linear regression models. △ Less

Submitted 24 July, 2023; originally announced July 2023.

Comments: 32 pages + appendix

arXiv:2004.03683 [pdf, other]

A general framework for inference on algorithm-agnostic variable importance

Authors: Brian D. Williamson, Peter B. Gilbert, Noah R. Simon, Marco Carone

Abstract: In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment… ▽ More In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection. △ Less

Submitted 13 September, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: 69 total pages (35 in the main document, 34 supplementary), 23 figures (4 in the main document, 19 supplementary)

arXiv:2003.01856 [pdf, other]

Universal sieve-based strategies for efficient estimation using machine learning tools

Authors: Hongxiang Qiu, Alex Luedtke, Marco Carone

Abstract: Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis… ▽ More Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis for inference. Though there are several existing methods to construct asymptotically efficient plug-in estimators, each such method either can only be derived using knowledge of efficiency theory or is only valid under stringent smoothness assumptions. Among existing methods, sieve estimators stand out as particularly convenient because efficiency theory is not required in their construction, their tuning parameters can be selected data adaptively, and they are universal in the sense that the same fits lead to efficient plug-in estimators for a rich class of estimands. Inspired by these desirable properties, we propose two novel universal approaches for estimating function-valued features that can be analyzed using sieve estimation theory. Compared to traditional sieve estimators, these approaches are valid under more general conditions on the smoothness of the function-valued features by utilizing flexible estimates that can be obtained, for example, using machine learning. △ Less

Submitted 26 August, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: 46 pages, 6 figures, submitted to Bernoulli

arXiv:1810.09022 [pdf, other]

Correcting an estimator of a multivariate monotone function with isotonic regression

Authors: Ted Westling, Mark van der Laan, Marco Carone

Abstract: In many problems, a sensible estimator of a possibly multivariate monotone function may itself fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain. We demonstrate that this corrected estimator has no worse supremal estimation error than the initial estimator, and that analogously corrected c… ▽ More In many problems, a sensible estimator of a possibly multivariate monotone function may itself fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain. We demonstrate that this corrected estimator has no worse supremal estimation error than the initial estimator, and that analogously corrected confidence bands contain the true function whenever the initial bands do, at no loss to average or maximal band width. Additionally, we demonstrate that the corrected estimator is uniformly asymptotically equivalent to the initial estimator provided that the initial estimator satisfies a stochastic equicontinuity condition and that the true function is Lipschitz and strictly monotone. We provide simple sufficient conditions for our stochastic equicontinuity condition in the important special case that the initial estimator is uniformly asymptotically linear, and illustrate the use of these results for estimation of a G-computed distribution function. Our stochastic equicontinuity condition is weaker than standard uniform stochastic equicontinuity, which has been required for alternative correction procedures. Crucially, this allows us to apply our results to the bivariate correction of the local linear estimator of a conditional distribution function known to be monotone in its conditioning argument. Our experiments suggest that the projection step can yield significant practical improvements in performance for both the estimator and confidence band. △ Less

Submitted 4 September, 2019; v1 submitted 21 October, 2018; originally announced October 2018.

arXiv:1806.01928 [pdf, other]

A unified study of nonparametric inference for monotone functions

Authors: Ted Westling, Marco Carone

Abstract: The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise… ▽ More The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise convergence in distribution of a class of generalized Grenander-type estimators of a monotone function. This broad class allows the minorization or majoratization operation to be performed on a data-dependent transformation of the domain, possibly yielding benefits in practice. Additionally, we provide simpler conditions and more concrete distributional theory in the important case that the primitive estimator and data-dependent transformation function are asymptotically linear. We use our general results in the context of various well-studied problems, and show that we readily recover classical results established separately in each case. More importantly, we show that our results allow us to tackle more challenging problems involving parameters for which the use of flexible learning strategies appears necessary. In particular, we study inference on monotone density and hazard functions using informatively right-censored data, extending the classical work on independent censoring, and on a covariate-marginalized conditional mean function, extending the classical work on monotone regression functions. In addition to a theoretical study, we present numerical evidence supporting our large-sample results. △ Less

Submitted 29 November, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

Comments: Substantial revisions made to the manuscript

arXiv:1608.08717 [pdf, other]

Toward computerized efficient estimation in infinite-dimensional models

Authors: Marco Carone, Alexander R. Luedtke, Mark J. van der Laan

Abstract: Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scie… ▽ More Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scientific knowledge more appropriately, performing efficient inference in these models is generally challenging. The efficient influence function is a key analytic object from which the construction of asymptotically efficient estimators can potentially be streamlined. However, the theoretical derivation of the efficient influence function requires specialized knowledge and is often a difficult task, even for experts. In this paper, we propose and discuss a numerical procedure for approximating the efficient influence function. The approach generalizes the simple nonparametric procedures described recently by Frangakis et al. (2015) and Luedtke et al. (2015) to arbitrary models. We present theoretical results to support our proposal, and also illustrate the method in the context of two examples. The proposed approach is an important step toward automating efficient estimation in general statistical models, thereby rendering the use of realistic models in statistical analyses much more accessible. △ Less

Submitted 30 August, 2016; originally announced August 2016.

arXiv:1511.08369 [pdf, other]

Second-Order Inference for the Mean of a Variable Missing at Random

Authors: Iván Díaz, Marco Carone, Mark J. van der Laan

Abstract: We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates neces… ▽ More We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first and second order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The first-order estimator proposed is expected to have improved finite sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%. We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator. △ Less

Submitted 26 November, 2015; originally announced November 2015.

arXiv:1510.04195 [pdf, other]

An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions

Authors: Alexander R. Luedtke, Marco Carone, Mark J. van der Laan

Abstract: We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability liter… ▽ More We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely. △ Less

Submitted 13 June, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

MSC Class: 62G10

arXiv:1205.6275 [pdf, ps, other]

doi 10.1214/11-AOS954

Large-sample study of the kernel density estimators under multiplicative censoring

Authors: Masoud Asgharian, Marco Carone, Vahid Fakoor

Abstract: The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751--761] is an incomplete data problem whereby two independent samples from the lifetime distribution $G$, $\mathcal{X}_m=(X_1,...,X_m)$ and $\mathcal{Z}_n=(Z_1,...,Z_n)$, are observed subject to a form of coarsening. Specifically, sample $\mathcal{X}_m$ is fully observed while $\mathcal{Y}_n=(Y_1,...,Y_n)$ is observed i… ▽ More The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751--761] is an incomplete data problem whereby two independent samples from the lifetime distribution $G$, $\mathcal{X}_m=(X_1,...,X_m)$ and $\mathcal{Z}_n=(Z_1,...,Z_n)$, are observed subject to a form of coarsening. Specifically, sample $\mathcal{X}_m$ is fully observed while $\mathcal{Y}_n=(Y_1,...,Y_n)$ is observed instead of $\mathcal{Z}_n$, where $Y_i=U_iZ_i$ and $(U_1,...,U_n)$ is an independent sample from the standard uniform distribution. Vardi [Biometrika 76 (1989) 751--761] showed that this model unifies several important statistical problems, such as the deconvolution of an exponential random variable, estimation under a decreasing density constraint and an estimation problem in renewal processes. In this paper, we establish the large-sample properties of kernel density estimators under the multiplicative censoring model. We first construct a strong approximation for the process $\sqrt{k}(\hat{G}-G)$, where $\hat{G}$ is a solution of the nonparametric score equation based on $(\mathcal{X}_m,\mathcal{Y}_n)$, and $k=m+n$ is the total sample size. Using this strong approximation and a result on the global modulus of continuity, we establish conditions for the strong uniform consistency of kernel density estimators. We also make use of this strong approximation to study the weak convergence and integrated squared error properties of these estimators. We conclude by extending our results to the setting of length-biased sampling. △ Less

Submitted 29 May, 2012; originally announced May 2012.

Comments: Published in at http://dx.doi.org/10.1214/11-AOS954 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS954

Journal ref: Annals of Statistics 2012, Vol. 40, No. 1, 159-187

Showing 1–13 of 13 results for author: Carone, M