
Showing 1–29 of 29 results for author: Carone, M

  1. arXiv:2411.09017  [pdf, other]

    stat.ME math.ST

    Debiased machine learning for counterfactual survival functionals based on left-truncated right-censored data

    Authors: Eric R. Morenz, Charles J. Wolock, Marco Carone

    Abstract: Learning causal effects of a binary exposure on time-to-event endpoints can be challenging because survival times may be partially observed due to censoring and systematically biased due to truncation. In this work, we present debiased machine learning-based nonparametric estimators of the joint distribution of a counterfactual survival time and baseline covariates for use when the observed data a…

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: The first two authors contributed equally to this work. 61 pages (36 main text, 25 supplement). 6 figures (6 main text, 0 supplement)

  2. arXiv:2411.06342  [pdf, ps, other]

    stat.ME stat.ML

    Stabilized Inverse Probability Weighting via Isotonic Calibration

    Authors: Lars van der Laan, Ziming Lin, Marco Carone, Alex Luedtke

    Abstract: Inverse weighting with an estimated propensity score is widely used by estimation methods in causal inference to adjust for confounding bias. However, directly inverting propensity score estimates can lead to instability, bias, and excessive variability due to large inverse weights, especially when treatment overlap is limited. In this work, we propose a post-hoc calibration algorithm for inverse…

    Submitted 9 April, 2025; v1 submitted 9 November, 2024; originally announced November 2024.

    Comments: Accepted to CLeaR conference (2025). Companion paper: Automatic doubly robust inference for linear functionals via calibrated debiased machine learning, arXiv:2411.02771

  3. arXiv:2411.02771  [pdf, other]

    stat.ME math.ST stat.ML

    Automatic doubly robust inference for linear functionals via calibrated debiased machine learning

    Authors: Lars van der Laan, Alex Luedtke, Marco Carone

    Abstract: In causal inference, many estimands of interest can be expressed as a linear functional of the outcome regression function; this includes, for example, average causal effects of static, dynamic and stochastic interventions. For learning such estimands, in this work, we propose novel debiased machine learning estimators that are doubly robust asymptotically linear, thus providing not only doubly ro…

    Submitted 4 November, 2024; originally announced November 2024.

  4. arXiv:2409.19230  [pdf, other]

    stat.ME math.ST

    Propensity Score Augmentation in Matching-based Estimation of Causal Effects

    Authors: Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke

    Abstract: When assessing the causal effect of a binary exposure using observational data, confounder imbalance across exposure arms must be addressed. Matching methods, including propensity score-based matching, can be used to deconfound the causal relationship of interest. They have been particularly popular in practice, at least in part due to their simplicity and interpretability. However, these methods…

    Submitted 28 September, 2024; originally announced September 2024.

  5. arXiv:2409.09973  [pdf, other]

    math.ST stat.ME stat.ML

    Towards a Unified Theory for Semiparametric Data Fusion with Individual-Level Data

    Authors: Ellen Graham, Marco Carone, Andrea Rotnitzky

    Abstract: We address the goal of conducting inference about a smooth finite-dimensional parameter by utilizing individual-level data from various independent sources. Recent advancements have led to the development of a comprehensive theory capable of handling scenarios where different data sources align with, possibly distinct subsets of, conditional distributions of a single factorization of the joint tar…

    Submitted 24 February, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: 122 pages. Updated to simplify notation and include a supplemental section discussing the relationship between this work and arXiv:2111.14945

  6. Investigating symptom duration using current status data: a case study of post-acute COVID-19 syndrome

    Authors: Charles J. Wolock, Susan Jacob, Julia C. Bennett, Anna Elias-Warren, Jessica O'Hanlon, Avi Kenny, Nicholas P. Jewell, Andrea Rotnitzky, Stephen R. Cole, Ana A. Weil, Helen Y. Chu, Marco Carone

    Abstract: For infectious diseases, characterizing symptom duration is of clinical and public health importance. Symptom duration may be assessed by surveying infected individuals and querying symptom status at the time of survey response. For example, in a SARS-CoV-2 testing program at the University of Washington, participants were surveyed at least $28$ days after testing positive and asked to report curr…

    Submitted 17 March, 2025; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally to this work. Main text: 22 pages, 2 figures, 4 tables. Supplement: 23 pages, 14 figures, 0 tables. This update (v3) includes sensitivity analysis methodology and results

  7. arXiv:2402.01972  [pdf, other]

    stat.ML cs.LG stat.ME

    Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke

    Abstract: We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same oracle-efficiency as Neyman-orthogonal learning strategies, such as DR-learning and R-learning, while addressing some of their primary drawbacks, including that…

    Submitted 2 February, 2024; originally announced February 2024.

  8. Assessing variable importance in survival analysis using machine learning

    Authors: Charles J. Wolock, Peter B. Gilbert, Noah Simon, Marco Carone

    Abstract: Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. For example, in HIV vaccine trials, participant baseline characteristics are used to predict the probability of HIV acquisition over the intended follow-up period, and investigators may wish to understand ho…

    Submitted 12 August, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 98 total pages (37 main text, 61 supplementary)

    Journal ref: Biometrika 112(2) (2025)

  9. arXiv:2307.12544  [pdf, other]

    stat.ME math.ST stat.ML

    Adaptive debiased machine learning using data-driven model selection techniques

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

    Abstract: Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecif…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 32 pages + appendix

  10. arXiv:2302.14011  [pdf, other]

    stat.ML cs.LG stat.ME

    Causal isotonic calibration for heterogeneous treatment effects

    Authors: Lars van der Laan, Ernesto Ulloa-Pérez, Marco Carone, Alex Luedtke

    Abstract: We propose causal isotonic calibration, a novel nonparametric method for calibrating predictors of heterogeneous treatment effects. Furthermore, we introduce cross-calibration, a data-efficient variant of calibration that eliminates the need for hold-out calibration sets. Cross-calibration leverages cross-fitted predictors and generates a single calibrated predictor using all available data. Under…

    Submitted 5 June, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted to ICML2023

  11. A framework for leveraging machine learning tools to estimate personalized survival curves

    Authors: Charles J. Wolock, Peter B. Gilbert, Noah Simon, Marco Carone

    Abstract: The conditional survival function of a time-to-event outcome subject to censoring and truncation is a common target of estimation in survival analysis. This parameter may be of scientific interest and also often appears as a nuisance in nonparametric and semiparametric problems. In addition to classical parametric and semiparametric methods (e.g., based on the Cox proportional hazards model), flex…

    Submitted 31 October, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: 52 pages, 13 figures

    Journal ref: Journal of Computational and Graphical Statistics 33(3) 1098-1108 (2024)

  12. arXiv:2211.00163  [pdf, other]

    stat.ME stat.AP

    Can the potential benefit of individualizing treatment be assessed using trial summary statistics alone?

    Authors: Nina Galanter, Marco Carone, Ronald C. Kessler, Alex Luedtke

    Abstract: Individualizing treatment assignment can improve outcomes for diseases with patient-to-patient variability in comparative treatment effects. When a clinical trial demonstrates that some patients improve on treatment while others do not, it is tempting to assume that treatment effect heterogeneity exists. However, if variability in response is mainly driven by factors other than treatment, investig…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: 31 pages, 1 figure

  13. arXiv:2203.01897  [pdf, other]

    stat.ME

    A general adaptive framework for multivariate point null testing

    Authors: Adam Elder, Marco Carone, Peter Gilbert, Alex Luedtke

    Abstract: As a common step in refining their scientific inquiry, investigators are often interested in performing some screening of a collection of given statistical hypotheses. For example, they may wish to determine whether any one of several patient characteristics is associated with a health outcome of interest. Existing generic methods for testing a multivariate hypothesis -- such as multiplicity corr…

    Submitted 3 March, 2022; originally announced March 2022.

  14. arXiv:2201.06669  [pdf, other]

    stat.ME

    Individualized treatment rules under stochastic treatment cost constraints

    Authors: Hongxiang Qiu, Marco Carone, Alex Luedtke

    Abstract: Estimation and evaluation of individualized treatment rules have been studied extensively, but real-world treatment resource constraints have received limited attention in existing methods. We investigate a setting in which treatment is intervened upon based on covariates to optimize the mean counterfactual outcome under treatment cost constraints when the treatment cost is random. In a particular…

    Submitted 22 November, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  15. arXiv:2107.05734  [pdf, other]

    stat.ME

    Assessment of Immune Correlates of Protection via Controlled Vaccine Efficacy and Controlled Risk

    Authors: Peter B. Gilbert, Youyi Fong, Marco Carone

    Abstract: Immune correlates of protection (CoPs) are immunologic biomarkers accepted as a surrogate for an infectious disease clinical endpoint and thus can be used for traditional or provisional vaccine approval. To study CoPs in randomized, placebo-controlled trials, correlates of risk (CoRs) are first assessed in vaccine recipients. This analysis does not assess causation, as a CoR may fail to be a CoP.…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 25 pages, 3 figures, 1 table

  16. arXiv:2106.06602  [pdf, other]

    stat.ME

    Inference for treatment-specific survival curves using machine learning

    Authors: Ted Westling, Alex Luedtke, Peter Gilbert, Marco Carone

    Abstract: In the absence of data from a randomized trial, researchers often aim to use observational data to draw causal inference about the effect of a treatment on a time-to-event outcome. In this context, interest often focuses on the treatment-specific survival curves; that is, the survival curves were the entire population under study to be assigned to receive the treatment or not. Under certain causal…

    Submitted 11 June, 2021; originally announced June 2021.

  17. arXiv:2105.06646  [pdf, other]

    stat.ME

    Inference on function-valued parameters using a restricted score test

    Authors: Aaron Hudson, Marco Carone, Ali Shojaie

    Abstract: It is often of interest to make inference on an unknown function that is a local parameter of the data-generating mechanism, such as a density or regression function. Such estimands can typically only be estimated at a slower-than-parametric rate in nonparametric and semiparametric models, and performing calibrated inference can be challenging. In many cases, these estimands can be expressed as th…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: 39 pages, 7 figures

  18. arXiv:2004.03683  [pdf, other]

    stat.ME math.ST stat.ML

    A general framework for inference on algorithm-agnostic variable importance

    Authors: Brian D. Williamson, Peter B. Gilbert, Noah R. Simon, Marco Carone

    Abstract: In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment…

    Submitted 13 September, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: 69 total pages (35 in the main document, 34 supplementary), 23 figures (4 in the main document, 19 supplementary)

  19. arXiv:2003.01856  [pdf, other]

    stat.ME math.ST

    Universal sieve-based strategies for efficient estimation using machine learning tools

    Authors: Hongxiang Qiu, Alex Luedtke, Marco Carone

    Abstract: Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis…

    Submitted 26 August, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: 46 pages, 6 figures, submitted to Bernoulli

  20. arXiv:1910.02087  [pdf, other]

    stat.ME

    Combining Biomarkers by Maximizing the True Positive Rate for a Fixed False Positive Rate

    Authors: Allison Meisner, Marco Carone, Margaret S. Pepe, Kathleen F. Kerr

    Abstract: Biomarkers abound in many areas of clinical research, and often investigators are interested in combining them for diagnosis, prognosis, or screening. In many applications, the true positive rate for a biomarker combination at a prespecified, clinically acceptable false positive rate is the most relevant measure of predictive capacity. We propose a distribution-free method for constructing biomark…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: 37 pages (including appendices)

  21. arXiv:1810.09022  [pdf, other]

    math.ST stat.ME

    Correcting an estimator of a multivariate monotone function with isotonic regression

    Authors: Ted Westling, Mark van der Laan, Marco Carone

    Abstract: In many problems, a sensible estimator of a possibly multivariate monotone function may itself fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain. We demonstrate that this corrected estimator has no worse supremal estimation error than the initial estimator, and that analogously corrected c…

    Submitted 4 September, 2019; v1 submitted 21 October, 2018; originally announced October 2018.

  22. arXiv:1810.03269  [pdf, other]

    stat.ME

    Causal isotonic regression

    Authors: Ted Westling, Peter Gilbert, Marco Carone

    Abstract: In observational studies, potential confounders may distort the causal relationship between an exposure and an outcome. However, under some conditions, a causal dose-response curve can be recovered using the G-computation formula. Most classical methods for estimating such curves when the exposure is continuous rely on restrictive parametric assumptions, which carry significant risk of model missp…

    Submitted 16 December, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

  23. arXiv:1806.01928  [pdf, other]

    math.ST

    A unified study of nonparametric inference for monotone functions

    Authors: Ted Westling, Marco Carone

    Abstract: The problem of nonparametric inference on a monotone function has been extensively studied in many particular cases. Estimators considered have often been of so-called Grenander type, being representable as the left derivative of the greatest convex minorant or least concave majorant of an estimator of a primitive function. In this paper, we provide general conditions for consistency and pointwise…

    Submitted 29 November, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

    Comments: Substantial revisions made to the manuscript

  24. On-Demand Virtual Research Environments using Microservices

    Authors: Marco Capuccini, Anders Larsson, Matteo Carone, Jon Ander Novella, Noureddin Sadawi, Jianliang Gao, Salman Toor, Ola Spjuth

    Abstract: The computational demands for scientific applications are continuously increasing. The emergence of cloud computing has enabled on-demand resource allocation. However, relying solely on infrastructure as a service does not achieve the degree of flexibility required by the scientific community. Here we present a microservice-oriented methodology, where scientific applications run in a distributed o…

    Submitted 10 May, 2019; v1 submitted 16 May, 2018; originally announced May 2018.

  25. arXiv:1705.02459  [pdf, other]

    stat.ME

    Sequential Double Robustness in Right-Censored Longitudinal Models

    Authors: Alexander R. Luedtke, Oleg Sofrygin, Mark J. van der Laan, Marco Carone

    Abstract: Consider estimating the G-formula for the counterfactual mean outcome under a given treatment regime in a longitudinal study. Bang and Robins provided an estimator for this quantity that relies on a sequential regression formulation of this parameter. This approach is doubly robust in that it is consistent if either the outcome regressions or the treatment mechanisms are consistently estimated. We…

    Submitted 16 May, 2018; v1 submitted 6 May, 2017; originally announced May 2017.

    Comments: Version 2 Version 1: May 6, 2017

  26. arXiv:1608.08717  [pdf, other]

    math.ST

    Toward computerized efficient estimation in infinite-dimensional models

    Authors: Marco Carone, Alexander R. Luedtke, Mark J. van der Laan

    Abstract: Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scie…

    Submitted 30 August, 2016; originally announced August 2016.

  27. arXiv:1511.08369  [pdf, other]

    math.ST

    Second-Order Inference for the Mean of a Variable Missing at Random

    Authors: Iván Díaz, Marco Carone, Mark J. van der Laan

    Abstract: We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates neces…

    Submitted 26 November, 2015; originally announced November 2015.

  28. arXiv:1510.04195  [pdf, other]

    math.ST stat.ML

    An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions

    Authors: Alexander R. Luedtke, Marco Carone, Mark J. van der Laan

    Abstract: We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability liter…

    Submitted 13 June, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

    MSC Class: 62G10

  29. Large-sample study of the kernel density estimators under multiplicative censoring

    Authors: Masoud Asgharian, Marco Carone, Vahid Fakoor

    Abstract: The multiplicative censoring model introduced in Vardi [Biometrika 76 (1989) 751--761] is an incomplete data problem whereby two independent samples from the lifetime distribution $G$, $\mathcal{X}_m=(X_1,...,X_m)$ and $\mathcal{Z}_n=(Z_1,...,Z_n)$, are observed subject to a form of coarsening. Specifically, sample $\mathcal{X}_m$ is fully observed while $\mathcal{Y}_n=(Y_1,...,Y_n)$ is observed i…

    Submitted 29 May, 2012; originally announced May 2012.

    Comments: Published at http://dx.doi.org/10.1214/11-AOS954 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS954

    Journal ref: Annals of Statistics 2012, Vol. 40, No. 1, 159-187