-
Cheap Subsampling bootstrap confidence intervals for fast and robust inference
Authors:
Johan Sebastian Ohlendorff,
Anders Munch,
Kathrine Kold Sørensen,
Thomas Alexander Gerds
Abstract:
Bootstrapping is often applied to get confidence limits for semiparametric inference of a target parameter in the presence of nuisance parameters. Bootstrapping with replacement can be computationally expensive and problematic when cross-validation is used in the estimation algorithm due to duplicate observations in the bootstrap samples. We provide a valid, fast, easy-to-implement subsampling bootstrap method for constructing confidence intervals for asymptotically linear estimators and discuss its application to semiparametric causal inference. Our method, inspired by the Cheap Bootstrap (Lam, 2022), leverages the quantiles of a t-distribution and has the desired coverage with few bootstrap replications. We show that the method is asymptotically valid if the subsample size is chosen appropriately as a function of the sample size. We illustrate our method with data from the LEADER trial (Marso et al., 2016), obtaining confidence intervals for a longitudinal targeted minimum loss-based estimator (van der Laan and Gruber, 2012). Through a series of empirical experiments, we also explore the impact of subsample size, sample size, and the number of bootstrap repetitions on the performance of the confidence interval.
Submitted 5 March, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
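The interval construction described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cheap_subsampling_ci`, its parameters, and the toy data are hypothetical. The scaling factor sqrt(m / (n - m)) is not stated in the abstract; it is assumed here from the usual variance argument for sampling m of n observations without replacement. The t-quantile is passed in explicitly so that the sketch needs only the standard library.

```python
import math
import random
import statistics


def cheap_subsampling_ci(data, estimator, m, B=10, t_quantile=2.228, seed=0):
    """Sketch of a subsampling confidence interval using t-quantiles.

    t_quantile must be the 1 - alpha/2 quantile of a t-distribution with B
    degrees of freedom; 2.228 corresponds to B = 10 and alpha = 0.05
    (scipy.stats.t.ppf(0.975, df=10) gives the same value).
    """
    n = len(data)
    if not 0 < m < n:
        raise ValueError("subsample size m must satisfy 0 < m < n")
    rng = random.Random(seed)
    theta_hat = estimator(data)  # estimate on the full sample
    # Each replicate re-estimates on m observations drawn WITHOUT replacement,
    # so no observation is duplicated within a subsample.
    sq_dev = [
        (estimator(rng.sample(data, m)) - theta_hat) ** 2 for _ in range(B)
    ]
    s = math.sqrt(sum(sq_dev) / B)
    # Assumed rescaling from subsample variability to full-sample variability.
    half_width = t_quantile * math.sqrt(m / (n - m)) * s
    return theta_hat - half_width, theta_hat, theta_hat + half_width


# Toy usage: interval for the mean of simulated standard normal data.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
lower, estimate, upper = cheap_subsampling_ci(sample, statistics.fmean, m=300)
print(lower, estimate, upper)
```

Drawing without replacement is what makes the approach compatible with estimators that use cross-validation internally, since no observation can land in both a training and a validation fold of the same subsample.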
-
The state learner -- a super learner for right-censored data
Authors:
Anders Munch,
Thomas A. Gerds
Abstract:
In survival analysis, prediction models are needed as stand-alone tools and in applications of causal inference to estimate nuisance parameters. The super learner is a machine learning algorithm which combines a library of prediction models into a meta learner based on cross-validated loss. In right-censored data, the choice of the loss function and the estimation of the expected loss need careful consideration. We introduce the state learner, a new super learner for survival analysis, which simultaneously evaluates libraries of prediction models for the event of interest and the censoring distribution. The state learner can be applied to all types of survival models, works in the presence of competing risks, and does not require a single pre-specified estimator of the conditional censoring distribution. We establish an oracle inequality for the state learner and investigate its performance through numerical experiments. We illustrate the application of the state learner with prostate cancer data, as a stand-alone prediction tool, and, for causal inference, as a way to estimate the nuisance parameter models of a smooth statistical functional.
Submitted 27 May, 2024;
originally announced May 2024.
-
Fish should not be in isolation: Calculating maximum sustainable yield using an ensemble model
Authors:
Michael A. Spence,
Khatija Alliji,
Hayley J. Bannister,
Nicola D. Walker,
Angela Muench
Abstract:
Many jurisdictions have a legal requirement to manage fish stocks to maximum sustainable yield (MSY). Generally, MSY is calculated on a single-species basis; in reality, however, the yield of one species depends not only on its own fishing level but also on that of other species. We show that bold assumptions about the effect of interacting species on MSY are made when managing on a single-species basis, often leading to inconsistent and conflicting advice, demonstrating the requirement of a multispecies MSY (MMSY). Although there are several definitions of MMSY, there is no consensus. Furthermore, calculating an MMSY can be difficult as there are many models, of varying complexity, each with their own strengths and weaknesses, and the value of MMSY can be sensitive to the model used. Here, we use an ensemble model to combine different multispecies models, exploiting their individual strengths and quantifying their uncertainties and discrepancies, to calculate a more robust MMSY. We demonstrate this by calculating an MMSY for nine species in the North Sea. We found that it would be impossible to fish at single-species MSY and that MMSY led to higher yields and revenues than current levels.
Submitted 5 May, 2020;
originally announced May 2020.